25 September 2012

246. Cluster network performance testing (very basic) on Debian Testing using a gigabit switch

Playing with hpcc got me thinking about my network connection.

My cluster looks like this:
I've got four nodes which are connected via two networks, 192.168.2.0/24 and 192.168.1.0/24. The 192.168.1.0/24 network is connected using a gigabit switch. Be (see below) acts as the gateway. The 192.168.2.0/24 network is connected via a crappy old netgear 10/100 router (dhcp) and provides access to the outside world (hello mac spoofing :) ). Each box shares a folder via nfs using a unique name.
_Nodes_
Be: AMD II X3, 8 GB ram (192.168.1.1): Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8169 Gigabit Ethernet (rev 10)
Ta: Intel i5-2400, 8 GB ram (192.168.1.150):  Intel Corporation 82579LM Gigabit Network Connection (rev 04)
B: AMD Phenom II X6, 8 GB ram (192.168.1.101): Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet controller (rev 03)
Ne: AMD FX 8150 X8, 16 GB ram (192.168.1.120): Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8169 Gigabit Ethernet (rev 10)

So, time to test the network performance:
sudo apt-get install iperf

On all your boxes (e.g. using clusterssh) start the iperf daemon
iperf -s

Then on each of your nodes run:
iperf -c 192.168.1.1 && iperf -c 192.168.1.101 && iperf -c 192.168.1.150 && iperf -c 192.168.1.120

------------------------------------------------------------
Client connecting to 192.168.1.1, TCP port 5001
TCP window size: 45.7 KByte (default)
------------------------------------------------------------
[  3] local 192.168.1.101 port 37893 connected with 192.168.1.1 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec   564 MBytes   473 Mbits/sec
------------------------------------------------------------
Client connecting to 192.168.1.101, TCP port 5001
TCP window size:  169 KByte (default)
------------------------------------------------------------
[  3] local 192.168.1.101 port 35926 connected with 192.168.1.101 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec  15.5 GBytes  13.3 Gbits/sec
------------------------------------------------------------
Client connecting to 192.168.1.150, TCP port 5001
TCP window size: 22.9 KByte (default)
------------------------------------------------------------
[  3] local 192.168.1.101 port 48257 connected with 192.168.1.150 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec   564 MBytes   473 Mbits/sec
------------------------------------------------------------
Client connecting to 192.168.1.120, TCP port 5001
TCP window size: 22.9 KByte (default)
------------------------------------------------------------
[  3] local 192.168.1.101 port 43236 connected with 192.168.1.120 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec   617 MBytes   517 Mbits/sec


Overall, this is what I got
Client/Server (MBit/s)
     Be     B     Ta    Ne
Be   13.7G  310   308   316
B    564    15.5G 564   617
Ta   726    660   19.7G 936
Ne   882    484   917   19.4G

I'm not sure whether to expect a metric gigabit (1000 metric MBit) or a binary one (1024 binary MBit), but looking at our results our best is 936 Mbit/s and worst 308 Mbit/s. All of them should thus ideally reach at least 936 MBit/s. They all have gigabit network card.

And now, try to improve it:
I went through the whole shebang with
sudo ifconfig eth1 mtu 9000
sudo ifconfig eth1 mtu 8000
etc.
Anyway, I got the following MTUs that way:
Be  7100
B    7100
Ne  9000
Ta   9000

I then set the MTUs to 7100 on all the nodes and tried pinging from node to node, e.g.:
ping -s 7072 -M do 192.168.1.101

Well, that maxed out at 1472 i.e. about MTU 1500 which was the original value. So I'm a bit confused.


Settings:
Be:
eth1      Link encap:Ethernet  HWaddr 00:f0:4d:83:0a:48  
          inet addr:192.168.1.1  Bcast:192.168.1.255  Mask:255.255.255.0
          inet6 addr: fe80::2f0:4dff:fe83:a48/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:24124966 errors:0 dropped:27064 overruns:0 frame:0
          TX packets:19569426 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:25859945667 (24.0 GiB)  TX bytes:14200267703 (13.2 GiB)
B:
eth1      Link encap:Ethernet  HWaddr 02:00:8c:50:2f:6b  
          inet addr:192.168.1.101  Bcast:192.168.1.255  Mask:255.255.255.0
          inet6 addr: fe80::8cff:fe50:2f6b/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:14540970 errors:0 dropped:36651 overruns:0 frame:0
          TX packets:16801915 errors:0 dropped:2 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:12347398135 (11.4 GiB)  TX bytes:18008416370 (16.7 GiB)
Ta:
eth1      Link encap:Ethernet  HWaddr 78:2b:cb:b3:a4:b7  
          inet addr:192.168.1.150  Bcast:192.168.1.255  Mask:255.255.255.0
          inet6 addr: fe80::7a2b:cbff:feb3:a4b7/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:14717233 errors:0 dropped:68232 overruns:0 frame:0
          TX packets:17769966 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:13860096243 (12.9 GiB)  TX bytes:20207270880 (18.8 GiB)
          Interrupt:20 Memory:e1a00000-e1a20000 
Ne:
eth1      Link encap:Ethernet  HWaddr 90:2b:34:93:75:e6  
          inet addr:192.168.1.120  Bcast:192.168.1.255  Mask:255.255.255.0
          inet6 addr: fe80::922b:34ff:fe93:75e6/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:13567520 errors:0 dropped:0 overruns:0 frame:0
          TX packets:10710054 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:13086635236 (12.1 GiB)  TX bytes:12381041605 (11.5 GiB)

245. Recompile debian's hpcc with other libs

I installed hpcc using apt-get, but -- and this is a first -- when trying to run it complained over missing libs.



Why compile?

hpcc
hpcc: error while loading shared libraries: libatlas.so.3gf: cannot open shared object file: No such file or directory
Doing
aptitude show hpcc 
Depends: libatlas3gf-base, libc6 (>= 2.7), libopenmpi1.3, mpi-default-bin
apt-cache search libatlas.so.3gf

libatlas3-base - Automatically Tuned Linear Algebra Software, generic shared
libatlas3gf-base - Transitional package to libatlas3-base
and doing
 aptitude search atlas|grep ^i


i   libatlas-dev                    - Automatically Tuned Linear Algebra Softwar
i A libatlas3gf-base                - Transitional package to libatlas3-base
but
locate libatlas.so.3gf
comes up empty.

So build your own:
sudo mkdir /opt/hpcc
sudo chown $USER /opt/hpcc
cd /opt/hpcc
wget http://ftp.de.debian.org/debian/pool/main/h/hpcc/hpcc_1.4.1.orig.tar.gz
tar xvf hpcc_1.4.1.orig.tar.gz
cd hpcc-1.4.1/
wget http://ftp.de.debian.org/debian/pool/main/h/hpcc/hpcc_1.4.1-2.debian.tar.gz
tar xvf hpcc_1.4.1-2.debian.tar.gz
patch -i debian/patches/add-Make.Debian.patch

Edit Make.Debian. For some reason LAdir is ignored, hence the -L option in LAlib
 78 # ----------------------------------------------------------------------
 79 # - MPI directories - library ------------------------------------------
 80 # ----------------------------------------------------------------------
 81 # MPinc tells the  C  compiler where to find the Message Passing library
 82 # header files,  MPlib  is defined  to be the name of  the library to be
 83 # used. The variable MPdir is only used for defining MPinc and MPlib.
 84 #
 85 MPdir        =/usr/lib/openmpi/lib/
 86 MPinc        =
 87 MPlib        =-lmpi
 88 #
 89 # ----------------------------------------------------------------------
 90 # - Linear Algebra library (BLAS or VSIPL) -----------------------------
 91 # ----------------------------------------------------------------------
 92 # LAinc tells the  C  compiler where to find the Linear Algebra  library
 93 # header files,  LAlib  is defined  to be the name of  the library to be
 94 # used. The variable LAdir is only used for defining LAinc and LAlib.
 95 #
 96 LAdir        = /opt/ATLAS/lib
 97 LAinc        =
 98 LAlib        = -L/opt/ATLAS/lib -ltatlas
 99 #

The above assumes that you've compiled your own openblas as shown elsewhere on this blog. You can use whatever math libs you want. Again, there are a couple described on this blog (acml, netlib blas/lapack, openblas, ATLAS). I've had success with the netlib blas/lapack and atlas (built with netlib lapack).

mv Make.Debian hpl/
make arch=Debian

Hopefully everything went well. Now you need an input file.
cp _hpccinf.txt hpccinf.txt

Edit hpccinf.txt:
HPLinpack benchmark input file
Innovative Computing Laboratory, University of Tennessee
HPL.out      output file name (if any)
8            device out (6=stdout,7=stderr,file)
1            # of problems sizes (N)
1000         Ns
1            # of NBs
80           NBs
0            PMAP process mapping (0=Row-,1=Column-major)
1            # of process grids (P x Q)
3            Ps
1            Qs
16.0         threshold
1            # of panel fact
2            PFACTs (0=left, 1=Crout, 2=Right)
1            # of recursive stopping criterium
4            NBMINs (>= 1)
1            # of panels in recursion
2            NDIVs
1            # of recursive panel fact.
1            RFACTs (0=left, 1=Crout, 2=Right)
1            # of broadcast
1            BCASTs (0=1rg,1=1rM,2=2rg,3=2rM,4=Lng,5=LnM)
1            # of lookahead depth
1            DEPTHs (>=0)
2            SWAP (0=bin-exch,1=long,2=mix)
64           swapping threshold
0            L1 in (0=transposed,1=no-transposed) form
0            U  in (0=transposed,1=no-transposed) form
1            Equilibration (0=no,1=yes)
8            memory alignment in double (> 0)
##### This line (no. 32) is ignored (it serves as a separator). ######
0                               Number of additional problem sizes for PTRANS
1200 10000 30000                values of N
0                               number of additional blocking sizes for PTRANS
40 9 8 13 13 20 16 32 64        values of NB

Launch by doing
mpirun -n X ./hpcc
where X=Ps times Qs (e.g. 3 in the example above).

I put the hpccinf.txt in a shared (nfs) folder (~/jobs), created a file called myhost

tantalum slots=2 max_slots=4
boron slots=2 max_slots=6
neon slots=2 max_slots=8
 and then launched using
mpirun -n 4 -hostfile myhost /opt/hpcc/hpcc-1.4.1/./hpcc

21 September 2012

244. Molden on debian testing

Update: avogadro can write gamess input files, but seems to offer little in the way of showing detailed output from gamess output files. Also, some of the input files contain keywords which don't exist.

Original post:
Nothing beats a good GUI, so after butting heads with gabedit again (and losing - again. Although in this case I think I tried to make it do something it wasn't designed to) I've decided to try Molden.

To download, go here, make sure to be a good citizen and register yourself as a user (will help motivate funding for development) then download: http://www.cmbi.ru.nl/molden/howtoget.html

cd ~/tmp
wget ftp://ftp.cmbi.ru.nl/pub/molgraph/molden/molden5.0.tar.gz
tar xvf molden5.0.tar.gz
cd molden5.0/

Edit makefile and remove -lXmu from line 20:

16 CC = cc
17 FC = gfortran
18 LIBS =  -lX11 -lm
19 LDR = ${FC} 
20 LIBSG = -L/usr/X11R6/lib -lGLU -lGL -lX11 -lm

cd surf/

edit Makefile and change it from

 46 depend: $(DEPEND)
 47     @ echo making dependencies...
 48     @ echo ' ' > makedep
 49     @ makedepend $(INCLUDE) -f makedep $(DEPEND)

to

 46 depend: $(DEPEND)
 47     @ echo making dependencies...
 48     @ echo ' ' > makedep
 49     @ $(CC) $(INCLUDE) -M $(DEPEND) > makedep

Save and go back up one level, and run make:
 cd ../
 make

You're pretty much done.

I like putting things in /opt, so
sudo mkdir /opt/molden
sudo chown $USER /opt/molden
cp ~/tmp/molden5.0/* -R /opt/molden

stick
export PATH=$PATH:/opt/molden
in your ~/.bashrc

Type
molden
to run

Molden can read output files from gamess -- still exploring the exact capabilities, but e.g the convergence information can be accessed:


and you can get nifty contour plots of the electron density of orbitals etc.


Error:

If you don't edit the surf/Makefile as shown above you'll get

make[1]: Leaving directory `/home/me/tmp/molden5.0/ambfor'
make -C surf depend
make[1]: Entering directory `/home/me/tmp/molden5.0/surf'
making dependencies...
make[1]: makedepend: Command not found
make[1]: *** [depend] Error 127
make[1]: Leaving directory `/home/me/tmp/molden5.0/surf'

20 September 2012

243. My own personal benchmarks for NWChem, gromacs with atlas, openblas, acml on AMD and intel

Update: you can compile against acml on intel as well, and against mkl on amd. Still need to do some performance testing to see how well it works. The artificial penalty of running mkl on AMD is well-publicised and led to a lawsuit, but I don't know how acml performs on mkl.


The title says it all, really. Since I'm back to exploring ways of improving performance for my little cluster I figured I'd break this out as a separate post. Most of this data was found here before: http://verahill.blogspot.com.au/2012/09/new-compute-node-using-amd-fx-8150.html

All units are running up-to-date debian testing (wheezy).

Configuration:
Boron (B): Phenom II X6 2.8 GHz, 8Gb RAM (2.8*6=16.8 GFLOPS predicted)
Neon (Ne): FX-8150 X8 3.6 GHz, 16 Gb RAM (3.6*8=28.8 GFLOPS predicted (int), 3.6*4=14.4 GFLOPS (fpu))
Tantalum (Ta): Quadcore i5-2400 3.1 GHz, 8 Gb RAM (3.1*4=12.4 GFLOPS predicted)
Vanadium (V):  Dual socket 2x Quadcore Xeon X3480 3.06 GHz, 8Gb. CentOS (ROCKS 5.4.3)/openblas.

Results

Gromacs --double (1 ns 6x6x6 nm tip4p water box; dynamic load balancing, double precision, 500k steps)
B  :  10.662 ns/day (11.8  GFLOPS, runtime 8104 seconds)***
B  :    9.921 ns/day ( 10.9 GFLOPS, runtime 8709 seconds)**
Ne:  10.606 ns/day (11.7  GFLOPS, runtime 8146 seconds) *
Ne:  12.375 ns/day (13.7  GFLOPS, runtime 6982 seconds)**
Ne:  12.385 ns/day (13.7  GFLOPS, runtime 6976 seconds)****
Ta:  10.825 ns/day (11.9  GFLOPS, runtime 7981 seconds)***
V :   10.560 ns/dat (11.7  GFLOPS, runtime 8182 seconds)***
*no external blas/lapack.
**using ACML libs
*** using openblas
**** using ATLAS

Gromacs --single (1 ns 6x6x6 nm tip4p water box; dynamic load balancing, single precision, 500 k steps)
B  :   17.251 ns/day (19.0 GFLOPS, runtime 5008 seconds)***
Ne:   21.874 ns/day (24.2 GFLOPS, runtime  3950 seconds)**
Ne:   21.804 ns/day (24.1 GFLOPS, runtime 3963  seconds)****
Ta:   17.345 ns/day (19.2 GFLOPS, runtime  4982 seconds)***
V :   17.297 ns/day (19.1 GFLOPS, runtime 4995 seconds)***
*no external blas/lapack.
**using ACML libs
*** using openblas
**** using ATLAS

NWChem (opt biphenyl cation, cp-md/pspw):
B  :   5951 seconds**
B  :   4084 seconds ***
B  :   5782 seconds ***xy
Ne:    3689 seconds**
Ta :   4102 seconds***
Ta :   4230 seconds***xy
V :    5396 seconds***

*no external blas/lapack.
**using ACML libs
*** using openblas
x Reconfigured using getmem.nwchem

NWChem (opt biphenyl cation, geovib, 6-31G**/ub3lyp):
B  :  2841 seconds **
B  :  2410 seconds***
B  :  2101 seconds ***x
B  :  2196 seconds ***xy
Ne: 1665 seconds **
Ta : 1785 seconds***
Ta : 1789 seconds***xy
V  : 2600 seconds***

*no external blas/lapack.
**using ACML libs
*** using openblas
x Reconfigured using getmem.nwchem
y NWChem 6.1.1

A Certain Commercial Ab Initio Package (Freq calc of pre-optimised H14C19O3 at 6-31+G*/rb3lyp):
B  :    2h 00 min (CPU time 10h 37 min)
Ne:   1h 37 min (CPU time: 11h 13 min)
Ta:   1h 26 min (CPU time: 5h 27 min)
V  :   2h 15 min (CPU time 15h 50 min)
Using precompiled binaries.


Gamess:
(I'm still working on learning how to run gamess efficiently, so take these values with a huge saucer of salt for now). bn.inp does a geometry optimisation of a biphenyl cation (mult 2) at ub3lyp/6-31G**. bn.inp has no $STATPT card while bn3.inp does and it makes a huge difference -- but is this because it does 20 steps (nsteps=20), then kills the run? The default is 50 steps and it does seem like all the runs do the maximum number of steps, then exit.

 Again, still learning. See below for input files. Will fix this post as I learn what the heck I'm doing. The relative run times on each node are still comparable though, but just don't use the numbers to compare the run speed of e.g. nwchem vs gamess.

Gamess using bn.inp with atlas
B:    9079 seconds
Ne: 7252 seconds
Ta:  9283 seconds

Gamess using bn.inp with openblas
B:   9071 seconds
Ta: 9297 seconds

Gamess using bn.inp with acml
Ne: 7062 seconds

Gamess using bn3.inp with atlas. 
B: 4016 seconds
Ne: 3162 seconds
Ta: 4114 seconds

MPQC:
Here I've used the version in the debian repos. I've created a hostfile
neon slots=8 max_slots=8
tantalum slots=4 max_slots=4
boron slots=6 max_slots=6

and then just looked at changing the order and slots assignment as well as total number of cores assigned using mpirun.

Simple test case looking at number of cores/distribution:
n cores:  Seconds: Configuration(cores,exec nodes)
4    :   11   : 4(Ta)
4    :   17   : 4(Ne)
4    :   17   : 4(B)
4    :   42   : 2(Ta)+2(B)
6    :   12   : 6(B)
6    :   13   : 6(Ne)
6    :   74   : 2(Ta)+2(B)+2(Ne)
8    :   12   : 8(Ne)
10  :   43   : 4(Ta)+6(B)
12  :   47   : 4(Ta)+8(Ne)
14  :   55   : 6(B)+8(Ne)
18  :   170 :  4(Ta)+6(B)+8(Ne)

My beowulf cluster doesn't seem to be much of a super computer. All in all, this looks like a pretty good argument in favour of upgrading to infiniband...


bn.inp:
 $CONTRL 
COORD=CART UNITS=ANGS scftyp=uhf dfttyp=b3lyp runtyp=optimize 
ICHARG=1 MULT=2 maxit=100
$END
 $system mwords=2000 $end
 $BASIS gbasis=n31 ngauss=6 ndfunc=1 npfunc=1 $END
 $guess guess=huckel $end

 $DATA
biphenyl
C1
C      6.0      0.0000000000   -3.5630100000    0.0000000000 
C      6.0     -1.1392700000   -2.8592800000   -0.3938400000 
C      6.0     -1.1387900000   -1.4654500000   -0.3941500000 
C      6.0      0.0000000000   -0.7428100000    0.0000000000 
C      6.0      1.1387900000   -1.4654500000    0.3941500000 
C      6.0      1.1392700000   -2.8592800000    0.3938400000 
C      6.0      0.0000000000    0.7428100000    0.0000000000 
C      6.0      1.1387900000    1.4654500000   -0.3941500000 
C      6.0      1.1392700000    2.8592800000   -0.3938400000 
C      6.0     -1.1387900000    1.4654500000    0.3941500000 
C      6.0      0.0000000000    3.5630100000    0.0000000000 
C      6.0     -1.1392700000    2.8592800000    0.3938400000 
H      1.0      0.0000000000   -4.6489600000    0.0000000000 
H      1.0     -2.0282700000   -3.3966200000   -0.7116100000 
H      1.0     -2.0214800000   -0.9282700000   -0.7279300000 
H      1.0      2.0282700000   -3.3966200000    0.7116100000 
H      1.0      2.0282700000    3.3966200000   -0.7116100000 
H      1.0     -2.0214800000    0.9282700000    0.7279300000 
H      1.0      0.0000000000    4.6489600000    0.0000000000 
H      1.0     -2.0282700000    3.3966200000    0.7116100000 
H      1.0      2.0214800000    0.9282700000   -0.7279300000 
H      1.0      2.0214800000   -0.9282700000    0.7279300000 
 $END


bn3.inp:
$CONTRL 
COORD=CART UNITS=ANGS scftyp=uhf dfttyp=b3lyp runtyp=optimize 
ICHARG=1 MULT=2 maxit=100
$END
 $system mwords=2000 $end
 $BASIS gbasis=n31 ngauss=6 ndfunc=1 npfunc=1 $END
 $STATPT OPTTOL=0.0001 NSTEP=20 HSSEND=.TRUE. $END
 $guess guess=huckel $end

 $DATA
biphenyl
C1
C      6.0      0.0000000000   -3.5630100000    0.0000000000 
C      6.0     -1.1392700000   -2.8592800000   -0.3938400000 
C      6.0     -1.1387900000   -1.4654500000   -0.3941500000 
C      6.0      0.0000000000   -0.7428100000    0.0000000000 
C      6.0      1.1387900000   -1.4654500000    0.3941500000 
C      6.0      1.1392700000   -2.8592800000    0.3938400000 
C      6.0      0.0000000000    0.7428100000    0.0000000000 
C      6.0      1.1387900000    1.4654500000   -0.3941500000 
C      6.0      1.1392700000    2.8592800000   -0.3938400000 
C      6.0     -1.1387900000    1.4654500000    0.3941500000 
C      6.0      0.0000000000    3.5630100000    0.0000000000 
C      6.0     -1.1392700000    2.8592800000    0.3938400000 
H      1.0      0.0000000000   -4.6489600000    0.0000000000 
H      1.0     -2.0282700000   -3.3966200000   -0.7116100000 
H      1.0     -2.0214800000   -0.9282700000   -0.7279300000 
H      1.0      2.0282700000   -3.3966200000    0.7116100000 
H      1.0      2.0282700000    3.3966200000   -0.7116100000 
H      1.0     -2.0214800000    0.9282700000    0.7279300000 
H      1.0      0.0000000000    4.6489600000    0.0000000000 
H      1.0     -2.0282700000    3.3966200000    0.7116100000 
H      1.0      2.0214800000    0.9282700000   -0.7279300000 
H      1.0      2.0214800000   -0.9282700000    0.7279300000 
 $END

19 September 2012

242. Briefly: Compiling NWChem 6.1.1 with Python on Debian Testing (Wheezy)

Back at the end of June a minor version of NWChem (bug fixes) was released.

There isn't much difference between compiling 6.1.1 and 6.1. Mainly, the difference is in what line to edit for python compatibility (NWChem 6.1 here:http://verahill.blogspot.com.au/2012/05/building-nwchem-61-on-debian.html )

1. Install dev packages
sudo apt-get install libopenmpi-dev openmpi-bin python2.7-dev zlib1g-dev libssl-dev

2. Compile openblas  or download e.g. acml.
(compiles fine, but doesn't run, with atlas)

3. NWchem goodness:

sudo mkdir /opt/nwchem
sudo chown $USER /opt/nwchem
cd /opt/nwchem
wget http://www.nwchem-sw.org/images/Nwchem-6.1.1-src.2012-06-27.tar.gz
(or go here http://www.nwchem-sw.org/index.php/Download)

tar xvf Nwchem-6.1.1-src.2012-06-27.tar.gz
cd nwchem-6.1.1-src/src/config/

Edit makefile.h and hange (line numbers are just for convenience -- don't add them)
1956 #   EXTRA_LIBS += -ltk -ltcl -L/usr/X11R6/lib -lX11 -ldl
1957      EXTRA_LIBS +=    -lnwcutil  -lpthread -lutil -ldl
1958   LDOPTIONS = -Wl,--export-dynamic

to
1956 #   EXTRA_LIBS += -ltk -ltcl -L/usr/X11R6/lib -lX11 -ldl
1957      EXTRA_LIBS +=    -lnwcutil  -lpthread -lutil -ldl -lssl -lz
1958   LDOPTIONS = -Wl,--export-dynamic

cd to /opt/nwchem/nwchem-6.1.1-src/ Create a file called buildconf.sh with the following content:
export LARGE_FILES=TRUE
export TCGRSH=/usr/bin/ssh
export NWCHEM_TOP=`pwd`
export NWCHEM_TARGET=LINUX64
export NWCHEM_MODULES="all python"
export PYTHONVERSION=2.7
export PYTHONHOME=/usr
export BLASOPT="-L/opt/openblas/lib -lopenblas"
#export BLASOPT="-L/opt/acml/acml5.2.0/gfortran64_int64/lib -lacml"
#export BLASOPT="-L/opt/ATLAS/lib -lsatlas -ltatlas"
export USE_MPI=y
export USE_MPIF=y
export USE_MPIF4=y
export MPI_LOC=/usr/lib/openmpi/lib
export MPI_INCLUDE=/usr/lib/openmpi/include
export LIBRARY_PATH=$LIBRARY_PATH:/usr/lib/openmpi/lib:/opt/openblas/lib
#export LIBRARY_PATH=$LIBRARY_PATH:/usr/lib/openmpi/lib:/opt/acml/acml5.2.0/gfortran64_int64/lib
#export LIBRARY_PATH=$LIBRARY_PATH:/usr/lib/openmpi/lib:/opt/ATLAS/lib
export LIBMPI="-lmpi -lopen-rte -lopen-pal -ldl -lmpi_f77 -lpthread"
cd $NWCHEM_TOP/src
make clean
make nwchem_config
make FC=gfortran 1> make.log 2>make.err
export FC=gfortran
cd ../contrib
./getmem.nwchem

Start the compilation:
time sh buildconf.sh 

 On a quadcore i5-2400 it took 18 minutes.

241. pKa, part 3: ccCA in NWChem. Doing something wrong?

First of all, I'm having problems reproducing the output from 'task ccca' by following the methods described in J. Chem. Theory Comput 2008, 4, 328-334 (scaling 0.9854) or Mol. Phys. 2009,107(8-12),1107-1121. The discrepancies are the energies reported for the MP2/cc-pVTZ-DK and CCSD(T)/cc-PVTZ which leads to a difference in calculated relativistic and correlation corrections. More about that at some other time.

Here's using ccCA in NWChem on acetic acid/acetate and formic acid/formate.
More about how it works in another post.

Basically, the way I am using it the results are very, very poor with  ccCA. All I can think is that I must be doing something wrong.


INPUT files 

Acetic acid input:

Title "aceticacid"
Start  aceticacid

echo

charge 0

geometry autosym units angstrom
 C     -0.312051     -1.36877     0.00000
 H     -0.929226     -1.55822     -0.878253
 H     -0.929226     -1.55822     0.878253
 H     0.548700     -2.02934     0.00000
 C     0.150590     0.0606620     0.00000
 O     -0.897092     0.922315     0.00000
 H     -0.521850     1.81528     0.00000
 O     1.29371     0.435169     0.00000
end

basis
* library "cc-pvtz"
end

dft
  mult 1
  direct
  noio
  XC b3lyp
  grid fine
  mulliken
end

driver
  default
end

ccca
  optimize
end

task ccca

Acetate input:

Title "acetate_ccca"
Start  acetate_ccca
echo
charge -1

geometry autosym units angstrom
 C     -0.0311237     -1.36218     0.00000
 H     0.501926     -1.73727     0.878691
 H     0.501926     -1.73727     -0.878691
 H     -1.05131     -1.75101     0.00000
 C     -0.00500996     0.204086     0.00000
 O     1.14247     0.706045     0.00000
 O     -1.12049     0.771493     0.00000
end

basis
* library "cc-pvtz"
end

dft
  mult 1
  direct
  noio
  XC b3lyp
  grid fine
  mulliken
end

driver
  default
end

ccca
   optimize
end

task ccca


Formic acid input:
Title "formicacid"
Start  formicacid

echo

charge 0

geometry autosym units angstrom
 C     0.410955     -0.132154     0.00000
 H     1.50430     -0.0475164     0.00000
 O     -0.134104     1.09718     0.00000
 H     -1.09846     0.988665     0.00000
 O     -0.203188     -1.15938     0.00000
end

basic
* library "cc-pvtz"
end

dft
  mult 1
  direct
  noio
  XC b3lyp
  grid fine
  mulliken
end

driver
end

ccca
   optimize
end

task ccca


Formate input:
Title "formate"
Start  formate

echo

charge -1

geometry autosym units angstrom
 C     0.00000     0.00000     0.329396
 H     0.00000     0.00000     1.47310
 O     -1.13532     0.00000     -0.189103
 O     1.13532     0.00000     -0.189103
end

basis "ao basis" spherical print
* library "cc-pvtz"
END

dft
  mult 1
  direct
  noio
  XC b3lyp
  grid fine
  mulliken
end

driver
end

ccca
   optimize
end

task ccca


OUTPUT 

Acetic acid
 Temperature                      =   298.15K
 frequency scaling parameter      =   0.9889

 Zero-Point correction to Energy  =   38.155 kcal/mol  (  0.060805 au)
 Thermal correction to Energy     =   41.060 kcal/mol  (  0.065434 au)
 Thermal correction to Enthalpy   =   41.653 kcal/mol  (  0.066378 au)

 Total Entropy                    =   69.467 cal/mol-K
   - Translational                =   38.179 cal/mol-K (mol. weight =  60.0211)    - Rotational                   =   23.830 cal/mol-K (symmetry #  =        1)
   - Vibrational                  =    7.458 cal/mol-K

 Cv (constant volume heat capacity) =   14.439 cal/mol-K
   - Translational                  =    2.979 cal/mol-K
   - Rotational                     =    2.979 cal/mol-K
   - Vibrational                    =    8.480 cal/mol-K


 ccCA: calculations done, now printing results
 
 ccCA-P  reference energy =   -228.82035086993045     
 ccCA-S3 reference energy =   -228.82800135561300     
 ccCA-S4 reference energy =   -228.82030423449530     
 ccCA-PS3 reference energy=   -228.82417611277174     
 DK correction            =  -0.13322049012506909     
 CCSD(T) correction       =   -4.8762979862686961E-002
 CV correction            =  -0.20936881324035994     
 ---------------------------
 Total ccCA-P   energy    =   -229.21170315315857     
 Total ccCA-S3  energy    =   -229.21935363884111     
 Total ccCA-S4  energy    =   -229.21165651772341     
 Total ccCA-PS3 energy    =   -229.21552839599985     
 
 Thermochemistry available:
            ZPE   =   6.0851792771826778E-002
 ccCA-P   E+ZPE   =  -229.15085136038675     
 ccCA-S3  E+ZPE   =  -229.15850184606930     
 ccCA-S4  E+ZPE   =  -229.15080472495160     
 ccCA-PS3 E+ZPE   =  -229.15467660322804     
 Wrote ccCA-P    energy to the RTDB 
 Leaving ccCA module...

 Task  times  cpu:     5565.6s     wall:     5650.7s

Acetate

 Temperature                      =   298.15K
 frequency scaling parameter      =   0.9889

 Zero-Point correction to Energy  =   29.591 kcal/mol  (  0.047157 au)
 Thermal correction to Energy     =   31.853 kcal/mol  (  0.050762 au)
 Thermal correction to Enthalpy   =   32.446 kcal/mol  (  0.051706 au)

 Total Entropy                    =   64.067 cal/mol-K
   - Translational                =   38.129 cal/mol-K (mol. weight =  59.0133)
   - Rotational                   =   23.739 cal/mol-K (symmetry #  =        1)
   - Vibrational                  =    2.199 cal/mol-K

 Cv (constant volume heat capacity) =   11.235 cal/mol-K
   - Translational                  =    2.979 cal/mol-K
   - Rotational                     =    2.979 cal/mol-K
   - Vibrational                    =    5.276 cal/mol-K

 ccCA: calculations done, now printing results
 
 ccCA-P  reference energy =   -228.25857936176124     
 ccCA-S3 reference energy =   -228.26625083689740     
 ccCA-S4 reference energy =   -228.25849407721080     
 ccCA-PS3 reference energy=   -228.26241509932930     
 DK correction            =  -0.13318127658752132     
 CCSD(T) correction       =   -4.4728554700242285E-002
 CV correction            =  -0.20921905251765338     
 ---------------------------
 Total ccCA-P   energy    =   -228.64570824556665     
 Total ccCA-S3  energy    =   -228.65337972070282     
 Total ccCA-S4  energy    =   -228.64562296101622     
 Total ccCA-PS3 energy    =   -228.64954398313472     
 
 Thermochemistry available:
            ZPE   =   4.7193435242008613E-002
 ccCA-P   E+ZPE   =  -228.59851481032464     
 ccCA-S3  E+ZPE   =  -228.60618628546081     
 ccCA-S4  E+ZPE   =  -228.59842952577421     
 ccCA-PS3 E+ZPE   =  -228.60235054789271     
 Wrote ccCA-P    energy to the RTDB 
 Leaving ccCA module...

 Task  times  cpu:     3859.1s     wall:     3910.2s


Formic acid

 Temperature                      =   298.15K
 frequency scaling parameter      =   0.9889

 Zero-Point correction to Energy  =   20.909 kcal/mol  (  0.033320 au)
 Thermal correction to Energy     =   22.902 kcal/mol  (  0.036497 au)
 Thermal correction to Enthalpy   =   23.495 kcal/mol  (  0.037441 au)

 Total Entropy                    =   59.329 cal/mol-K
   - Translational                =   37.387 cal/mol-K (mol. weight =  46.0055)    - Rotational                   =   21.008 cal/mol-K (symmetry #  =        1)
   - Vibrational                  =    0.934 cal/mol-K

 Cv (constant volume heat capacity) =    8.703 cal/mol-K
   - Translational                  =    2.979 cal/mol-K
   - Rotational                     =    2.979 cal/mol-K
   - Vibrational                    =    2.744 cal/mol-K

 ccCA: calculations done, now printing results

 ccCA-P  reference energy =   -189.56748775122853
 ccCA-S3 reference energy =   -189.57364633780318
 ccCA-S4 reference energy =   -189.56733835209894
 ccCA-PS3 reference energy=   -189.57056704451585
 DK correction            =  -0.11856238070660652
 CCSD(T) correction       =   -3.0831132609506540E-002
 CV correction            =  -0.16057296161548607
 ---------------------------
 Total ccCA-P   energy    =   -189.87745422616013
 Total ccCA-S3  energy    =   -189.88361281273478
 Total ccCA-S4  energy    =   -189.87730482703054
 Total ccCA-PS3 energy    =   -189.88053351944745

 Thermochemistry available:
            ZPE   =   3.3346398704552728E-002
 ccCA-P   E+ZPE   =  -189.84410782745556
 ccCA-S3  E+ZPE   =  -189.85026641403022
 ccCA-S4  E+ZPE   =  -189.84395842832598
 ccCA-PS3 E+ZPE   =  -189.84718712074289
 Wrote ccCA-P    energy to the RTDB
 Leaving ccCA module...

 Task  times  cpu:     1369.3s     wall:     1407.5s

Formate
 Temperature                      =   298.15K
 frequency scaling parameter      =   0.9889

 Zero-Point correction to Energy  =   12.385 kcal/mol  (  0.019737 au)
 Thermal correction to Energy     =   14.252 kcal/mol  (  0.022713 au)
 Thermal correction to Enthalpy   =   14.845 kcal/mol  (  0.023656 au)

 Total Entropy                    =   56.927 cal/mol-K
   - Translational                =   37.321 cal/mol-K (mol. weight =  44.9976)
   - Rotational                   =   19.229 cal/mol-K (symmetry #  =        2)
   - Vibrational                  =    0.377 cal/mol-K

 Cv (constant volume heat capacity) =    7.310 cal/mol-K
   - Translational                  =    2.979 cal/mol-K
   - Rotational                     =    2.979 cal/mol-K
   - Vibrational                    =    1.351 cal/mol-K


 ccCA: calculations done, now printing results
 
 ccCA-P  reference energy =   -189.01189831122844     
 ccCA-S3 reference energy =   -189.01808726799254     
 ccCA-S4 reference energy =   -189.01171261173857     
 ccCA-PS3 reference energy=   -189.01499278961049     
 DK correction            =  -0.11851294005154500     
 CCSD(T) correction       =   -2.6430727035545942E-002
 CV correction            =  -0.16057463040127118     
 ---------------------------
 Total ccCA-P   energy    =   -189.31741660871680     
 Total ccCA-S3  energy    =   -189.32360556548090     
 Total ccCA-S4  energy    =   -189.31723090922694     
 Total ccCA-PS3 energy    =   -189.32051108709885     
 
 Thermochemistry available:
            ZPE   =   1.9751889903778755E-002
 ccCA-P   E+ZPE   =  -189.29766471881302     
 ccCA-S3  E+ZPE   =  -189.30385367557713     
 ccCA-S4  E+ZPE   =  -189.29747901932316     
 ccCA-PS3 E+ZPE   =  -189.30075919719508     
 Wrote ccCA-P    energy to the RTDB 
 Leaving ccCA module...

Solvation energy

Solvation energy may seem easy to calculate, but difficult to calculate accurately using implicit methods, in particular for ions. I used the optimized structures from above, and then did a single-point COSMO (rsolv 0. Not ideal)  at RDFT(b3lyp)/cc-pVTZ/

Acetic acid: -8.59 kcal/mol
Acetate: -72.33 kcal/mol
Formic acid: -8.90
Formate: -72.59 kcal/mol
H+: -624.61 kcal/mol (lit. value)


Free energies:  
G: ccCA-P+(Hcorr-T*Scorr)
 G(acetic acid): -229.21170315315857*627.503+(41.653-298.15*69.467/1000)-8.59
 G(acetate): -228.64570824556665*627.503+(32.446-298.15*64.067/1000)-72.33
 G(formic acid): -189.87745422616013*627.503+(23.495-298.15*59.329/1000)-8.90
 G(formate):  -189.31741660871680*627.503+(14.845-298.15*56.927/1000)-72.59
 G(H+): -6.28 kcal/mol (lit. value) -264.61 (lit. value)=-270.89 kcal/mol


Results: Direct approach:
Don't forget to account for the standard state. R=1.9858775(34)×10−3 kcal/(K.mol)
DG(acetic/acetate)=G(acetate)+G(H+)-G(acetic)+RT*ln(1/24.46)=
(-228.64570824556665*627.503+(32.446-298.15*64.067/1000)-72.33-270.89)-(-229.21170315315857*627.503+(41.653-298.15*69.467/1000)-8.59)+1.9858775*298.15*log(1/24.46)/1000=
11.0435795929699 kcal/mol=46.2063370169861 kJ/mol =>
pKa=DG*log10(e)/RT=
11.0435795929699*log10(e)/(1.9858775*10**(-3)*298.15)
=8.1 (which is very bad -- it should be close to 4.75)

DG(formic/formate)=G(formate)+G(H+)-G(formic)-RT*ln(1/24.46)=
(-189.31741660871680*627.503+(14.845-298.15*56.927/1000)-72.59-270.89)-(-189.87745422616013*627.503+(23.495-298.15*59.329/1000)-8.90)+1.9858775*298.15*log(1/24.46)/1000=
7.01850845285312 kcal/mol=29.3654393667375 kJ/mol
pKa=5.1 (which is quite bad -- it should be close to 3.75)


Results: Isodesmic approach
From an older post:
"Assuming that we know that formic acid has a pKa of 3.75, then DG_solution=pKa*RT/log(e)=3.75*8.314*298.15/log10(e)/1000=21.404 kJ/mol. The reverse reaction is -21.404 kJ/mol."
That's about 5.116 kcal/mol

DG(acetate)+DG(formic)-(DG(acetic)+DG(formate))+DG(ref)=
((-228.64570824556665*627.503+(32.446-298.15*64.067/1000)-72.33)+(-189.87745422616013*627.503+(23.495-298.15*59.329/1000)-8.90))-((-229.21170315315857*627.503+(41.653-298.15*69.467/1000)-8.59)+(-189.31741660871680*627.503+(14.845-298.15*56.927/1000)-72.59))+5.116=
4.02507114014588 kcal/mol+5.116 kcal/mol =9.14107114014588 kcal/mol <=> pKa= 6.7
Which is better, but still not as good as here.

Using the E+zpe energies doesn't help much:
((-228.59851481032464*627.503+(32.446-298.15*64.067/1000)-72.33)+(-189.84410782745556*627.503+(23.495-298.15*59.329/1000)-8.90))-((-229.15085136038675*627.503+(41.653-298.15*69.467/1000)-8.59)+(-189.29766471881302*627.503+(14.845-298.15*56.927/1000)-72.59))+5.116=
9.10100587110175 kcal/mol <=> pKa=6.68

I really have no idea why the results are so bad when I had reasonable results with DFT/b3lyp/6-31++G** which should be worse than the E(CBS)+E(CC)+E(CV)+E(DK) approach for calculating electronic energies.

Solvation energies are a bit different and could explain some of the difference. Using the solvation energies from here I got:
((-228.64570824556665*627.503+(32.446-298.15*64.067/1000)-73.23)+(-189.87745422616013*627.503+(23.495-298.15*59.329/1000)-9.99))-((-229.21170315315857*627.503+(41.653-298.15*69.467/1000)-9.32)+(-189.31741660871680*627.503+(14.845-298.15*56.927/1000)-72.47))+5.116=
7.76107114008302 kcal/mol <=> pKa=5.69. Not 4.75, but closer.

Using rsolv 1.3 I get
((-228.64570824556665*627.503+(32.446-298.15*64.067/1000)-71.09)+(-189.87745422616013*627.503+(23.495-298.15*59.329/1000)-6.37))-((-229.21170315315857*627.503+(41.653-298.15*69.467/1000)-6.53)+(-189.31741660871680*627.503+(14.845-298.15*56.927/1000)-71.90))+5.116=
10.1610711401063 kcal/mol which is bad.


More thinking.
  This paper says that the gas phase free energy for the deprotonation of acetic acid should be 341.1 kcal/mol
(-228.64570824556665*627.503+(32.446-298.15*64.067/1000)-6.82)-(-229.21170315315857*627.503+(41.653-298.15*69.467/1000))=340.75 kca/mol
We're within 1 kcal/mol 

The same paper states 338.5 kcal/mol for formic acid:
(-189.31741660871680*627.503+(14.845-298.15*56.927/1000)-6.82)-(-189.87745422616013*627.503+(23.495-298.15*59.329/1000))=336.67 kca/mol

For our direct solution phase pKa calculation formic acid was off by about 1.9 kcal/mol which is similar to the error here.

18 September 2012

240. Harmonic frequency scaling in NWChem

...is difficult to find information about, but apparently it IS possible:
http://www.emsl.pnl.gov/docs/nwchem/nwchem-support/2012/08/0021._NWCHEM_FW:_BOUNCE_nwchem-users_emsl.pnl

Example input:

Title "acetate"

Start  acetate

echo

charge -1

geometry autosym units angstrom
 C     0.0191498     0.0215213     0.0191498
 H     -0.621688     -0.620669     0.658428
 H     0.658428     -0.620669     -0.621688
 H     -0.628475     0.649316     -0.628475
 C     0.881900     0.883055     0.881900
 O     1.74905     0.315740     1.74905
 O     0.799516     2.22959     0.799516
end

basis
* library "cc-pvtz"
end

dft
  mult 1
  direct
  noio
  XC b3lyp
  grid fine
  mulliken
end

driver
end

set vib:scalefreq  0.9854

task dft optimize
task dft freq

which gives:

 Temperature                      =   298.15K
 frequency scaling parameter      =   0.9854

 Zero-Point correction to Energy  =   29.486 kcal/mol  (  0.046990 au)
 Thermal correction to Energy     =   31.752 kcal/mol  (  0.050601 au)
 Thermal correction to Enthalpy   =   32.345 kcal/mol  (  0.051544 au)

 Total Entropy                    =   64.086 cal/mol-K
   - Translational                =   38.129 cal/mol-K (mol. weight =  59.0133)
   - Rotational                   =   23.739 cal/mol-K (symmetry #  =        1)
   - Vibrational                  =    2.218 cal/mol-K

 Cv (constant volume heat capacity) =   11.268 cal/mol-K
   - Translational                  =    2.979 cal/mol-K
   - Rotational                     =    2.979 cal/mol-K
   - Vibrational                    =    5.309 cal/mol-K

c.f.

 Temperature                      =   298.15K
 frequency scaling parameter      =   1.0000

 Zero-Point correction to Energy  =   29.923 kcal/mol  (  0.047686 au)
 Thermal correction to Energy     =   32.173 kcal/mol  (  0.051272 au)
 Thermal correction to Enthalpy   =   32.766 kcal/mol  (  0.052215 au)

 Total Entropy                    =   64.008 cal/mol-K
   - Translational                =   38.129 cal/mol-K (mol. weight =  59.0133)
   - Rotational                   =   23.739 cal/mol-K (symmetry #  =        1)
   - Vibrational                  =    2.141 cal/mol-K

 Cv (constant volume heat capacity) =   11.132 cal/mol-K
   - Translational                  =    2.979 cal/mol-K
   - Rotational                     =    2.979 cal/mol-K
   - Vibrational                    =    5.173 cal/mol-K

239. Sun GridEngine: resetting queue status on node

I occasionally run into problems with space during a run on my cluster, which causes the job to fail and the node to be marked as unavailable:

qstat -f
queuename                      qtype resv/used/tot. load_avg arch          states
---------------------------------------------------------------------------------
eight.q@neon                   BIP   0/0/8          0.45     lx26-amd64    
---------------------------------------------------------------------------------
five.q@boron                   BIP   0/0/5          6.01     lx26-amd64    
---------------------------------------------------------------------------------
six.q@boron                    BIP   0/6/6          6.01     lx26-amd64    
    788 0.75000 submit__la user         r     09/07/2012 18:36:56     6        
---------------------------------------------------------------------------------
two.q@beryllium                BIP   0/0/2          0.24     lx26-amd64    
---------------------------------------------------------------------------------
four.q@tantalum                BIP   0/0/4          0.05     lx26-amd64    E
---------------------------------------------------------------------------------
three.q@beryllium              BIP   0/0/3          0.24     lx26-amd64    
---------------------------------------------------------------------------------
main.q@beryllium               BIP   0/0/1          0.24     lx26-amd64    
---------------------------------------------------------------------------------
main.q@boron                   BIP   0/0/1          6.01     lx26-amd64    
---------------------------------------------------------------------------------
main.q@tantalum                BIP   0/0/1          0.05     lx26-amd64    

############################################################################
 - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS
############################################################################
    789 0.67310 zoli.qsub  user         qw    09/09/2012 10:00:35     6        
    802 0.60527 submit__bi user         qw    09/10/2012 20:45:24     6        
    803 0.60525 submit__bi user         qw    09/10/2012 20:46:00     6        
    927 0.25071 submit__ac user         qw    09/18/2012 08:24:00     4        
    928 0.25000 submit__ac user         qw    09/18/2012 08:45:52     4  

Before you do anything else, free up space and consider moving your scratch dir to a different/separate disk.

Since I keep forgetting how to reset it, here it is -- as a SGE admin do:
 /usr/bin/qmod -c four.q@tantalum
me@beryllium changed state of "four.q@tantalum" (no error)
And now everything is good:

qstat -f
queuename                      qtype resv/used/tot. load_avg arch          states
---------------------------------------------------------------------------------
eight.q@neon                   BIP   0/0/8          0.25     lx26-amd64    
---------------------------------------------------------------------------------
five.q@boron                   BIP   0/0/5          5.91     lx26-amd64    
---------------------------------------------------------------------------------
six.q@boron                    BIP   0/6/6          5.91     lx26-amd64    
    788 0.75000 submit__la user         r     09/07/2012 18:36:56     6        
---------------------------------------------------------------------------------
two.q@beryllium                BIP   0/0/2          0.44     lx26-amd64    
---------------------------------------------------------------------------------
four.q@tantalum                BIP   0/4/4          0.17     lx26-amd64    
    927 0.25071 submit__ac user         r     09/18/2012 11:01:26     4        
---------------------------------------------------------------------------------
three.q@beryllium              BIP   0/0/3          0.44     lx26-amd64    
---------------------------------------------------------------------------------
main.q@beryllium               BIP   0/0/1          0.44     lx26-amd64    
---------------------------------------------------------------------------------
main.q@boron                   BIP   0/0/1          5.91     lx26-amd64    
---------------------------------------------------------------------------------
main.q@tantalum                BIP   0/0/1          0.17     lx26-amd64    

############################################################################
 - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS
############################################################################
    789 0.67310 zoli.qsub  user         qw    09/09/2012 10:00:35     6        
    802 0.60527 submit__bi user         qw    09/10/2012 20:45:24     6        
    803 0.60525 submit__bi user         qw    09/10/2012 20:46:00     6        
    928 0.25000 submit__ac user         qw    09/18/2012 08:45:52     4   

17 September 2012

238. Calculating pKa, part 2: CBS extrapolation basics

I'm typing this up as I'm learning. Be prepared that there may be outright errors, fuzzy thinking and misunderstood concepts in this post. Use it as an inspiration for learning about Complete Basis Set extrapolation -- not as an authoritative guide. What follows is just what I think I've understood. It's a short post since I'll just put it here in order that I can refer to it later.

The general idea
The basic idea behind Complete Basis Set extrapolation seems to be that as you make your basis sets larger and larger by adding Zeta functions to your valence orbitals (making your valence orbitals more 'flexible') the energies you get start to approach those you would get with an infinitely large or complete (and thus 'true') basis set. Exactly how quickly this approach (exponential?) occurs is a matter for debate, so there's a number of ways of extrapolating it.

In practical terms, what seems to be done is:
1. The structure of a molecule is first optimised using a specific level of theory (e.g. MP2/cc-aug-pvdz) in the gas phase. This structure is used for all subsequent calculations.
2. A single point calculation at MP2/aug-cc-pVDZ is done. The electronic energy is recorded.
3. A single point calculation at MP2/aug-cc-pVTZ is done. The electronic energy is recorded.
4. A single point calculation at MP2/aug-cc-pVQZ is done. The electronic energy is recorded.
5. A single point calculation at MP2/aug-cc-pV5Z is done. The electronic energy is recorded.
etc.

where D=double(2), T=triple (3), Q=Quadruple (4)

Using MP2 and the Dunning-Knoll series of correlation consistent basis sets, for formate I got
2   -188.772348824917 (MP2/aug-cc-pVDZ)
3   -188.930859478806 (MP2/aug-cc-pVTZ)
4   -188.984671946038 (MP2/aug-cc-pVQZ)


Which looks like this:
Not everyone will appreciate the default choice of green and red by gnuplot...
The fitted line is MP2energy=1*exp(-0.599028936687644*Zetafunctions)-189.082170648532

Plotting an exponential with three unknown based only on three points is obviously poor form, but the point is fairly clear.

The extrapolated energy is -189.082170648532 hartree. For practical use you will need to obtain enthalpy/entropy corrections and solvation energies using a set method. Also, there are normally corrections for unpaired electrons etc.

The Peterson Scheme:
 (Peterson, K. A.; Woon, D. E.; Dunning, T. H., Jr. J. Chem. Phys. 1994100, 7410, link)

Instead of a straight A*exp(-B*x)+C fit you do E(x)=A+B*exp(-(x-1))+C*exp(-(x-1)**2), where A is the CBS energy.

So, using gnuplot we can put our energies in 'cbs.dat':
2 -188.772348824917
3 -188.930859478806
4 -188.984671946038

and fit it in gnuplot using:

set xrange [0:5]
f(x)=A+B*exp(-(x-1))+C*exp(-(x-1)**2)
fit f(x) 'cbs.dat' u 1:2 via A,B,C
plot 'cbs.dat' u 1:2, f(x) lc 3



which gives us a CBS energy of -189.016 Hartree (c.f. -189.082 using a simple exponential).

The basic underlying principle behind CBS is thus pretty clear even to someone who's not skilled in the art of computational chemistry.

The difficulties seem to be:
1. What basis set family to use (cc-pVXZ ?)
2. Whether to use diffuse/polarisation/extra orbital functions (aug, p, d+)?
3.  What level to do solvation calc on (DFT, MP2; COSMO, CPCM)?
4. What level to do structural optimisation at?
5. What level to do frequency calculation at?

Remember that a great many basis sets haven't been parametrised for elements beyond Ar.

Also, as far as solvation calculations are concerned an issue seems to be that the correct way to calculate solvation energy seems to be using gas phase structures -- and NOT structures optimised using a solvation model. This can cause issues if the gas phase and solution phase conformations are considerably different.

16 September 2012

237. Briefly: Packet corrupt during ssh sessions

Whenever I'm using two subnets for my cluster I seems to be having problems with:
Corrupted MAC on input.
Disconnecting: Packet corrupt
It particularly happens when there's a lot of information being passed to the screen. It's a right killer when you're compiling on a remote system. However, while I've been able to get around that by running a GNU Screen session on the remote box it was time to solve it.\


I googled and found:
https://bugs.launchpad.net/ubuntu/+source/linux-source-2.6.17/+bug/60764


The subnet I'm having issues with is operation across a switch. One of the computers on the network is defined as the gateway and I tend to have problems when connecting from it to the other computers on the network.
The gateway server has two interfaces, eth0 which is connected to a router which is connected to the outside world, and eth1 which is connected to the local subnet I'm having problems with.

The fix has been as as simple as
sudo apt-get install ethtool
sudo ethtool -K eth1 rx off tx off

I only had to run this on the gateway box and so far I've had no issues. Depending on how you're managing your network interfaces (i.e. wicd, network-manager, /etc/networking/*) you may want to add it to the post-up section of your /etc/network/interfaces:
post-up ethtool -K eth1 rx off tx off

Links to this post:
http://winscp.net/forum/viewtopic.php?t=12469

14 September 2012

236. Calculating pKa, part 1:Example (attempt) of an isodesmic reactions in NWChem

Back to learning about computational approaches to chemistry. The usual warnings apply: why would you trust anything that I say about anything? I'm writing anonymously, and I may misunderstand things at times. So make sure that you compare what I write with that of other sources and make up your own mind.

Anyway, I found a fairly detailed presentation in which they were using Gaussian 98 here: https://www.uow.edu.au/~adamt/Trevitt_Research/Links_files/pKa%20workshop%20slides.pdf


While that should be ok to reproduce, it's not that straightforward to do even with Gaussian, since G09 and G03 don't report solvation parameters in the same way (or detail) as G98.

I'm also a lot keener on NWChem than Gaussian for various reasons, not least that it's 'free' (both libre and gratis) while Gaussian inc. has been accused of doing somewhat unfriendly things in the name of protecting their business interests.

See here and here for an example, and then here for a rebuttal from Gaussian inc. I know that EMSL/Pacific Northwest National Lab that develop NWChem and ECCE are prohibited from using Gaussian since they are considered as being competitors.


Back to science.

Our test example will be acetate, and we'll use formic acid to correct our results.
The fact that this post is very long is due to the amount of detail supplied -- I prefer to show some of the more obvious things so that people can learn from what I post -- and I learn by writing the post.

But first let's just do everything using direct methods.

We work with a thermochemical cycle:

IF we can't calculate the DG_solution directly (i.e. too expensive) we can optimise our structures in the gas phase, and then calculate the solvation energy for those structures.

Then DG_sol=DG_gas+DG_solvation(B)-DGsolvation(A).
(more generally sum of DG_solv(prod) - sum of DG_solv(reactants)).


1. pKa of Acetic acid using direct methods

We can either do
H3CCOOH -> H3COO- + H+
or
H3CCOOH + H2O -> H3COO- + H3O+

Steps:
Optimise acetic acid and acetate in the gas phase and do frequency calculations to get the enthalpy and entropy. Then use the gas phase structures and do single point calculations using COSMO to get the electrostatic solvation energies. Finally, use standard state corrections.

A. Optimise acetic acid in the gas phase and do frequency calculation

Title "aceticacid_gas"
Start  aceticacid_gas
echo
charge 0

geometry autosym units angstrom
 C     0.0402340     0.0308110     0.0402340
 H     -0.600803     -0.611482     0.679201
 H     0.679201     -0.611482     -0.600803
 H     -0.607055     0.659335     -0.607055
 C     0.903928     0.890481     0.903928
 O     0.814831     2.23989     0.814831
 O     1.78275     0.299052     1.78275
 H     2.25438     1.03276     2.25438
end

basis "ao basis" spherical print
  H library "6-31++G**"
  O library "6-31++G**"
  C library "6-31++G**"
END

dft
  mult 1
  direct
  noio
  XC b3lyp
  grid fine
  mulliken
end

driver
  default
end

task dft optimize
task dft freq

which gives

         Total DFT energy =     -229.102415550663
      One electron energy =     -550.833155571059
           Coulomb energy =      230.555870626149
    Exchange-Corr. energy =      -29.497734561879
 Nuclear repulsion energy =      120.672603956126

 Numeric. integr. density =       31.999999816803

     Total iterative time =      3.6s
and
 Temperature                      =   298.15K
 frequency scaling parameter      =   1.0000

 Zero-Point correction to Energy  =   38.683 kcal/mol  (  0.061646 au)
 Thermal correction to Energy     =   41.568 kcal/mol  (  0.066243 au)
 Thermal correction to Enthalpy   =   42.160 kcal/mol  (  0.067186 au)

 Total Entropy                    =   69.198 cal/mol-K
   - Translational                =   38.179 cal/mol-K (mol. weight =  60.0211)
   - Rotational                   =   23.855 cal/mol-K (symmetry #  =        1)
   - Vibrational                  =    7.164 cal/mol-K

 Cv (constant volume heat capacity) =   14.327 cal/mol-K
   - Translational                  =    2.979 cal/mol-K
   - Rotational                     =    2.979 cal/mol-K
   - Vibrational                    =    8.368 cal/mol-K

So that G=-229.102415550663*(627.503 kcal/Hartree)+(42.160 kcal/mol-298.15*(69.198 cal/molK)/1000)=-1.4372e+05 kcal/mol

B. Optimise acetate in the gas phase and do frequency calculation

Title "acetate_gas"

Start  acetate_gas

echo

charge -1

geometry autosym units angstrom
 C     0.0405721     0.0285481     0.0405721
 H     -0.601438     -0.613690     0.678620
 H     0.678620     -0.613690     -0.601438
 H     -0.605857     0.658809     -0.605857
 C     0.904975     0.886806     0.904975
 O     0.825186     2.23364     0.825186
 O     1.77103     0.316179     1.77103
end

basis "ao basis" spherical print
  H library "6-31++G**"
  O library "6-31++G**"
  C library "6-31++G**"
END

dft
  mult 1
  direct
  noio
  XC b3lyp
  grid fine
  mulliken 
end

driver
  default
end

task dft optimize
task dft freq                     

which gives

         Total DFT energy =     -228.540046314754
      One electron energy =     -539.474204198295
           Coulomb energy =      229.063209931484
    Exchange-Corr. energy =      -29.339040486703
 Nuclear repulsion energy =      111.209988438759

 Numeric. integr. density =       32.000000595562

     Total iterative time =      2.9s

and

 Temperature                      =   298.15K
 frequency scaling parameter      =   1.0000

 Zero-Point correction to Energy  =   30.023 kcal/mol  (  0.047845 au)
 Thermal correction to Energy     =   32.271 kcal/mol  (  0.051427 au)
 Thermal correction to Enthalpy   =   32.863 kcal/mol  (  0.052371 au)

 Total Entropy                    =   64.022 cal/mol-K
   - Translational                =   38.129 cal/mol-K (mol. weight =  59.0133)
   - Rotational                   =   23.766 cal/mol-K (symmetry #  =        1)
   - Vibrational                  =    2.127 cal/mol-K

 Cv (constant volume heat capacity) =   11.112 cal/mol-K
   - Translational                  =    2.979 cal/mol-K
   - Rotational                     =    2.979 cal/mol-K
   - Vibrational                    =    5.153 cal/mol-K
so that G=-228.540046314754*(627.503 kcal/Hartree)+(32.271 kcal/mol-298.15*(64.022 cal/molK)/1000)=-1.4338e+05 kcal/mol

Putting A and B together: (-228.540046314754*(627.503)+(32.271-298.15*(64.022/1000)))-(-229.102415550663*(627.503)+(42.160-298.15*(69.198/1000)))=344.54 kcal/mol

We haven't accounted for solvation or the proton yet.

C. Solvation of acetic acid

Title "aceticacid_solvation"
Start  aceticacid_solvation

echo

charge 0

geometry autosym units angstrom
 C     -0.313400     -1.37257     0.00000
 H     -0.932151     -1.56367     -0.882188
 H     -0.932151     -1.56367     0.882188
 H     0.551887     -2.03461     0.00000
 C     0.149260     0.0607660     0.00000
 O     1.30165     0.439500     0.00000
 O     -0.897630     0.927778     0.00000
 H     -0.523912     1.82535     0.00000
end

basis "ao basis" spherical print
  H library "6-31++G**"
  O library "6-31++G**"
  C library "6-31++G**"
END

dft
  mult 1
  direct
  noio
  XC b3lyp
  grid fine
  mulliken
end

cosmo
end

task dft energy

which gives
                  COSMO solvation results
                  -----------------------

                 gas phase energy =      -229.1024156124
                 sol phase energy =      -229.1172649438
 (electrostatic) solvation energy =         0.0148493315 (    9.32 kcal/mol)

D. Solvation of acetate

Title "acetate_solvation"
Start  acetate_solvation

echo

charge -1

geometry autosym units angstrom
 C     -0.0308736     -1.36399     0.00000
 H     0.503418     -1.74042     0.882388
 H     0.503418     -1.74042     -0.882388
 H     -1.05531     -1.75261     0.00000
 C     -0.00485953     0.199667     0.00000
 O     -1.12642     0.778065     0.00000
 O     1.14855     0.713544     0.00000
end

basis "ao basis" spherical print
  H library "6-31++G**"
  O library "6-31++G**"
  C library "6-31++G**"
END

dft
  mult 1
  direct
  noio
  XC b3lyp
  grid fine
  mulliken
end

cosmo
end

task dft energy

which gives
                  COSMO solvation results
                  -----------------------
  
                 gas phase energy =      -228.5400463128
                 sol phase energy =      -228.6567490452
 (electrostatic) solvation energy =         0.1167027324 (   73.23 kcal/mol)


Putting A, B and solvation energies together: 
[(-228.540046314754*(627.503)+(32.271-298.15*(64.022/1000)))-(-229.102415550663*(627.503)+(42.160-298.15*(69.198/1000)))]-[73.23-9.32]=344.54-63.910=280.63 kcal/mol

E. The Proton
If you try to do any calculations on an isolated proton you get and SCFE of zero, and you won't do much better in terms of thermochemical data. Yet, monoatomic gases obviously still posses entropy and enthalpy. Instead, the document I cite above uses the ideal gas partition function for ideal monoatomic gases which gives a value of 6.28 kcal/mol for the free energy of a proton.  The last reference states that the free energy for a proton in the gas phase is experimentally determined to be -6.28 kcal/mol and that the free energy of hydration is -264.61 kcal/mol (experimental).

[(-228.540046314754*(627.503)+(32.271-298.15*(64.022/1000)))-6.28-(-229.102415550663*(627.503)+(42.160-298.15*(69.198/1000)))]-[73.23+264.61-9.32]=338.26-328.52=9.74 kcal/mol =40.752 kJ/mol
We're not done yet though -- make sure to continue to 'F. Standard States'

F. Standard States

We know that

DG=DG^0+RT ln (Q).

We have a bit of a problem. We're doing calculations in the gas phase (pressure) but looking at predicting solution values (concentration). Also, I don't fully get it yet, so my explanation is probably a bit fuzzy.

 So if A+B-> C+D we get Q=(C*D)/(A*B) but for A -> C+D we get Q=(C*D)/(A) or (1 bar x1 bar/1 bar)= 1 bar = 101350 Pa => nRT/P=1*8.314*298.15/101350=0.024458 m3= 24.46 L. The concentration of each species is 1 mol/bar which in volume terms means 1/24.46 L.

And the A -> C + D situation is what we have if we look at
HA -> H+ + A-

So,
Q=((1/24.46)*(1/24.46)/(1/24.46))=1/2.4.46 which gives
DG=DG^0-RTln(24.46)=DG^0-7924.9 J/mol

Thus, we need to correct for the standard state: 40.752 -7924.9/1000=32.827 kJ/mol

G. Calculating pKa

Since (in solution so that concentrations are 1 M)
DG=DG^0+RTln(K), where K=([H3COO][H+])/([H3COOH])
DG=DG^0+(RT ln([HCOO]/[H3COOH])+RTln (10^(-pH)), where RT ln([HCOO]/[H3COOH])=RTln(1/1)=0 so
DG=DG^0+RTln (10^(-pH))=DG^0+RT log (10^(-pH)/log(e)=DG^0-(RT/log(e)) * pH
Which for equilbrium, where pH=pKa and DG=0, turns into
pKa=DG^0*log(e)/RT


pKa= DG*log(e)/(RT)=(32.827*1000)*log(e)/(8.314*298.15)=5.75


Not that great as predictions go (exp: pKa=4.75). Looking at some of the literature one error lies in the size of the solvation energies. Possibly one should tune the parameters used in the COSMO.

2. Isodemic reaction/correction

Using formic acid
This approach is based on 1) the similarity between two compounds and 2) us knowing the DG_solution parameter for one of them.

Assuming that we know that formic acid has a pKa of 3.75, then DG_solution=pKa*RT/log(e)=3.75*8.314*298.15/log10(e)/1000=21.404 kJ/mol. The reverse reaction is -21.404 kJ/mol.

We skip a few steps.
Here are the calculated parameters for formic acid (using the same method as above):

Formic acid


SCFE: -189.772804709496 Hartree
Enthalpy correction: 23.773 kcal/mol
Entropy correction: 59.339 cal/mol
Solvation energy: 9.99 kcal/mol




Formate
SCFE:  -189.217943605798 Hartree
Enthalpy correction: 15.016 kcal/mol
Entropy correction: 56.992 cal/mol
Solvation energy: 72.47 kcal/mol

[Just for kicks we quickly look at what the prediction is:
accounting for everything (solvation, proton etc.)
((-189.217943605798*627.503+(15.016-298.15*56.992/1000)-72.47) +(-6.28-264.61))-(-189.772804709496*627.503+(23.773-298.15*59.339/1000)-9.99)=6.7498 kcal/mol
6.7498*4.184-7924.9/1000=20.316 kJ/mol <=> pKa=1000*20.316*log10(e)/(8.314*298.15)=3.56]

The isodesmic approach:

Here we look at
H3CCOOH + -OOCH -> H3CCOO- +HOOCH

This combined reaction has a
DG_solution=DG_solution(acetic acid/acetate)+DG_solution(formate/formic acid)
<=>
DG_solution(acetic acid/acetate)= DG_solution-DG_solution(formic acid/formate)
=DG_solution-(-21.404 kJ/mol)
=DG_gas+[DG_sol(acetate)+DG_sol(formic acid)-DG_sol(acetic acid)-DG_sol(formate)]+21.404 kJ/mol
=
(-228.540046314754*(627.503)+(32.271-298.15*(64.022/1000)))+(-189.772804709496*627.503+(23.773-298.15*(59.339/1000)))-(-229.102415550663*(627.503)+(42.160-298.15*(69.198/1000)))-(-189.217943605798*627.503+(15.016-298.15*(56.992/1000)))+(-73.23-9.99+9.32+72.47)+5.116 kcal/mol= 8.11 kcal/mol= 33.93 kJ/mol

Here we don't need to fiddle with standard states or experimental values for solvation of the proton.

pKa= DG*log(e)/(RT)=(33.93*1000)*log(e)/(8.314*298.15)=5.94
which is even worse than before...
We want ca 27 kJ/mol = 6.48 kcal/mol. Paradoxically this may be due to the ab initio approach to the pKa of formic acid actually giving a very reasonable value.

Using Propanoic acid
So let's try the isodesmic approach using propanoic acid as our reference instead.


Propanoic acid


SCFE:  -268.419515785389 Hartree
Enthalpy correction:  60.894 kcal/mol
Entropy correction:  75.184 cal/mol
Solvation energy:  8.15 kcal/mol




Propanoate
SCFE:   -267.857478200414 Hartree
Enthalpy correction: 52.096 kcal/mol
Entropy correction:  75.392 cal/mol
Solvation energy:  72.14 kcal/mol

pKa=4.86 <=> 27.739 kJ/mol= 6.63 kcal/mol

(-228.540046314754*(627.503)+(32.271-298.15*(64.022/1000)))+(-268.419515785389*627.503+(60.894-298.15*(75.184/1000)))-(-229.102415550663*(627.503)+(42.160-298.15*(69.198/1000)))-(-267.857478200414*627.503+(52.096-298.15*(75.392/1000)))+(-73.23-8.15+9.32+72.14)+6.63 kcal/mol=7.4324 kcal/mol=31.097

pKa= DG*log(e)/(RT)=(31.097*1000)*log(e)/(8.314*298.15)=5.45

It's a bit better, but still a bit off.

3. Conclusion:
The isodesmic approach is not magic and it relies on the similarity of two compounds, one for which there are experimental data, causing similar computational issues. Under the right conditions it's a useful approach, whereas under other conditions -- where a body of experimental data exists -- it might just be easier to determine the correlation between experimental and calculated data via fitting.

The approach worked better for the acetate/propanoate pair than the formate/acetate pair -- and one would consider acetic acid and propanoic acid to be more similar than formic acid and the higher acids. We're still far off from obtaining a perfect result though.

An additional problem is obviously the sensitivity of pKa to the DG -- one pH unit is about 1.36 kcal/mol, which is very small given the usual errors in DFT level calculations. I've seen indications online (google!) that the accuracy of b3lyp is about 3 kcal/mol, and one can always debate the accuracy of a highly empirical method like COSMO.






13 September 2012

235. CPMD with Netlib's lapack, blas and your own fftw3 on ROCKS 5.4.3/CentOS 5.6

Update 8 Feb 2013:
I somehow had forgot to include some of the instructions for the BLAS part. Fixed now.

Post:
This is done pretty much like how it's done on Debian (-march=native didn't work in the BLAS compilation though, nor was -fno-whole-file accepted when compiling cpmd)

1. Compile cmake according to this post:
http://verahill.blogspot.com.au/2012/05/compiling-openbabel-231-and-cmake-on.html

2. Compile BLAS
sudo mkdir /share/apps/tools/netlib/blas/lib -p
sudo chown $USER /share/apps/tools/netlib -R
mkdir ~/tmp
cd ~/tmp
wget http://www.netlib.org/blas/blas.tgz
tar xvf blas.tgz
cd BLAS/

Edit make.inc
OPTS     = -O3 -shared -m64 -fPIC

make all

gfortran -shared -Wl,-soname,libnetblas.so -o libblas.so.1.0.1 *.o -lc
ln -s libblas.so.1.0.1 libnetblas.so
cp lib*blas* /share/apps/tools/netlib/blas/lib

3. Compile LAPACK
sudo mkdir /share/apps/tools/netlib/lapack -p
sudo chown $USER /share/apps/tools/netlib -R
wget http://www.netlib.org/lapack/lapack-3.4.1.tgz

tar xvf lapack-3.4.1.tgz
cd lapack-3.4.1/
mkdir build
cd build
ccmake ../

Hit 'c' and edit the values:

 BUILD_COMPLEX                    ON
 BUILD_COMPLEX16                  ON
 BUILD_DOUBLE                     ON
 BUILD_SHARED_LIBS                ON
 BUILD_SINGLE                     ON
 BUILD_STATIC_LIBS                ON
 BUILD_TESTING                    ON
 CMAKE_BUILD_TYPE                    
 CMAKE_INSTALL_PREFIX             /share/apps/tools/netlib/lapack
 LAPACKE                          OFF
 LAPACKE_WITH_TMG                 OFF
 USE_OPTIMIZED_BLAS               ON
 USE_XBLAS                        OFF

Hit 'c' again, then hit 'g'.

Edit CMakeCache.txt and add the following lines at the beginning:

########################
# EXTERNAL cache entries
########################
BLAS_FOUND:STRING=TRUE
BLAS_GENERIC_FOUND:BOOL=TRUE
BLAS_GENERIC_blas_LIBRARY:FILEPATH=/share/apps/tools/netlib/blas/lib/libnetblas.so
BLAS_LIBRARIES:PATH=/share/apps/tools/netlib/blas/lib/libnetblas.so

Do
ccmake ../
again, hit 'c', then 'g'.

Now,
make
make install

4. Compile FFTW3
sudo mkdir /share/apps/tools/fftw3
sudo chown $USER /share/apps/tools/fftw3
cd ~/tmp
wget http://www.fftw.org/fftw-3.3.1.tar.gz
tar -xvf fftw-3.3.1.tar.gz
cd fftw-3.3.1
make distclean
./configure --enable-float --enable-mpi --enable-threads --with-pic --prefix=/share/apps/tools/fftw3/single
make
make install
make distclean
./configure --disable-float --enable-mpi --enable-threads --with-pic --prefix=/share/apps/tools/fftw3/double
make 
make install

5. Compile CPMD
I downloaded the cpmd file to a client computer, then uploaded it to the ROCKS front node:
sftp me@rocks:/home/me/tmp

Connected to rocks.
Changing to: /home/me/tmp
sftp> put cpmd-v3_15_3.tar.gz
Uploading cpmd-v3_15_3.tar.gz to /home/me/tmp/cpmd-v3_15_3.tar.gz
cpmd-v3_15_3.tar.gz                100% 2937KB 587.4KB/s   00:05
sftp> exit

I then logged in via ssh as normal.
cd ~/tmp
tar xvf cpmd-v3_15_3.tar.gz
cd CPMD/CONFIGURE
Create a new file LINUX-x86_64-ROCKS

     IRAT=2
     CFLAGS='-c -O2 -Wall'
     CPP='/lib/cpp -P -C -traditional'
     CPPFLAGS='-D__Linux -D__PGI -D__GNU -DFFT_FFTW3 -DPARALLEL -DPOINTER8'
     FFLAGS='-c -O2 -fcray-pointer -fsecond-underscore'
LFLAGS='-L/share/apps/tools/fftw3/double/lib -lfftw3-lfftw3_mpi -lfftw3_threads -I/usr/include -L/share/apps/tools/netlib/blas/lib -lnetblas -L/share/apps/tools/netlib/lapack/lib -llapack -L/opt/openmpi/lib -lpthread -lmpi'
     FFLAGS_GROMOS='  $(FFLAGS)' 
      FC='mpif77 -fbounds-check'
      CC='mpicc'
      LD='mpif77 -fbounds-check'

NOTE: I don't think the -I belongs in the LFLAGS statement, but I'm presuming that I put it there for a reason back when I did it the first time.

Go to ~/tmp/CPMD, and edit wfnio.F (basically replace 3 with 2 and remove 'L'):

 15       CHARACTER(len=*) TAG
 63         IF(TAG(1:2).EQ.'NI') THEN
201       IF(TAG(1:2).NE.'NI') THEN
271         IF(TAG(1:2).EQ.'NI') THEN

Finally, edit Makefile and change
  23 LD = f95 -O
to
  23 LD = mpif77 -fbounds-check

Time to compile


./mkconfig.sh LINUX-x86_64-ROCKS > Makefile
make
sudo mkdir /share/apps/cpmd
sudo chown $USER /share/apps/cpmd
cp cpmd.x /share/apps/cpmd

echo 'export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/share/apps/tools/netlib/blas/lib:/share/apps/tools/netlib/lapack/lib:/share/apps/tools/fftw3/double/lib' >>~/.bashrc
echo 'export PATH=$PATH:/share/apps/cpmd' >> ~/.bashrc
echo "export PP_LIBRARY_PATH=/share/apps/cpmd/PP_LIBRARY" >>~/.bashrc

You're now done compiling. To test, you need to get some pseudopotential files -- look at e.g. the end of http://verahill.blogspot.com.au/2012/07/not-solved-compiling-cpmd-on-debian.html for instructions.


234. CPMD with netlib lapack, blas and your own fftw on debian testing

This is a minor update to my previous post on CPMD. Back in the days I had issue linking to my Openblas libs (got a binary which would not run properly) but I've since had success with the netlib lapack and blas libs.

1. Compile the netlib lapack and blas libraries according to this post: http://verahill.blogspot.com.au/2012/09/compiling-netlibs-lapack-and-blas-on.html

2. Compile the fftw libraries according to this post (ignore the sections on Openblas and Gromacs):
http://verahill.blogspot.com.au/2012/05/gromacs-with-external-fftw3-and-blas-on.html

3. Compile CPMD. We'll be following this post in large parts.
Register with cpmd.org. Once you're approved download the cpmd source to ~/tmp.

sudo apt-get install libopenmpi-dev openmpi-bin

cd ~/tmp
tar -xvf cpmd-v3_15_3.tar.gz
cd CPMD/CONFIGURE
Create the file LINUX-x86_64-DEBIAN:
   
     IRAT=2
     CFLAGS='-c -O2 -Wall'
     CPP='/lib/cpp -P -C -traditional'
     CPPFLAGS='-D__Linux -D__PGI -D__GNU -DFFT_FFTW3 -DPARALLEL -DPOINTER8'
     FFLAGS='-c -O2 -fcray-pointer -fno-whole-file -fsecond-underscore'
     LFLAGS='-l:/opt/fftw/fftw-3.3.2/double/lib/libfftw3.a -l:/opt/fftw/fftw-3.3.2/double/lib/libfftw3_mpi.a -l:/opt/fftw/fftw-3.3.2/double/lib/libfftw3_threads.a -I/usr/include -l:/opt/netlib/blas/lib/libnetblas.so -l:/opt/netlib/lapack/lib/liblapack.so -lpthread -lmpi'
     FFLAGS_GROMOS='  $(FFLAGS)' 
      FC='mpif77 -fbounds-check'
      CC='mpicc'
      LD='mpif77 -fbounds-check'

Next edit ~/tmp/CPMD/wfnio.F and change the following lines:
 15       CHARACTER(len=*) TAG
 63         IF(TAG(1:2).EQ.'NI') THEN
201       IF(TAG(1:2).NE.'NI') THEN
271         IF(TAG(1:2).EQ.'NI') THEN

Now, in ~/tmp/CPMD, run
./mkconfig.sh LINUX-x86_64-DEBIAN > Makefile
make
sudo mkdir /opt/cpmd
sudo chown $USER /opt/cpmd
cp cpmd.x /opt/cpmd


And follow everything below 'Done! Almost.' in this post: http://verahill.blogspot.com.au/2012/07/not-solved-compiling-cpmd-on-debian.html


12 September 2012

233. Compiling netlib's lapack and blas on Debian Testing (Wheezy)

In addition to specific BLAS/LAPACK libs such as ACML, MKL, and ATLAS netlib provides (what I understand to be) reference versions of BLAS and LAPACK. Presumably these are slower than optimised versions of blas/lapack, but it doesn't hurt being familiar with them.

Here's how to compile those versions.



BLAS
sudo mkdir /opt/netlib
sudo chown $USER /opt/netlib
mkdir /opt/netlib/blas/lib -p
wget http://www.netlib.org/blas/blas.tgz
tar xvf blas.tgz
cd BLAS/

Edit make.inc
OPTS = -O3 -shared -m64 -march=native -fPIC


make all
gfortran -shared -Wl,-soname,libnetblas.so -o libblas.so.1.0.1 *.o -lc
ln -s libblas.so.1.0.1 libnetblas.so
cp lib*blas* /opt/netlib/blas/lib


To see whether everything linked ok:
 ldd libnetblas.so 
        linux-vdso.so.1 =>  (0x00007ffff1bc6000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00002b42ec030000)
        libgfortran.so.3 => /usr/lib/x86_64-linux-gnu/libgfortran.so.3 (0x00002b42ec3b8000)
        libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00002b42ec6ce000)
        libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00002b42ec950000)
        libquadmath.so.0 => /usr/lib/x86_64-linux-gnu/libquadmath.so.0 (0x00002b42ecb67000)
        /lib64/ld-linux-x86-64.so.2 (0x00002b42ebaf3000)




LAPACK
(inspired by this and this)
mkdir -p /opt/netlib/lapack
sudo apt-get install cmake-curses-gui
cd ~/tmp
wget http://www.netlib.org/lapack/lapack-3.4.1.tgz
tar xvf lapack-3.4.1.tgz
cd lapack-3.4.1/
mkdir build
cd build
ccmake ../

Hit 'c' to generate a configuration. Navigate with arrow keys and hit enter to change values. Change to the values in red:
 
 BUILD_COMPLEX                   *ON
 BUILD_COMPLEX16                 *ON
 BUILD_DOUBLE                    *ON
 BUILD_SHARED_LIBS               *ON
 BUILD_SINGLE                    *ON
 BUILD_STATIC_LIBS               *ON
 BUILD_TESTING                   *ON
 CMAKE_BUILD_TYPE                *     
 CMAKE_INSTALL_PREFIX            */opt/netlib/lapack
 LAPACKE                         *OFF
 LAPACKE_WITH_TMG                *OFF
 USE_OPTIMIZED_BLAS              *ON
 USE_XBLAS                       *OFF

Then hit 'c' which might give you (change the values in red) -- I got some errors about ACML/eula here, but don't worry about that.

NOTE: this will only work if you already have blas installed in a standard location. If you don't get the BLAS_FOUND etc. then you should hit 'c' again and then 'g'. Next edit your CMakeCache.txt and paste the variables (without line numbers) you find below this section, then do ccmake ../ again and make sure everything looks ok, and generate using 'g'.

 BLAS_FOUND                       TRUE
 BLAS_GENERIC_FOUND               ON
 BLAS_GENERIC_blas_LIBRARY        /opt/netlib/blas/lib/libnetblas.so
 BLAS_LIBRARIES                   /opt/netlib/blas/lib/libnetblas.so
 BLAS_LINKER_FLAGS
 BUILD_COMPLEX                   *ON
 BUILD_COMPLEX16                 *ON
 BUILD_DOUBLE                    *ON
 BUILD_SHARED_LIBS               *OFF
 BUILD_SINGLE                    *ON
 BUILD_STATIC_LIBS               *ON
 BUILD_TESTING                   *ON
 CMAKE_BUILD_TYPE                *     
 CMAKE_INSTALL_PREFIX            */usr/local 
 LAPACKE                         *OFF
 LAPACKE_WITH_TMG                *OFF
 USE_OPTIMIZED_BLAS              *ON
 USE_XBLAS                       *OFF
The hit 'c' again. If there were no issues, hit 'g' which writes the configuration and exits.


make
[100%] Building Fortran object TESTING/EIG/CMakeFiles/xeigtstz.dir/__/__/INSTALL/dsecnd_INT_ETIME.f.o
Linking Fortran executable ../../bin/xeigtstz
[100%] Built target xeigtstz

make install
Install the project...
-- Install configuration: ""
-- Installing: /opt/netlib/lapack/lib/pkgconfig/lapack.pc
-- Installing: /opt/netlib/lapack/lib/cmake/lapack-3.4.1/lapack-config.cmake
-- Installing: /opt/netlib/lapack/lib/cmake/lapack-3.4.1/lapack-config-version.cmake
-- Installing: /opt/netlib/lapack/lib/cmake/lapack-3.4.1/lapack-targets.cmake
-- Installing: /opt/netlib/lapack/lib/cmake/lapack-3.4.1/lapack-targets-noconfig.cmake
-- Installing: /opt/netlib/lapack/lib/liblapack.so
-- Removed runtime path from "/opt/netlib/lapack/lib/liblapack.so"
-- Installing: /opt/netlib/lapack/lib/libtmglib.so
-- Removed runtime path from "/opt/netlib/lapack/lib/libtmglib.so"


tree /opt/netlib/ -d
/opt/netlib/
|-- blas
|   `-- lib
`-- lapack
    `-- lib
        |-- cmake
        |   `-- lapack-3.4.1
        `-- pkgconfig

7 directories

CMakeCache.txt variables:
 16 
 17 BLAS_FOUND:STRING=TRUE
 18 
 19 //Whether not the GENERIC library was found and is usable
 20 BLAS_GENERIC_FOUND:BOOL=TRUE
 21 
 22 //Path to a library.
 23 BLAS_GENERIC_blas_LIBRARY:FILEPATH=/opt/netlib/blas/lib/libnetblas.so
 24 
 25 BLAS_LIBRARIES:PATH=/opt/netlib/blas/lib/libnetblas.so
 26 


Testing the libraries:
I built gromacs against the new libs to make sure they 'worked'

sudo mkdir /opt/gromacs
sudo chown ${USER} /opt/gromacs
cd ~/tmp
wget ftp://ftp.gromacs.org/pub/gromacs/gromacs-4.5.5.tar.gz
tar xvf gromacs-4.5.5.tar.gz
cd gromacs-4.5.5/
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/netlib/blas/lib:/opt/netlib/lapack/lib
export LDFLAGS="-l:/opt/netlib/blas/lib/libnetblas.so -l:/opt/netlib/lapack/lib/liblapack.so"
./configure --disable-mpi --enable-float --with-external-blas --with-external-lapack --program-suffix=_netlib --prefix=/opt/gromacs/gromacs-4.5.5
make
make install

Check that it linked ok:
ldd /opt/gromacs/gromacs-4.5.5/bin/grompp_netlib
        linux-vdso.so.1 =>  (0x00007fffb83f2000)
        libgmxpreprocess.so.6 => /opt/gromacs/gromacs-4.5.5/lib/libgmxpreprocess.so.6 (0x00002b6411cfa000)
        libmd.so.6 => /opt/gromacs/gromacs-4.5.5/lib/libmd.so.6 (0x00002b6411fcd000)
        libfftw3f.so.3 => /usr/lib/x86_64-linux-gnu/libfftw3f.so.3 (0x00002b64123ad000)
        libxml2.so.2 => /usr/lib/x86_64-linux-gnu/libxml2.so.2 (0x00002b64127b0000)
        libgmx.so.6 => /opt/gromacs/gromacs-4.5.5/lib/libgmx.so.6 (0x00002b6412b10000)
        libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00002b6412fe5000)
        libnetblas.so => /opt/netlib/blas/lib/libnetblas.so (0x00002b64131e9000)
        liblapack.so => /opt/netlib/lapack/lib/liblapack.so (0x00002b64134cc000)
        libnsl.so.1 => /lib/x86_64-linux-gnu/libnsl.so.1 (0x00002b6413ece000)
        libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00002b64140e6000)
        libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00002b6414369000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00002b6414585000)
        libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00002b641490c000)
        liblzma.so.5 => /lib/x86_64-linux-gnu/liblzma.so.5 (0x00002b6414b24000)
        /lib64/ld-linux-x86-64.so.2 (0x00002b6411ad8000)
        libgfortran.so.3 => /usr/lib/x86_64-linux-gnu/libgfortran.so.3 (0x00002b6414d47000)
        libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00002b641505d000)
        libquadmath.so.0 => /usr/lib/x86_64-linux-gnu/libquadmath.so.0 (0x00002b6415274000)

Here are some input files (it's not a 'real' md run -- I just needed something small and quick to run):
step1.top:
#include "/opt/gromacs/gromacs-4.5.5/share/gromacs/top/ffoplsaa.itp"
#include "/opt/gromacs/gromacs-4.5.5/share/gromacs/top/oplsaa.ff/tip4p.itp"

[system]
test 

[molecules]

step1.mdp:

integrator = md
define      = -DFLEXIBLE
emtol      = 1000.0
emstep     = 0.001
nsteps     = 5000
nstlist    = 1
ns_type    = grid 
rlist      = 0.9
coulombtype= PME  
rcoulomb   = 0.9  
rvdw       = 1.0  
pbc        =  xyz

genbox_netlib -o step1.gro -cs /opt/gromacs/gromacs-4.5.5/share/gromacs/top/tip4p.gro -box 4x4x4 -p step1.top

grompp_netlib -f step1.mdp -po step2.mdp -p step1.top -pp step2.top -c step1.gro -o step2.tpr

mdrun_netlib -v -s step2.tpr -o step3.trr -x step3.xtc -cpo step3.cpt -c step3.gro -e step3.edr -g step3.log

On my old AMD II X3 I got about 7.7 GFLOPS with Openblas and 7.8 GFLOPS with the above libs. Note that the run is shorter than a minute so it's pretty useless for benchmarking. However, there's no obvious MAJOR penalty.


If you don't have cmake:
cp INSTALL/make.inc.gfortran make.inc

Edit make.inc
 15 FORTRAN  = gfortran
 16 OPTS     = -O2 -fPIC -m64
 17 DRVOPTS  = $(OPTS)
 18 NOOPT    = -O0 -fPIC -m64
 19 LOADER   = gfortran
 20 LOADOPTS =
Edit Makefile
 11 #lib: lapacklib tmglib
 12 lib: blaslib variants lapacklib tmglib
Run make
make
-->  Tests passed: 13176


   -->   LAPACK TESTING SUMMARY  <--
  Processing LAPACK Testing output found in the TESTING direcory
SUMMARY              nb test run  numerical error    other error  
================    =========== ================= ================  
REAL              1077227  0 (0.000%) 0 (0.000%) 
DOUBLE PRECISION 1078039  0 (0.000%) 0 (0.000%) 
COMPLEX           522814  0 (0.000%) 0 (0.000%) 
COMPLEX16          552410  0 (0.000%) 0 (0.000%) 

--> ALL PRECISIONS 3230490  0 (0.000%) 0 (0.000%) 




Older version:
In the oldest version of this post I did the blas compilation by hand:

gfortran -O2 -fPIC -m64 -march=native -funroll-all-loops -c *.f

To build a static library:
ar rvs libblas.a *.o

To build a shared/dynamic library:
gfortran -shared -Wl,-soname,libnetblas.so -o libblas.so.1.0.1 *.o -lc 


ldd libblas.so.1.0.1
        linux-vdso.so.1 =>  (0x00007fff301af000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00002aeeac390000)
        libgfortran.so.3 => /usr/lib/x86_64-linux-gnu/libgfortran.so.3 (0x00002aeeac718000)
        libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00002aeeaca2e000)
        libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00002aeeaccb0000)
        libquadmath.so.0 => /usr/lib/x86_64-linux-gnu/libquadmath.so.0 (0x00002aeeacec7000)
        /lib64/ld-linux-x86-64.so.2 (0x00002aeeabedd000)

Either way:
cp libblas* /opt/netlib/blas/lib

To test:
wget http://www.netlib.org/blas/sblat1
mv sblat1 sblat1.f

And EITHER
gfortran sblat1.f -l:libblas.a

OR
ln -s libblas.so.1.0.1 libnetblas.so
gfortran sblat1.f -l:libnetblas.so

THEN
./a.out
 
Real BLAS Test Program Results
Test of subprogram number  1             SDOT 
                                    ----- PASS -----

 Test of subprogram number  2            SAXPY 
                                    ----- PASS -----

 Test of subprogram number  3            SROTG 
                                    ----- PASS -----

 Test of subprogram number  4             SROT 
                                    ----- PASS -----

 Test of subprogram number  5            SCOPY 
                                    ----- PASS -----

 Test of subprogram number  6            SSWAP 
                                    ----- PASS -----

 Test of subprogram number  7            SNRM2 
                                    ----- PASS -----

 Test of subprogram number  8            SASUM 
                                    ----- PASS -----

 Test of subprogram number  9            SSCAL 
                                    ----- PASS -----

 Test of subprogram number 10            ISAMAX
                                    ----- PASS -----