Lindqvist -- a blog about Linux and Science. Mostly.

19 September 2012

241. pKa, part 3: ccCA in NWChem. Doing something wrong?

First of all, I'm having problems reproducing the output from 'task ccca' by following the methods described in J. Chem. Theory Comput. 2008, 4, 328-334 (scaling 0.9854) or Mol. Phys. 2009, 107(8-12), 1107-1121. The discrepancies are in the energies reported for MP2/cc-pVTZ-DK and CCSD(T)/cc-pVTZ, which lead to differences in the calculated relativistic and correlation corrections. More about that at some other time.

Here I'm using ccCA in NWChem on acetic acid/acetate and formic acid/formate.
More about how it works in another post.

Basically, the way I am using ccCA the results are very, very poor. All I can think is that I must be doing something wrong.

INPUT files

Acetic acid input:

Title "aceticacid"
Start  aceticacid

echo

charge 0

geometry autosym units angstrom
 C     -0.312051     -1.36877     0.00000
 H     -0.929226     -1.55822     -0.878253
 H     -0.929226     -1.55822     0.878253
 H     0.548700     -2.02934     0.00000
 C     0.150590     0.0606620     0.00000
 O     -0.897092     0.922315     0.00000
 H     -0.521850     1.81528     0.00000
 O     1.29371     0.435169     0.00000
end

basis
* library "cc-pvtz"
end

dft
  mult 1
  direct
  noio
  XC b3lyp
  grid fine
  mulliken
end

driver
  default
end

ccca
  optimize
end

task ccca

Acetate input:

Title "acetate_ccca"
Start  acetate_ccca
echo
charge -1

geometry autosym units angstrom
 C     -0.0311237     -1.36218     0.00000
 H     0.501926     -1.73727     0.878691
 H     0.501926     -1.73727     -0.878691
 H     -1.05131     -1.75101     0.00000
 C     -0.00500996     0.204086     0.00000
 O     1.14247     0.706045     0.00000
 O     -1.12049     0.771493     0.00000
end

basis
* library "cc-pvtz"
end

dft
  mult 1
  direct
  noio
  XC b3lyp
  grid fine
  mulliken
end

driver
  default
end

ccca
   optimize
end

task ccca

Formic acid input:

Title "formicacid"
Start  formicacid

echo

charge 0

geometry autosym units angstrom
 C     0.410955     -0.132154     0.00000
 H     1.50430     -0.0475164     0.00000
 O     -0.134104     1.09718     0.00000
 H     -1.09846     0.988665     0.00000
 O     -0.203188     -1.15938     0.00000
end

basis
* library "cc-pvtz"
end

dft
  mult 1
  direct
  noio
  XC b3lyp
  grid fine
  mulliken
end

driver
end

ccca
   optimize
end

task ccca

Formate input:

Title "formate"
Start  formate

echo

charge -1

geometry autosym units angstrom
 C     0.00000     0.00000     0.329396
 H     0.00000     0.00000     1.47310
 O     -1.13532     0.00000     -0.189103
 O     1.13532     0.00000     -0.189103
end

basis "ao basis" spherical print
* library "cc-pvtz"
END

dft
  mult 1
  direct
  noio
  XC b3lyp
  grid fine
  mulliken
end

driver
end

ccca
   optimize
end

task ccca

OUTPUT

Acetic acid

 Temperature                      =   298.15K
 frequency scaling parameter      =   0.9889

 Zero-Point correction to Energy  =   38.155 kcal/mol  (  0.060805 au)
 Thermal correction to Energy     =   41.060 kcal/mol  (  0.065434 au)
 Thermal correction to Enthalpy   =   41.653 kcal/mol  (  0.066378 au)

 Total Entropy                    =   69.467 cal/mol-K
   - Translational                =   38.179 cal/mol-K (mol. weight =  60.0211)
   - Rotational                   =   23.830 cal/mol-K (symmetry #  =        1)
   - Vibrational                  =    7.458 cal/mol-K

 Cv (constant volume heat capacity) =   14.439 cal/mol-K
   - Translational                  =    2.979 cal/mol-K
   - Rotational                     =    2.979 cal/mol-K
   - Vibrational                    =    8.480 cal/mol-K


 ccCA: calculations done, now printing results
 
 ccCA-P  reference energy =   -228.82035086993045     
 ccCA-S3 reference energy =   -228.82800135561300     
 ccCA-S4 reference energy =   -228.82030423449530     
 ccCA-PS3 reference energy=   -228.82417611277174     
 DK correction            =  -0.13322049012506909     
 CCSD(T) correction       =   -4.8762979862686961E-002
 CV correction            =  -0.20936881324035994     
 ---------------------------
 Total ccCA-P   energy    =   -229.21170315315857     
 Total ccCA-S3  energy    =   -229.21935363884111     
 Total ccCA-S4  energy    =   -229.21165651772341     
 Total ccCA-PS3 energy    =   -229.21552839599985     
 
 Thermochemistry available:
            ZPE   =   6.0851792771826778E-002
 ccCA-P   E+ZPE   =  -229.15085136038675     
 ccCA-S3  E+ZPE   =  -229.15850184606930     
 ccCA-S4  E+ZPE   =  -229.15080472495160     
 ccCA-PS3 E+ZPE   =  -229.15467660322804     
 Wrote ccCA-P    energy to the RTDB 
 Leaving ccCA module...

 Task  times  cpu:     5565.6s     wall:     5650.7s

Acetate

 Temperature                      =   298.15K
 frequency scaling parameter      =   0.9889

 Zero-Point correction to Energy  =   29.591 kcal/mol  (  0.047157 au)
 Thermal correction to Energy     =   31.853 kcal/mol  (  0.050762 au)
 Thermal correction to Enthalpy   =   32.446 kcal/mol  (  0.051706 au)

 Total Entropy                    =   64.067 cal/mol-K
   - Translational                =   38.129 cal/mol-K (mol. weight =  59.0133)
   - Rotational                   =   23.739 cal/mol-K (symmetry #  =        1)
   - Vibrational                  =    2.199 cal/mol-K

 Cv (constant volume heat capacity) =   11.235 cal/mol-K
   - Translational                  =    2.979 cal/mol-K
   - Rotational                     =    2.979 cal/mol-K
   - Vibrational                    =    5.276 cal/mol-K

 ccCA: calculations done, now printing results
 
 ccCA-P  reference energy =   -228.25857936176124     
 ccCA-S3 reference energy =   -228.26625083689740     
 ccCA-S4 reference energy =   -228.25849407721080     
 ccCA-PS3 reference energy=   -228.26241509932930     
 DK correction            =  -0.13318127658752132     
 CCSD(T) correction       =   -4.4728554700242285E-002
 CV correction            =  -0.20921905251765338     
 ---------------------------
 Total ccCA-P   energy    =   -228.64570824556665     
 Total ccCA-S3  energy    =   -228.65337972070282     
 Total ccCA-S4  energy    =   -228.64562296101622     
 Total ccCA-PS3 energy    =   -228.64954398313472     
 
 Thermochemistry available:
            ZPE   =   4.7193435242008613E-002
 ccCA-P   E+ZPE   =  -228.59851481032464     
 ccCA-S3  E+ZPE   =  -228.60618628546081     
 ccCA-S4  E+ZPE   =  -228.59842952577421     
 ccCA-PS3 E+ZPE   =  -228.60235054789271     
 Wrote ccCA-P    energy to the RTDB 
 Leaving ccCA module...

 Task  times  cpu:     3859.1s     wall:     3910.2s

Formic acid

 Temperature                      =   298.15K
 frequency scaling parameter      =   0.9889

 Zero-Point correction to Energy  =   20.909 kcal/mol  (  0.033320 au)
 Thermal correction to Energy     =   22.902 kcal/mol  (  0.036497 au)
 Thermal correction to Enthalpy   =   23.495 kcal/mol  (  0.037441 au)

 Total Entropy                    =   59.329 cal/mol-K
   - Translational                =   37.387 cal/mol-K (mol. weight =  46.0055)
   - Rotational                   =   21.008 cal/mol-K (symmetry #  =        1)
   - Vibrational                  =    0.934 cal/mol-K

 Cv (constant volume heat capacity) =    8.703 cal/mol-K
   - Translational                  =    2.979 cal/mol-K
   - Rotational                     =    2.979 cal/mol-K
   - Vibrational                    =    2.744 cal/mol-K

 ccCA: calculations done, now printing results

 ccCA-P  reference energy =   -189.56748775122853
 ccCA-S3 reference energy =   -189.57364633780318
 ccCA-S4 reference energy =   -189.56733835209894
 ccCA-PS3 reference energy=   -189.57056704451585
 DK correction            =  -0.11856238070660652
 CCSD(T) correction       =   -3.0831132609506540E-002
 CV correction            =  -0.16057296161548607
 ---------------------------
 Total ccCA-P   energy    =   -189.87745422616013
 Total ccCA-S3  energy    =   -189.88361281273478
 Total ccCA-S4  energy    =   -189.87730482703054
 Total ccCA-PS3 energy    =   -189.88053351944745

 Thermochemistry available:
            ZPE   =   3.3346398704552728E-002
 ccCA-P   E+ZPE   =  -189.84410782745556
 ccCA-S3  E+ZPE   =  -189.85026641403022
 ccCA-S4  E+ZPE   =  -189.84395842832598
 ccCA-PS3 E+ZPE   =  -189.84718712074289
 Wrote ccCA-P    energy to the RTDB
 Leaving ccCA module...

 Task  times  cpu:     1369.3s     wall:     1407.5s

Formate

 Temperature                      =   298.15K
 frequency scaling parameter      =   0.9889

 Zero-Point correction to Energy  =   12.385 kcal/mol  (  0.019737 au)
 Thermal correction to Energy     =   14.252 kcal/mol  (  0.022713 au)
 Thermal correction to Enthalpy   =   14.845 kcal/mol  (  0.023656 au)

 Total Entropy                    =   56.927 cal/mol-K
   - Translational                =   37.321 cal/mol-K (mol. weight =  44.9976)
   - Rotational                   =   19.229 cal/mol-K (symmetry #  =        2)
   - Vibrational                  =    0.377 cal/mol-K

 Cv (constant volume heat capacity) =    7.310 cal/mol-K
   - Translational                  =    2.979 cal/mol-K
   - Rotational                     =    2.979 cal/mol-K
   - Vibrational                    =    1.351 cal/mol-K


 ccCA: calculations done, now printing results
 
 ccCA-P  reference energy =   -189.01189831122844     
 ccCA-S3 reference energy =   -189.01808726799254     
 ccCA-S4 reference energy =   -189.01171261173857     
 ccCA-PS3 reference energy=   -189.01499278961049     
 DK correction            =  -0.11851294005154500     
 CCSD(T) correction       =   -2.6430727035545942E-002
 CV correction            =  -0.16057463040127118     
 ---------------------------
 Total ccCA-P   energy    =   -189.31741660871680     
 Total ccCA-S3  energy    =   -189.32360556548090     
 Total ccCA-S4  energy    =   -189.31723090922694     
 Total ccCA-PS3 energy    =   -189.32051108709885     
 
 Thermochemistry available:
            ZPE   =   1.9751889903778755E-002
 ccCA-P   E+ZPE   =  -189.29766471881302
 ccCA-S3  E+ZPE   =  -189.30385367557713     
 ccCA-S4  E+ZPE   =  -189.29747901932316     
 ccCA-PS3 E+ZPE   =  -189.30075919719508     
 Wrote ccCA-P    energy to the RTDB 
 Leaving ccCA module...

Solvation energy

Solvation energy may seem easy to calculate, but it is difficult to calculate accurately using implicit methods, in particular for ions. I used the optimized structures from above and then did a single-point COSMO calculation (rsolv 0, which is not ideal) at RDFT(b3lyp)/cc-pVTZ.

Acetic acid: -8.59 kcal/mol
Acetate: -72.33 kcal/mol
Formic acid: -8.90 kcal/mol
Formate: -72.59 kcal/mol
H+: -264.61 kcal/mol (lit. value)

Free energies:
G: ccCA-P+(Hcorr-T*Scorr)
G(acetic acid): -229.21170315315857*627.503+(41.653-298.15*69.467/1000)-8.59
G(acetate): -228.64570824556665*627.503+(32.446-298.15*64.067/1000)-72.33
G(formic acid): -189.87745422616013*627.503+(23.495-298.15*59.329/1000)-8.90
G(formate): -189.31741660871680*627.503+(14.845-298.15*56.927/1000)-72.59
G(H+): -6.28 kcal/mol (gas phase, lit. value) + (-264.61 kcal/mol) (solvation, lit. value) = -270.89 kcal/mol

Results: Direct approach:
Don't forget to account for the standard state. R=1.9858775(34)×10⁻³ kcal/(K.mol)
DG(acetic/acetate)=G(acetate)+G(H+)-G(acetic)+RT*ln(1/24.46)=
(-228.64570824556665*627.503+(32.446-298.15*64.067/1000)-72.33-270.89)-(-229.21170315315857*627.503+(41.653-298.15*69.467/1000)-8.59)+1.9858775*298.15*log(1/24.46)/1000=
11.0435795929699 kcal/mol=46.2063370169861 kJ/mol =>
pKa=DG*log10(e)/RT=
11.0435795929699*log10(e)/(1.9858775*10**(-3)*298.15)
=8.1 (which is very bad -- it should be close to 4.75)

DG(formic/formate)=G(formate)+G(H+)-G(formic)+RT*ln(1/24.46)=
(-189.31741660871680*627.503+(14.845-298.15*56.927/1000)-72.59-270.89)-(-189.87745422616013*627.503+(23.495-298.15*59.329/1000)-8.90)+1.9858775*298.15*log(1/24.46)/1000=
7.01850845285312 kcal/mol=29.3654393667375 kJ/mol
pKa=5.1 (which is quite bad -- it should be close to 3.75)
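
To make the arithmetic above easier to re-run, here is a minimal Python sketch of the direct approach. The function and variable names are my own (hypothetical); the numbers are copied from the ccCA outputs and solvation energies listed above, and the standard-state term is written with the same sign as in the numerical expressions above. Treat it as a calculator for checking the sums, not as a validated pKa workflow.

```python
import math

HARTREE_TO_KCAL = 627.503   # kcal/mol per hartree, as used in the post
R = 1.9858775e-3            # kcal/(K*mol)
T = 298.15                  # K
G_PROTON = -270.89          # kcal/mol: gas-phase G plus solvation, lit. values quoted above


def free_energy(e_hartree, h_corr, s_total, dg_solv):
    """G in kcal/mol from E (hartree), H correction (kcal/mol),
    total entropy (cal/mol-K) and solvation energy (kcal/mol)."""
    return e_hartree * HARTREE_TO_KCAL + (h_corr - T * s_total / 1000.0) + dg_solv


def direct_pka(g_acid, g_base):
    """Return (DG, pKa) for HA -> A- + H+ using the expressions from the post."""
    dg = g_base + G_PROTON - g_acid + R * T * math.log(1 / 24.46)
    return dg, dg * math.log10(math.e) / (R * T)


# ccCA-P total energies, thermal corrections to enthalpy, entropies, COSMO energies
acetic = free_energy(-229.21170315315857, 41.653, 69.467, -8.59)
acetate = free_energy(-228.64570824556665, 32.446, 64.067, -72.33)
formic = free_energy(-189.87745422616013, 23.495, 59.329, -8.90)
formate = free_energy(-189.31741660871680, 14.845, 56.927, -72.59)

dg_acetic, pka_acetic = direct_pka(acetic, acetate)
dg_formic, pka_formic = direct_pka(formic, formate)
print(f"acetic acid: DG = {dg_acetic:.2f} kcal/mol, pKa = {pka_acetic:.1f}")
print(f"formic acid: DG = {dg_formic:.2f} kcal/mol, pKa = {pka_formic:.1f}")
```

Swapping in a different set of solvation energies only means changing the last argument of each free_energy() call.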

Results: Isodesmic approach
From an older post:
"Assuming that we know that formic acid has a pKa of 3.75, then DG_solution=pKa*RT/log(e)=3.75*8.314*298.15/log10(e)/1000=21.404 kJ/mol. The reverse reaction is -21.404 kJ/mol."
That's about 5.116 kcal/mol

DG(acetate)+DG(formic)-(DG(acetic)+DG(formate))+DG(ref)=
((-228.64570824556665*627.503+(32.446-298.15*64.067/1000)-72.33)+(-189.87745422616013*627.503+(23.495-298.15*59.329/1000)-8.90))-((-229.21170315315857*627.503+(41.653-298.15*69.467/1000)-8.59)+(-189.31741660871680*627.503+(14.845-298.15*56.927/1000)-72.59))+5.116=
4.02507114014588 kcal/mol+5.116 kcal/mol =9.14107114014588 kcal/mol <=> pKa= 6.7
Which is better, but still not as good as here.
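
The isodesmic sum above can be sketched the same way in Python. Names are my own (hypothetical), the inputs are the ccCA-P energies, thermal corrections and COSMO energies from this post, and DG_REF is the 5.116 kcal/mol derived from the experimental pKa of formic acid. This is only a sketch for re-running the numbers under those assumptions.

```python
import math

HARTREE_TO_KCAL = 627.503   # kcal/mol per hartree
R = 1.9858775e-3            # kcal/(K*mol)
T = 298.15                  # K
DG_REF = 5.116              # kcal/mol, from pKa(formic acid) = 3.75


def free_energy(e_hartree, h_corr, s_total, dg_solv):
    """G in kcal/mol from E (hartree), H correction (kcal/mol),
    total entropy (cal/mol-K) and solvation energy (kcal/mol)."""
    return e_hartree * HARTREE_TO_KCAL + (h_corr - T * s_total / 1000.0) + dg_solv


acetic = free_energy(-229.21170315315857, 41.653, 69.467, -8.59)
acetate = free_energy(-228.64570824556665, 32.446, 64.067, -72.33)
formic = free_energy(-189.87745422616013, 23.495, 59.329, -8.90)
formate = free_energy(-189.31741660871680, 14.845, 56.927, -72.59)

# acetic acid + formate -> acetate + formic acid (the proton cancels out)
dg_exchange = (acetate + formic) - (acetic + formate)
dg_total = dg_exchange + DG_REF
pka = dg_total * math.log10(math.e) / (R * T)
print(f"exchange: {dg_exchange:.3f} kcal/mol, total: {dg_total:.3f} kcal/mol, pKa = {pka:.1f}")
```

The point of the isodesmic route is that neither G(H+) nor the standard-state term appears, so errors in those literature values drop out.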

Using the E+zpe energies doesn't help much:
((-228.59851481032464*627.503+(32.446-298.15*64.067/1000)-72.33)+(-189.84410782745556*627.503+(23.495-298.15*59.329/1000)-8.90))-((-229.15085136038675*627.503+(41.653-298.15*69.467/1000)-8.59)+(-189.29766471881302*627.503+(14.845-298.15*56.927/1000)-72.59))+5.116=
9.10100587110175 kcal/mol <=> pKa=6.68

I really have no idea why the results are so bad when I had reasonable results with DFT/b3lyp/6-31++G** which should be worse than the E(CBS)+E(CC)+E(CV)+E(DK) approach for calculating electronic energies.

Solvation energies are a bit different and could explain some of the difference. Using the solvation energies from here I got:
((-228.64570824556665*627.503+(32.446-298.15*64.067/1000)-73.23)+(-189.87745422616013*627.503+(23.495-298.15*59.329/1000)-9.99))-((-229.21170315315857*627.503+(41.653-298.15*69.467/1000)-9.32)+(-189.31741660871680*627.503+(14.845-298.15*56.927/1000)-72.47))+5.116=
7.76107114008302 kcal/mol <=> pKa=5.69. Not 4.75, but closer.

Using rsolv 1.3 I get

((-228.64570824556665*627.503+(32.446-298.15*64.067/1000)-71.09)+(-189.87745422616013*627.503+(23.495-298.15*59.329/1000)-6.37))-((-229.21170315315857*627.503+(41.653-298.15*69.467/1000)-6.53)+(-189.31741660871680*627.503+(14.845-298.15*56.927/1000)-71.90))+5.116=
10.1610711401063 kcal/mol which is bad.

More thinking.
This paper says that the gas phase free energy for the deprotonation of acetic acid should be 341.1 kcal/mol
(-228.64570824556665*627.503+(32.446-298.15*64.067/1000)-6.82)-(-229.21170315315857*627.503+(41.653-298.15*69.467/1000))=340.75 kcal/mol
We're within 1 kcal/mol

The same paper states 338.5 kcal/mol for formic acid:
(-189.31741660871680*627.503+(14.845-298.15*56.927/1000)-6.82)-(-189.87745422616013*627.503+(23.495-298.15*59.329/1000))=336.67 kcal/mol

For our direct solution phase pKa calculation formic acid was off by about 1.9 kcal/mol which is similar to the error here.
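
A small Python sketch of the gas-phase check, again with hypothetical names of my own. It plugs in -6.82 kcal/mol for the proton because that is the number used in the two gas-phase expressions above (the solution-phase section quotes -6.28 kcal/mol), so take the proton term as an assumption to revisit rather than a recommended value.

```python
HARTREE_TO_KCAL = 627.503   # kcal/mol per hartree
T = 298.15                  # K
G_PROTON_GAS = -6.82        # kcal/mol, as plugged into the gas-phase expressions above


def gas_free_energy(e_hartree, h_corr, s_total):
    """Gas-phase G in kcal/mol from E (hartree), H correction (kcal/mol), S (cal/mol-K)."""
    return e_hartree * HARTREE_TO_KCAL + (h_corr - T * s_total / 1000.0)


dg_gas_acetic = (gas_free_energy(-228.64570824556665, 32.446, 64.067) + G_PROTON_GAS
                 - gas_free_energy(-229.21170315315857, 41.653, 69.467))
dg_gas_formic = (gas_free_energy(-189.31741660871680, 14.845, 56.927) + G_PROTON_GAS
                 - gas_free_energy(-189.87745422616013, 23.495, 59.329))

# Reference values quoted above: 341.1 (acetic acid) and 338.5 (formic acid) kcal/mol
print(f"acetic acid: {dg_gas_acetic:.2f} kcal/mol (error {dg_gas_acetic - 341.1:+.2f})")
print(f"formic acid: {dg_gas_formic:.2f} kcal/mol (error {dg_gas_formic - 338.5:+.2f})")
```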

18 September 2012

240. Harmonic frequency scaling in NWChem

...is difficult to find information about, but apparently it IS possible:
http://www.emsl.pnl.gov/docs/nwchem/nwchem-support/2012/08/0021._NWCHEM_FW:_BOUNCE_nwchem-users_emsl.pnl

Example input:

Title "acetate"

Start  acetate

echo

charge -1

geometry autosym units angstrom
 C     0.0191498     0.0215213     0.0191498
 H     -0.621688     -0.620669     0.658428
 H     0.658428     -0.620669     -0.621688
 H     -0.628475     0.649316     -0.628475
 C     0.881900     0.883055     0.881900
 O     1.74905     0.315740     1.74905
 O     0.799516     2.22959     0.799516
end

basis
* library "cc-pvtz"
end

dft
  mult 1
  direct
  noio
  XC b3lyp
  grid fine
  mulliken
end

driver
end

set vib:scalefreq  0.9854

task dft optimize
task dft freq

which gives:

 Temperature                      =   298.15K
 frequency scaling parameter      =   0.9854

 Zero-Point correction to Energy  =   29.486 kcal/mol  (  0.046990 au)
 Thermal correction to Energy     =   31.752 kcal/mol  (  0.050601 au)
 Thermal correction to Enthalpy   =   32.345 kcal/mol  (  0.051544 au)

 Total Entropy                    =   64.086 cal/mol-K
   - Translational                =   38.129 cal/mol-K (mol. weight =  59.0133)
   - Rotational                   =   23.739 cal/mol-K (symmetry #  =        1)
   - Vibrational                  =    2.218 cal/mol-K

 Cv (constant volume heat capacity) =   11.268 cal/mol-K
   - Translational                  =    2.979 cal/mol-K
   - Rotational                     =    2.979 cal/mol-K
   - Vibrational                    =    5.309 cal/mol-K

c.f.

 Temperature                      =   298.15K
 frequency scaling parameter      =   1.0000

 Zero-Point correction to Energy  =   29.923 kcal/mol  (  0.047686 au)
 Thermal correction to Energy     =   32.173 kcal/mol  (  0.051272 au)
 Thermal correction to Enthalpy   =   32.766 kcal/mol  (  0.052215 au)

 Total Entropy                    =   64.008 cal/mol-K
   - Translational                =   38.129 cal/mol-K (mol. weight =  59.0133)
   - Rotational                   =   23.739 cal/mol-K (symmetry #  =        1)
   - Vibrational                  =    2.141 cal/mol-K

 Cv (constant volume heat capacity) =   11.132 cal/mol-K
   - Translational                  =    2.979 cal/mol-K
   - Rotational                     =    2.979 cal/mol-K
   - Vibrational                    =    5.173 cal/mol-K
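
A quick way to convince yourself that the scaling is applied: the harmonic zero-point energy is half the sum of the vibrational quanta, so multiplying every frequency by the same factor multiplies the ZPE by that factor too. The Python sketch below (variable names are mine) checks this against the two outputs above. The thermal corrections and entropies are not simple multiples, so only the ZPE is checked.

```python
SCALE = 0.9854                 # value given to vib:scalefreq above
zpe_unscaled = 29.923          # kcal/mol, from the run with scaling parameter 1.0000
zpe_scaled_reported = 29.486   # kcal/mol, from the run with scaling parameter 0.9854

zpe_scaled_estimate = zpe_unscaled * SCALE
print(f"estimated scaled ZPE: {zpe_scaled_estimate:.3f} kcal/mol")
print(f"reported scaled ZPE:  {zpe_scaled_reported:.3f} kcal/mol")
```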

239. Sun GridEngine: resetting queue status on node

I occasionally run into problems with space during a run on my cluster, which causes the job to fail and the node to be marked as unavailable:

qstat -f
queuename                      qtype resv/used/tot. load_avg arch          states
---------------------------------------------------------------------------------
eight.q@neon                   BIP   0/0/8          0.45     lx26-amd64    
---------------------------------------------------------------------------------
five.q@boron                   BIP   0/0/5          6.01     lx26-amd64    
---------------------------------------------------------------------------------
six.q@boron                    BIP   0/6/6          6.01     lx26-amd64    
    788 0.75000 submit__la user         r     09/07/2012 18:36:56     6        
---------------------------------------------------------------------------------
two.q@beryllium                BIP   0/0/2          0.24     lx26-amd64    
---------------------------------------------------------------------------------
four.q@tantalum                BIP   0/0/4          0.05     lx26-amd64    E
---------------------------------------------------------------------------------
three.q@beryllium              BIP   0/0/3          0.24     lx26-amd64    
---------------------------------------------------------------------------------
main.q@beryllium               BIP   0/0/1          0.24     lx26-amd64    
---------------------------------------------------------------------------------
main.q@boron                   BIP   0/0/1          6.01     lx26-amd64    
---------------------------------------------------------------------------------
main.q@tantalum                BIP   0/0/1          0.05     lx26-amd64    

############################################################################
 - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS
############################################################################
    789 0.67310 zoli.qsub  user         qw    09/09/2012 10:00:35     6        
    802 0.60527 submit__bi user         qw    09/10/2012 20:45:24     6        
    803 0.60525 submit__bi user         qw    09/10/2012 20:46:00     6        
    927 0.25071 submit__ac user         qw    09/18/2012 08:24:00     4        
    928 0.25000 submit__ac user         qw    09/18/2012 08:45:52     4

Before you do anything else, free up space and consider moving your scratch dir to a different/separate disk.

Since I keep forgetting how to reset it, here it is -- as an SGE admin do:
/usr/bin/qmod -c four.q@tantalum

me@beryllium changed state of "four.q@tantalum" (no error)
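
If you have many queues, spotting the one with an E in the states column gets tedious. Below is a minimal Python sketch that scans 'qstat -f' text for queue instances in the error state and prints the matching qmod command without running it. The function name is my own, and the parsing assumes the column layout shown in the listing above (queue lines start in the first column, job lines are indented), which may differ on other Grid Engine versions.

```python
def queues_in_error(qstat_text):
    """Return queue instances whose 'states' column contains 'E'."""
    flagged = []
    for line in qstat_text.splitlines():
        fields = line.split()
        # Skip blank lines, indented job lines, separators and the header.
        if not line or line[0].isspace() or len(fields) < 5 or "@" not in fields[0]:
            continue
        # The states column is only present when a state is set.
        state = fields[5] if len(fields) > 5 else ""
        if "E" in state:
            flagged.append(fields[0])
    return flagged


sample = """\
queuename                      qtype resv/used/tot. load_avg arch          states
---------------------------------------------------------------------------------
two.q@beryllium                BIP   0/0/2          0.24     lx26-amd64
---------------------------------------------------------------------------------
four.q@tantalum                BIP   0/0/4          0.05     lx26-amd64    E
---------------------------------------------------------------------------------
six.q@boron                    BIP   0/6/6          6.01     lx26-amd64
    788 0.75000 submit__la user         r     09/07/2012 18:36:56     6
"""

for queue in queues_in_error(sample):
    # Only print the command; clear the error state by hand once disk space is freed.
    print(f"/usr/bin/qmod -c {queue}")
```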

And now everything is good:

qstat -f
queuename                      qtype resv/used/tot. load_avg arch          states
---------------------------------------------------------------------------------
eight.q@neon                   BIP   0/0/8          0.25     lx26-amd64    
---------------------------------------------------------------------------------
five.q@boron                   BIP   0/0/5          5.91     lx26-amd64    
---------------------------------------------------------------------------------
six.q@boron                    BIP   0/6/6          5.91     lx26-amd64    
    788 0.75000 submit__la user         r     09/07/2012 18:36:56     6        
---------------------------------------------------------------------------------
two.q@beryllium                BIP   0/0/2          0.44     lx26-amd64    
---------------------------------------------------------------------------------
four.q@tantalum                BIP   0/4/4          0.17     lx26-amd64    
    927 0.25071 submit__ac user         r     09/18/2012 11:01:26     4        
---------------------------------------------------------------------------------
three.q@beryllium              BIP   0/0/3          0.44     lx26-amd64    
---------------------------------------------------------------------------------
main.q@beryllium               BIP   0/0/1          0.44     lx26-amd64    
---------------------------------------------------------------------------------
main.q@boron                   BIP   0/0/1          5.91     lx26-amd64    
---------------------------------------------------------------------------------
main.q@tantalum                BIP   0/0/1          0.17     lx26-amd64    

############################################################################
 - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS
############################################################################
    789 0.67310 zoli.qsub  user         qw    09/09/2012 10:00:35     6        
    802 0.60527 submit__bi user         qw    09/10/2012 20:45:24     6        
    803 0.60525 submit__bi user         qw    09/10/2012 20:46:00     6        
    928 0.25000 submit__ac user         qw    09/18/2012 08:45:52     4

17 September 2012

238. Calculating pKa, part 2: CBS extrapolation basics

I'm typing this up as I'm learning. Be prepared that there may be outright errors, fuzzy thinking and misunderstood concepts in this post. Use it as an inspiration for learning about Complete Basis Set extrapolation -- not as an authoritative guide. What follows is just what I think I've understood. It's a short post since I'll just put it here in order that I can refer to it later.

The general idea
The basic idea behind Complete Basis Set extrapolation seems to be that as you make your basis sets larger and larger by adding Zeta functions to your valence orbitals (making your valence orbitals more 'flexible') the energies you get start to approach those you would get with an infinitely large or complete (and thus 'true') basis set. Exactly how quickly this approach (exponential?) occurs is a matter for debate, so there's a number of ways of extrapolating it.

In practical terms, what seems to be done is:
1. The structure of a molecule is first optimised using a specific level of theory (e.g. MP2/aug-cc-pVDZ) in the gas phase. This structure is used for all subsequent calculations.
2. A single point calculation at MP2/aug-cc-pVDZ is done. The electronic energy is recorded.
3. A single point calculation at MP2/aug-cc-pVTZ is done. The electronic energy is recorded.
4. A single point calculation at MP2/aug-cc-pVQZ is done. The electronic energy is recorded.
5. A single point calculation at MP2/aug-cc-pV5Z is done. The electronic energy is recorded.
etc.

where D=double(2), T=triple (3), Q=Quadruple (4)

Using MP2 and the Dunning series of correlation-consistent basis sets, for formate I got
2 -188.772348824917 (MP2/aug-cc-pVDZ)
3 -188.930859478806 (MP2/aug-cc-pVTZ)
4 -188.984671946038 (MP2/aug-cc-pVQZ)

Which looks like this:

Not everyone will appreciate the default choice of green and red by gnuplot...

The fitted line is MP2energy=1*exp(-0.599028936687644*Zetafunctions)-189.082170648532

Fitting an exponential with three unknowns based on only three points is obviously poor form, but the point is fairly clear.

The extrapolated energy is -189.082170648532 hartree. For practical use you will need to obtain enthalpy/entropy corrections and solvation energies using a set method. Also, there are normally corrections for unpaired electrons etc.

The Peterson Scheme:
(Peterson, K. A.; Woon, D. E.; Dunning, T. H., Jr. J. Chem. Phys. 1994, 100, 7410, link)

Instead of a straight A*exp(-B*x)+C fit you do E(x)=A+B*exp(-(x-1))+C*exp(-(x-1)**2), where A is the CBS energy.

So, using gnuplot we can put our energies in 'cbs.dat':

2 -188.772348824917
3 -188.930859478806
4 -188.984671946038

and fit it in gnuplot using:

set xrange [0:5]
f(x)=A+B*exp(-(x-1))+C*exp(-(x-1)**2)
fit f(x) 'cbs.dat' u 1:2 via A,B,C
plot 'cbs.dat' u 1:2, f(x) lc 3

which gives us a CBS energy of -189.016 Hartree (c.f. -189.082 using a simple exponential).
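
Since there are three parameters and three points, the Peterson form can also be solved exactly as a 3x3 linear system instead of being fitted. The Python sketch below does that with Cramer's rule. The helper names are my own, and it is only meant as a cross-check of the gnuplot result, under the assumption that x is the cardinal number (2, 3, 4) as in cbs.dat.

```python
import math

# (cardinal number, MP2 energy in hartree), same data as cbs.dat
points = [(2, -188.772348824917), (3, -188.930859478806), (4, -188.984671946038)]


def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def peterson_cbs(pts):
    """Solve E(x) = A + B*exp(-(x-1)) + C*exp(-(x-1)**2) exactly through three points."""
    rows = [[1.0, math.exp(-(x - 1)), math.exp(-(x - 1) ** 2)] for x, _ in pts]
    rhs = [energy for _, energy in pts]
    denominator = det3(rows)
    coeffs = []
    for col in range(3):
        # Cramer's rule: replace one column with the right-hand side.
        m = [row[:] for row in rows]
        for r in range(3):
            m[r][col] = rhs[r]
        coeffs.append(det3(m) / denominator)
    return tuple(coeffs)


A, B, C = peterson_cbs(points)
print(f"CBS limit A = {A:.3f} hartree")
```

With only three points the curve passes through all of them, so this says nothing about how well the functional form describes the convergence; adding the 5Z point would turn it into a real fit.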

The basic underlying principle behind CBS is thus pretty clear even to someone who's not skilled in the art of computational chemistry.

The difficulties seem to be:
1. What basis set family to use (cc-pVXZ ?)
2. Whether to use diffuse/polarisation/extra orbital functions (aug, p, d+)?
3. What level to do solvation calc on (DFT, MP2; COSMO, CPCM)?
4. What level to do structural optimisation at?
5. What level to do frequency calculation at?

Remember that a great many basis sets haven't been parametrised for elements beyond Ar.

Also, as far as solvation calculations are concerned an issue seems to be that the correct way to calculate solvation energy seems to be using gas phase structures -- and NOT structures optimised using a solvation model. This can cause issues if the gas phase and solution phase conformations are considerably different.

16 September 2012

237. Briefly: Packet corrupt during ssh sessions

Whenever I'm using two subnets for my cluster I seem to be having problems with:

Corrupted MAC on input.
Disconnecting: Packet corrupt

It particularly happens when there's a lot of information being passed to the screen. It's a right killer when you're compiling on a remote system. While I've been able to get around it by running a GNU Screen session on the remote box, it was time to solve it.

I googled and found:
https://bugs.launchpad.net/ubuntu/+source/linux-source-2.6.17/+bug/60764

The subnet I'm having issues with operates across a switch. One of the computers on the network is defined as the gateway and I tend to have problems when connecting from it to the other computers on the network.
The gateway server has two interfaces, eth0 which is connected to a router which is connected to the outside world, and eth1 which is connected to the local subnet I'm having problems with.

The fix has been as simple as
sudo apt-get install ethtool
sudo ethtool -K eth1 rx off tx off

I only had to run this on the gateway box and so far I've had no issues. Depending on how you're managing your network interfaces (e.g. wicd, network-manager, /etc/network/*) you may want to add it to the post-up section of your /etc/network/interfaces:
post-up ethtool -K eth1 rx off tx off

Links to this post:
http://winscp.net/forum/viewtopic.php?t=12469

14 September 2012

236. Calculating pKa, part 1: Example (attempt) of an isodesmic reaction in NWChem

Back to learning about computational approaches to chemistry. The usual warnings apply: why would you trust anything that I say about anything? I'm writing anonymously, and I may misunderstand things at times. So make sure that you compare what I write with that of other sources and make up your own mind.

Anyway, I found a fairly detailed presentation in which they were using Gaussian 98 here: https://www.uow.edu.au/~adamt/Trevitt_Research/Links_files/pKa%20workshop%20slides.pdf

While that should be ok to reproduce, it's not that straightforward to do even with Gaussian, since G09 and G03 don't report solvation parameters in the same way (or detail) as G98.

I'm also a lot keener on NWChem than Gaussian for various reasons, not least that it's 'free' (both libre and gratis) while Gaussian inc. has been accused of doing somewhat unfriendly things in the name of protecting their business interests.

See here and here for an example, and then here for a rebuttal from Gaussian inc. I know that EMSL/Pacific Northwest National Lab that develop NWChem and ECCE are prohibited from using Gaussian since they are considered as being competitors.

Back to science.

Our test example will be acetate, and we'll use formic acid to correct our results.
The fact that this post is very long is due to the amount of detail supplied -- I prefer to show some of the more obvious things so that people can learn from what I post -- and I learn by writing the post.

But first let's just do everything using direct methods.

We work with a thermochemical cycle:

IF we can't calculate the DG_solution directly (i.e. too expensive) we can optimise our structures in the gas phase, and then calculate the solvation energy for those structures.

Then DG_sol=DG_gas+DG_solvation(B)-DG_solvation(A).
(more generally sum of DG_solv(prod) - sum of DG_solv(reactants)).

1. pKa of Acetic acid using direct methods

We can either do
H3CCOOH -> H3CCOO- + H+
or
H3CCOOH + H2O -> H3CCOO- + H3O+

Steps:
Optimise acetic acid and acetate in the gas phase and do frequency calculations to get the enthalpy and entropy. Then use the gas phase structures and do single point calculations using COSMO to get the electrostatic solvation energies. Finally, use standard state corrections.

A. Optimise acetic acid in the gas phase and do frequency calculation

Title "aceticacid_gas"
Start  aceticacid_gas
echo
charge 0

geometry autosym units angstrom
 C     0.0402340     0.0308110     0.0402340
 H     -0.600803     -0.611482     0.679201
 H     0.679201     -0.611482     -0.600803
 H     -0.607055     0.659335     -0.607055
 C     0.903928     0.890481     0.903928
 O     0.814831     2.23989     0.814831
 O     1.78275     0.299052     1.78275
 H     2.25438     1.03276     2.25438
end

basis "ao basis" spherical print
  H library "6-31++G**"
  O library "6-31++G**"
  C library "6-31++G**"
END

dft
  mult 1
  direct
  noio
  XC b3lyp
  grid fine
  mulliken
end

driver
  default
end

task dft optimize
task dft freq

which gives

         Total DFT energy =     -229.102415550663
      One electron energy =     -550.833155571059
           Coulomb energy =      230.555870626149
    Exchange-Corr. energy =      -29.497734561879
 Nuclear repulsion energy =      120.672603956126

 Numeric. integr. density =       31.999999816803

     Total iterative time =      3.6s

and

 Temperature                      =   298.15K
 frequency scaling parameter      =   1.0000

 Zero-Point correction to Energy  =   38.683 kcal/mol  (  0.061646 au)
 Thermal correction to Energy     =   41.568 kcal/mol  (  0.066243 au)
 Thermal correction to Enthalpy   =   42.160 kcal/mol  (  0.067186 au)

 Total Entropy                    =   69.198 cal/mol-K
   - Translational                =   38.179 cal/mol-K (mol. weight =  60.0211)
   - Rotational                   =   23.855 cal/mol-K (symmetry #  =        1)
   - Vibrational                  =    7.164 cal/mol-K

 Cv (constant volume heat capacity) =   14.327 cal/mol-K
   - Translational                  =    2.979 cal/mol-K
   - Rotational                     =    2.979 cal/mol-K
   - Vibrational                    =    8.368 cal/mol-K

So that G=-229.102415550663*(627.503 kcal/Hartree)+(42.160 kcal/mol-298.15*(69.198 cal/molK)/1000)=-1.4374e+05 kcal/mol

B. Optimise acetate in the gas phase and do frequency calculation

Title "acetate_gas"

Start  acetate_gas

echo

charge -1

geometry autosym units angstrom
 C     0.0405721     0.0285481     0.0405721
 H     -0.601438     -0.613690     0.678620
 H     0.678620     -0.613690     -0.601438
 H     -0.605857     0.658809     -0.605857
 C     0.904975     0.886806     0.904975
 O     0.825186     2.23364     0.825186
 O     1.77103     0.316179     1.77103
end

basis "ao basis" spherical print
  H library "6-31++G**"
  O library "6-31++G**"
  C library "6-31++G**"
END

dft
  mult 1
  direct
  noio
  XC b3lyp
  grid fine
  mulliken 
end

driver
  default
end

task dft optimize
task dft freq

which gives

         Total DFT energy =     -228.540046314754
      One electron energy =     -539.474204198295
           Coulomb energy =      229.063209931484
    Exchange-Corr. energy =      -29.339040486703
 Nuclear repulsion energy =      111.209988438759

 Numeric. integr. density =       32.000000595562

     Total iterative time =      2.9s

and

 Temperature                      =   298.15K
 frequency scaling parameter      =   1.0000

 Zero-Point correction to Energy  =   30.023 kcal/mol  (  0.047845 au)
 Thermal correction to Energy     =   32.271 kcal/mol  (  0.051427 au)
 Thermal correction to Enthalpy   =   32.863 kcal/mol  (  0.052371 au)

 Total Entropy                    =   64.022 cal/mol-K
   - Translational                =   38.129 cal/mol-K (mol. weight =  59.0133)
   - Rotational                   =   23.766 cal/mol-K (symmetry #  =        1)
   - Vibrational                  =    2.127 cal/mol-K

 Cv (constant volume heat capacity) =   11.112 cal/mol-K
   - Translational                  =    2.979 cal/mol-K
   - Rotational                     =    2.979 cal/mol-K
   - Vibrational                    =    5.153 cal/mol-K

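so that G=-228.540046314754*(627.503 kcal/Hartree)+(32.271 kcal/mol-298.15*(64.022 cal/molK)/1000)=-1.4340e+05 kcal/mol

Note that 32.271 kcal/mol is the thermal correction to energy; the enthalpy correction in the output above is 32.863 kcal/mol, and using it instead would raise this value, and every value derived from it below, by 0.59 kcal/mol.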

Putting A and B together: (-228.540046314754*(627.503)+(32.271-298.15*(64.022/1000)))-(-229.102415550663*(627.503)+(42.160-298.15*(69.198/1000)))=344.54 kcal/mol

We haven't accounted for solvation or the proton yet.
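The arithmetic in A and B can be checked with a few lines of Python. This is only a sketch: the function name `gibbs_kcal` is made up for illustration, and the numbers are copied as they are used in the expressions above.

```python
HARTREE_TO_KCAL = 627.503  # conversion factor used in this post
T = 298.15                 # K


def gibbs_kcal(scf_hartree, h_corr_kcal, s_cal_per_mol_k, temp=T):
    """G = E(SCF) + H_corr - T*S, returned in kcal/mol."""
    return (scf_hartree * HARTREE_TO_KCAL
            + h_corr_kcal
            - temp * s_cal_per_mol_k / 1000.0)


# Acetic acid: SCF energy, enthalpy correction and total entropy from A.
g_acetic_acid = gibbs_kcal(-229.102415550663, 42.160, 69.198)

# Acetate: values as plugged in under B (32.271 is listed in the NWChem
# output as the thermal correction to energy).
g_acetate = gibbs_kcal(-228.540046314754, 32.271, 64.022)

dg_gas = g_acetate - g_acetic_acid
print(f"G(acetic acid) = {g_acetic_acid:.2f} kcal/mol")
print(f"G(acetate)     = {g_acetate:.2f} kcal/mol")
print(f"DG(gas)        = {dg_gas:.2f} kcal/mol")  # 344.54
```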

C. Solvation of acetic acid

Title "aceticacid_solvation"
Start  aceticacid_solvation

echo

charge 0

geometry autosym units angstrom
 C     -0.313400     -1.37257     0.00000
 H     -0.932151     -1.56367     -0.882188
 H     -0.932151     -1.56367     0.882188
 H     0.551887     -2.03461     0.00000
 C     0.149260     0.0607660     0.00000
 O     1.30165     0.439500     0.00000
 O     -0.897630     0.927778     0.00000
 H     -0.523912     1.82535     0.00000
end

basis "ao basis" spherical print
  H library "6-31++G**"
  O library "6-31++G**"
  C library "6-31++G**"
END

dft
  mult 1
  direct
  noio
  XC b3lyp
  grid fine
  mulliken
end

cosmo
end

task dft energy

which gives

                  COSMO solvation results
                  -----------------------

                 gas phase energy =      -229.1024156124
                 sol phase energy =      -229.1172649438
 (electrostatic) solvation energy =         0.0148493315 (    9.32 kcal/mol)

D. Solvation of acetate

Title "acetate_solvation"
Start  acetate_solvation

echo

charge -1

geometry autosym units angstrom
 C     -0.0308736     -1.36399     0.00000
 H     0.503418     -1.74042     0.882388
 H     0.503418     -1.74042     -0.882388
 H     -1.05531     -1.75261     0.00000
 C     -0.00485953     0.199667     0.00000
 O     -1.12642     0.778065     0.00000
 O     1.14855     0.713544     0.00000
end

basis "ao basis" spherical print
  H library "6-31++G**"
  O library "6-31++G**"
  C library "6-31++G**"
END

dft
  mult 1
  direct
  noio
  XC b3lyp
  grid fine
  mulliken
end

cosmo
end

task dft energy

which gives

                  COSMO solvation results
                  -----------------------
  
                 gas phase energy =      -228.5400463128
                 sol phase energy =      -228.6567490452
 (electrostatic) solvation energy =         0.1167027324 (   73.23 kcal/mol)

Putting A, B and solvation energies together:
[(-228.540046314754*(627.503)+(32.271-298.15*(64.022/1000)))-(-229.102415550663*(627.503)+(42.160-298.15*(69.198/1000)))]-[73.23-9.32]=344.54-63.910=280.63 kcal/mol

E. The Proton
If you try to do any calculations on an isolated proton you get an SCFE of zero, and you won't do much better in terms of thermochemical data. Yet, monoatomic gases obviously still possess entropy and enthalpy. Instead, the document I cite above uses the ideal gas partition function for ideal monoatomic gases, which gives a value of -6.28 kcal/mol for the free energy of a proton. The last reference states that the free energy for a proton in the gas phase is experimentally determined to be -6.28 kcal/mol and that the free energy of hydration is -264.61 kcal/mol (experimental).

[(-228.540046314754*(627.503)+(32.271-298.15*(64.022/1000)))-6.28-(-229.102415550663*(627.503)+(42.160-298.15*(69.198/1000)))]-[73.23+264.61-9.32]=338.26-328.52=9.74 kcal/mol =40.752 kJ/mol
We're not done yet though -- make sure to continue to 'F. Standard States'
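As a sanity check, here is the same thermodynamic cycle written out term by term. It is a sketch under one assumption: the COSMO solvation energies, which NWChem prints as positive numbers, are entered as negative (stabilising) free energies, which is how they are used in the expression above. The variable names are mine.

```python
KCAL_TO_KJ = 4.184

dg_gas = 344.54            # kcal/mol, G(acetate) - G(acetic acid) in the gas phase
g_proton_gas = -6.28       # kcal/mol, gas-phase free energy of H+
dg_solv_proton = -264.61   # kcal/mol, hydration free energy of H+
dg_solv_acetate = -73.23   # kcal/mol, COSMO, entered as stabilising
dg_solv_acid = -9.32       # kcal/mol, COSMO, entered as stabilising

# HA(aq) -> A-(aq) + H+(aq): gas-phase step plus the change in solvation energy
dg = (dg_gas + g_proton_gas) + (dg_solv_acetate + dg_solv_proton - dg_solv_acid)

print(f"{dg:.2f} kcal/mol = {dg * KCAL_TO_KJ:.3f} kJ/mol")  # 9.74 kcal/mol = 40.752 kJ/mol
```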

F. Standard States

We know that

DG=DG^0+RT ln (Q).

We have a bit of a problem. We're doing calculations in the gas phase (pressure) but looking at predicting solution values (concentration). Also, I don't fully get it yet, so my explanation is probably a bit fuzzy.

So if A+B -> C+D we get Q=(C*D)/(A*B), but for A -> C+D we get Q=(C*D)/(A), or (1 atm x 1 atm/1 atm) = 1 atm = 101325 Pa => V=nRT/P=1*8.314*298.15/101325=0.024464 m3 = 24.46 L. Each species is thus present as 1 mol in 24.46 L, i.e. at a concentration of 1/24.46 mol/L.

And the A -> C + D situation is what we have if we look at
HA -> H+ + A-

So,
Q=((1/24.46)*(1/24.46)/(1/24.46))=1/24.46 which gives
DG=DG^0-RTln(24.46)=DG^0-7924.9 J/mol

Thus, we need to correct for the standard state: 40.752 -7924.9/1000=32.827 kJ/mol
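The size of the standard-state term is easy to reproduce. The sketch below assumes an ideal gas at 1 atm = 101325 Pa and applies the correction with the same sign as in the text above; it makes no claim about whether that sign convention is the right one.

```python
import math

R = 8.314          # J/(mol K)
T = 298.15         # K
P_ATM = 101325.0   # Pa, assumed gas-phase standard pressure (1 atm)

# Volume of one mole of ideal gas, in litres
molar_volume_l = R * T / P_ATM * 1000.0

# RT ln(24.46), in J/mol
correction_j = R * T * math.log(molar_volume_l)

dg_kj = 40.752                                   # kJ/mol, from section E
dg_corrected_kj = dg_kj - correction_j / 1000.0  # sign as applied in the text

print(f"molar volume: {molar_volume_l:.2f} L")
print(f"correction:   {correction_j / 1000.0:.2f} kJ/mol")
print(f"corrected DG: {dg_corrected_kj:.2f} kJ/mol")
```

Using the unrounded molar volume instead of 24.46 L changes the correction by less than 1 J/mol.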

G. Calculating pKa

Since (in solution, so that the standard concentrations are 1 M)
DG=DG^0+RT ln(Q), where Q=([H3CCOO-][H+])/([H3CCOOH])
DG=DG^0+RT ln([H3CCOO-]/[H3CCOOH])+RT ln(10^(-pH)), where RT ln([H3CCOO-]/[H3CCOOH])=RT ln(1/1)=0, so
DG=DG^0+RT ln(10^(-pH))=DG^0+RT log(10^(-pH))/log(e)=DG^0-(RT/log(e))*pH
which at equilibrium, where pH=pKa and DG=0, turns into
pKa=DG^0*log(e)/RT

pKa= DG*log(e)/(RT)=(32.827*1000)*log(e)/(8.314*298.15)=5.75

Not that great as predictions go (exp: pKa=4.75). Looking at some of the literature one error lies in the size of the solvation energies. Possibly one should tune the parameters used in the COSMO.
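The conversion between DG and pKa goes both ways, so a pair of small helper functions is handy. The names `pka_from_dg` and `dg_from_pka` are just illustrative; R and T are the values used throughout this post.

```python
import math

R = 8.314    # J/(mol K)
T = 298.15   # K


def pka_from_dg(dg_kj_per_mol, temp=T):
    """pKa = DG * log10(e) / (R*T), with DG given in kJ/mol."""
    return dg_kj_per_mol * 1000.0 * math.log10(math.e) / (R * temp)


def dg_from_pka(pka, temp=T):
    """Inverse of pka_from_dg; returns DG in kJ/mol."""
    return pka * R * temp / math.log10(math.e) / 1000.0


print(f"pKa(calc) = {pka_from_dg(32.827):.2f}")             # 5.75
print(f"DG needed for pKa 4.75 = {dg_from_pka(4.75):.1f} kJ/mol")
```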

2. Isodesmic reaction/correction

Using formic acid
This approach is based on 1) the similarity between two compounds and 2) us knowing the DG_solution parameter for one of them.

Assuming that we know that formic acid has a pKa of 3.75, then DG_solution=pKa*RT/log(e)=3.75*8.314*298.15/log10(e)/1000=21.404 kJ/mol. The reverse reaction is -21.404 kJ/mol.

We skip a few steps.
Here are the calculated parameters for formic acid (using the same method as above):

Formic acid

SCFE: -189.772804709496 Hartree
Enthalpy correction: 23.773 kcal/mol
Entropy correction: 59.339 cal/mol-K
Solvation energy: 9.99 kcal/mol

Formate
SCFE: -189.217943605798 Hartree
Enthalpy correction: 15.016 kcal/mol
Entropy correction: 56.992 cal/mol-K
Solvation energy: 72.47 kcal/mol

[Just for kicks we quickly look at what the prediction is:
accounting for everything (solvation, proton etc.)
((-189.217943605798*627.503+(15.016-298.15*56.992/1000)-72.47) +(-6.28-264.61))-(-189.772804709496*627.503+(23.773-298.15*59.339/1000)-9.99)=6.7498 kcal/mol
6.7498*4.184-7924.9/1000=20.316 kJ/mol <=> pKa=1000*20.316*log10(e)/(8.314*298.15)=3.56]

The isodesmic approach:

Here we look at
H3CCOOH + -OOCH -> H3CCOO- +HOOCH

This combined reaction has a
DG_solution=DG_solution(acetic acid/acetate)+DG_solution(formate/formic acid)
<=>
DG_solution(acetic acid/acetate)= DG_solution-DG_solution(formic acid/formate)
=DG_solution-(-21.404 kJ/mol)
=DG_gas+[DG_sol(acetate)+DG_sol(formic acid)-DG_sol(acetic acid)-DG_sol(formate)]+21.404 kJ/mol
=
(-228.540046314754*(627.503)+(32.271-298.15*(64.022/1000)))+(-189.772804709496*627.503+(23.773-298.15*(59.339/1000)))-(-229.102415550663*(627.503)+(42.160-298.15*(69.198/1000)))-(-189.217943605798*627.503+(15.016-298.15*(56.992/1000)))+(-73.23-9.99+9.32+72.47)+5.116 kcal/mol= 8.11 kcal/mol= 33.93 kJ/mol

Here we don't need to fiddle with standard states or experimental values for solvation of the proton.

pKa= DG*log(e)/(RT)=(33.93*1000)*log(e)/(8.314*298.15)=5.94
which is even worse than before...
We want ca 27 kJ/mol = 6.48 kcal/mol. Paradoxically this may be due to the ab initio approach to the pKa of formic acid actually giving a very reasonable value.

Using Propanoic acid
So let's try the isodesmic approach using propanoic acid as our reference instead.

Propanoic acid

SCFE: -268.419515785389 Hartree
Enthalpy correction: 60.894 kcal/mol
Entropy correction: 75.184 cal/mol-K
Solvation energy: 8.15 kcal/mol

Propanoate
SCFE: -267.857478200414 Hartree
Enthalpy correction: 52.096 kcal/mol
Entropy correction: 75.392 cal/mol-K
Solvation energy: 72.14 kcal/mol

pKa=4.86 <=> 27.739 kJ/mol= 6.63 kcal/mol

(-228.540046314754*(627.503)+(32.271-298.15*(64.022/1000)))+(-268.419515785389*627.503+(60.894-298.15*(75.184/1000)))-(-229.102415550663*(627.503)+(42.160-298.15*(69.198/1000)))-(-267.857478200414*627.503+(52.096-298.15*(75.392/1000)))+(-73.23-8.15+9.32+72.14)+6.63 kcal/mol=7.4324 kcal/mol=31.097 kJ/mol

pKa= DG*log(e)/(RT)=(31.097*1000)*log(e)/(8.314*298.15)=5.45

It's a bit better, but still a bit off.
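The long one-line expressions are easy to mistype, so here is a sketch that computes both isodesmic estimates (formic acid and propanoic acid as reference) from one table of values. The dictionary layout and function names are my own; the numbers are the ones listed in this post, and the COSMO solvation energies are subtracted, i.e. treated as stabilising. Adding the reference DG to the exchange DG and converting to pKa is the same as adding the exchange contribution to the reference pKa, which is what the last function does.

```python
import math

HARTREE_TO_KCAL = 627.503
KCAL_TO_KJ = 4.184
R, T = 8.314, 298.15

# name: (SCFE / Hartree, enthalpy corr. / kcal/mol,
#        entropy / cal/mol-K, COSMO solvation energy / kcal/mol)
SPECIES = {
    "acetic acid":    (-229.102415550663, 42.160, 69.198, 9.32),
    "acetate":        (-228.540046314754, 32.271, 64.022, 73.23),
    "formic acid":    (-189.772804709496, 23.773, 59.339, 9.99),
    "formate":        (-189.217943605798, 15.016, 56.992, 72.47),
    "propanoic acid": (-268.419515785389, 60.894, 75.184, 8.15),
    "propanoate":     (-267.857478200414, 52.096, 75.392, 72.14),
}


def g_solution(name):
    """Gas-phase G minus the solvation energy, in kcal/mol."""
    scf, h_corr, entropy, solv = SPECIES[name]
    return scf * HARTREE_TO_KCAL + h_corr - T * entropy / 1000.0 - solv


def isodesmic_pka(ref_acid, ref_base, ref_pka):
    """pKa of acetic acid from HA + Ref- -> A- + RefH and the reference pKa."""
    dg_exchange = (g_solution("acetate") + g_solution(ref_acid)
                   - g_solution("acetic acid") - g_solution(ref_base))
    dg_exchange_j = dg_exchange * KCAL_TO_KJ * 1000.0
    return ref_pka + dg_exchange_j * math.log10(math.e) / (R * T)


pka_via_formic = isodesmic_pka("formic acid", "formate", 3.75)
pka_via_propanoic = isodesmic_pka("propanoic acid", "propanoate", 4.86)
print(f"via formic acid:    {pka_via_formic:.2f}")     # 5.94
print(f"via propanoic acid: {pka_via_propanoic:.2f}")  # 5.45
```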

3. Conclusion:
The isodesmic approach is not magic: it relies on the two compounds, one of which has a known experimental value, being similar enough that they suffer from similar computational errors, which then cancel. Under the right conditions it's a useful approach, whereas under other conditions -- where a body of experimental data exists -- it might just be easier to determine the correlation between experimental and calculated data via fitting.

The approach worked better for the acetate/propanoate pair than the formate/acetate pair -- and one would consider acetic acid and propanoic acid to be more similar than formic acid and the higher acids. We're still far off from obtaining a perfect result though.

An additional problem is obviously the sensitivity of pKa to the DG -- one pH unit is about 1.36 kcal/mol, which is very small given the usual errors in DFT level calculations. I've seen indications online (google!) that the accuracy of b3lyp is about 3 kcal/mol, and one can always debate the accuracy of a highly empirical method like COSMO.
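To put numbers on that sensitivity, here is a short sketch; the 3 kcal/mol figure is simply the rough B3LYP error quoted above, not something I've verified.

```python
import math

R = 8.314          # J/(mol K)
T = 298.15         # K
KCAL_TO_KJ = 4.184

# Free energy corresponding to one pKa unit: RT ln(10)
kcal_per_pka_unit = R * T * math.log(10) / 1000.0 / KCAL_TO_KJ

assumed_error_kcal = 3.0   # rough B3LYP accuracy quoted above
pka_error = assumed_error_kcal / kcal_per_pka_unit

print(f"1 pKa unit = {kcal_per_pka_unit:.2f} kcal/mol")
print(f"{assumed_error_kcal} kcal/mol = {pka_error:.1f} pKa units")
```

In other words, an error of that size in DG is already worth more than two pKa units.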

13 September 2012

235. CPMD with Netlib's lapack, blas and your own fftw3 on ROCKS 5.4.3/CentOS 5.6

Update 8 Feb 2013:
I had somehow forgotten to include some of the instructions for the BLAS part. Fixed now.

Post:
This is done pretty much like how it's done on Debian (-march=native didn't work in the BLAS compilation though, nor was -fno-whole-file accepted when compiling cpmd)

1. Compile cmake according to this post:
http://verahill.blogspot.com.au/2012/05/compiling-openbabel-231-and-cmake-on.html

2. Compile BLAS
sudo mkdir /share/apps/tools/netlib/blas/lib -p
sudo chown $USER /share/apps/tools/netlib -R

mkdir ~/tmp
cd ~/tmp
wget http://www.netlib.org/blas/blas.tgz
tar xvf blas.tgz
cd BLAS/

Edit make.inc

OPTS = -O3 -shared -m64 -fPIC

make all

gfortran -shared -Wl,-soname,libnetblas.so -o libblas.so.1.0.1 *.o -lc
ln -s libblas.so.1.0.1 libnetblas.so
cp lib*blas* /share/apps/tools/netlib/blas/lib

3. Compile LAPACK
sudo mkdir /share/apps/tools/netlib/lapack -p
sudo chown $USER /share/apps/tools/netlib -R

wget http://www.netlib.org/lapack/lapack-3.4.1.tgz

tar xvf lapack-3.4.1.tgz
cd lapack-3.4.1/
mkdir build
cd build
ccmake ../

Hit 'c' and edit the values:

 BUILD_COMPLEX                    ON
 BUILD_COMPLEX16                  ON
 BUILD_DOUBLE                     ON
 BUILD_SHARED_LIBS                ON
 BUILD_SINGLE                     ON
 BUILD_STATIC_LIBS                ON
 BUILD_TESTING                    ON
 CMAKE_BUILD_TYPE                    
 CMAKE_INSTALL_PREFIX             /share/apps/tools/netlib/lapack
 LAPACKE                          OFF
 LAPACKE_WITH_TMG                 OFF
 USE_OPTIMIZED_BLAS               ON
 USE_XBLAS                        OFF

Hit 'c' again, then hit 'g'.

Edit CMakeCache.txt and add the following lines at the beginning:

########################
# EXTERNAL cache entries
########################
BLAS_FOUND:STRING=TRUE
BLAS_GENERIC_FOUND:BOOL=TRUE
BLAS_GENERIC_blas_LIBRARY:FILEPATH=/share/apps/tools/netlib/blas/lib/libnetblas.so
BLAS_LIBRARIES:PATH=/share/apps/tools/netlib/blas/lib/libnetblas.so

Do
ccmake ../
again, hit 'c', then 'g'.

Now,
make
make install

4. Compile FFTW3

sudo mkdir /share/apps/tools/fftw3
sudo chown $USER /share/apps/tools/fftw3
cd ~/tmp
wget http://www.fftw.org/fftw-3.3.1.tar.gz
tar -xvf fftw-3.3.1.tar.gz
cd fftw-3.3.1
make distclean
./configure --enable-float --enable-mpi --enable-threads --with-pic --prefix=/share/apps/tools/fftw3/single
make
make install
make distclean
./configure --disable-float --enable-mpi --enable-threads --with-pic --prefix=/share/apps/tools/fftw3/double
make 
make install

5. Compile CPMD
I downloaded the cpmd file to a client computer, then uploaded it to the ROCKS front node:
sftp me@rocks:/home/me/tmp

Connected to rocks.
Changing to: /home/me/tmp
sftp> put cpmd-v3_15_3.tar.gz
Uploading cpmd-v3_15_3.tar.gz to /home/me/tmp/cpmd-v3_15_3.tar.gz
cpmd-v3_15_3.tar.gz                100% 2937KB 587.4KB/s   00:05
sftp> exit

I then logged in via ssh as normal.

cd ~/tmp
tar xvf cpmd-v3_15_3.tar.gz
cd CPMD/CONFIGURE

Create a new file LINUX-x86_64-ROCKS

     IRAT=2
     CFLAGS='-c -O2 -Wall'
     CPP='/lib/cpp -P -C -traditional'
     CPPFLAGS='-D__Linux -D__PGI -D__GNU -DFFT_FFTW3 -DPARALLEL -DPOINTER8'
     FFLAGS='-c -O2 -fcray-pointer -fsecond-underscore'
     LFLAGS='-L/share/apps/tools/fftw3/double/lib -lfftw3 -lfftw3_mpi -lfftw3_threads -I/usr/include -L/share/apps/tools/netlib/blas/lib -lnetblas -L/share/apps/tools/netlib/lapack/lib -llapack -L/opt/openmpi/lib -lpthread -lmpi'
     FFLAGS_GROMOS='  $(FFLAGS)' 
      FC='mpif77 -fbounds-check'
      CC='mpicc'
      LD='mpif77 -fbounds-check'

NOTE: I don't think the -I belongs in the LFLAGS statement, but I'm presuming that I put it there for a reason back when I did it the first time.

Go to ~/tmp/CPMD, and edit wfnio.F (basically replace 3 with 2 and remove 'L'):

 15       CHARACTER(len=*) TAG
 63         IF(TAG(1:2).EQ.'NI') THEN
201       IF(TAG(1:2).NE.'NI') THEN
271         IF(TAG(1:2).EQ.'NI') THEN

Finally, edit Makefile and change

  23 LD = f95 -O

to

  23 LD = mpif77 -fbounds-check

Time to compile

./mkconfig.sh LINUX-x86_64-ROCKS > Makefile
make
sudo mkdir /share/apps/cpmd
sudo chown $USER /share/apps/cpmd
cp cpmd.x /share/apps/cpmd

echo 'export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/share/apps/tools/netlib/blas/lib:/share/apps/tools/netlib/lapack/lib:/share/apps/tools/fftw3/double/lib' >>~/.bashrc
echo 'export PATH=$PATH:/share/apps/cpmd' >> ~/.bashrc
echo "export PP_LIBRARY_PATH=/share/apps/cpmd/PP_LIBRARY" >>~/.bashrc

You're now done compiling. To test, you need to get some pseudopotential files -- look at e.g. the end of http://verahill.blogspot.com.au/2012/07/not-solved-compiling-cpmd-on-debian.html for instructions.

234. CPMD with netlib lapack, blas and your own fftw on debian testing

This is a minor update to my previous post on CPMD. Back in the day I had issues linking to my Openblas libs (I got a binary which would not run properly), but I've since had success with the netlib lapack and blas libs.

1. Compile the netlib lapack and blas libraries according to this post: http://verahill.blogspot.com.au/2012/09/compiling-netlibs-lapack-and-blas-on.html

2. Compile the fftw libraries according to this post (ignore the sections on Openblas and Gromacs):
http://verahill.blogspot.com.au/2012/05/gromacs-with-external-fftw3-and-blas-on.html

3. Compile CPMD. We'll be following this post in large parts.
Register with cpmd.org. Once you're approved download the cpmd source to ~/tmp.

sudo apt-get install libopenmpi-dev openmpi-bin

cd ~/tmp
tar -xvf cpmd-v3_15_3.tar.gz
cd CPMD/CONFIGURE

Create the file LINUX-x86_64-DEBIAN:

   
     IRAT=2
     CFLAGS='-c -O2 -Wall'
     CPP='/lib/cpp -P -C -traditional'
     CPPFLAGS='-D__Linux -D__PGI -D__GNU -DFFT_FFTW3 -DPARALLEL -DPOINTER8'
     FFLAGS='-c -O2 -fcray-pointer -fno-whole-file -fsecond-underscore'
     LFLAGS='-l:/opt/fftw/fftw-3.3.2/double/lib/libfftw3.a -l:/opt/fftw/fftw-3.3.2/double/lib/libfftw3_mpi.a -l:/opt/fftw/fftw-3.3.2/double/lib/libfftw3_threads.a -I/usr/include -l:/opt/netlib/blas/lib/libnetblas.so -l:/opt/netlib/lapack/lib/liblapack.so -lpthread -lmpi'
     FFLAGS_GROMOS='  $(FFLAGS)' 
      FC='mpif77 -fbounds-check'
      CC='mpicc'
      LD='mpif77 -fbounds-check'

Next edit ~/tmp/CPMD/wfnio.F and change the following lines:

 15       CHARACTER(len=*) TAG
 63         IF(TAG(1:2).EQ.'NI') THEN
201       IF(TAG(1:2).NE.'NI') THEN
271         IF(TAG(1:2).EQ.'NI') THEN

Now, in ~/tmp/CPMD, run

./mkconfig.sh LINUX-x86_64-DEBIAN > Makefile
make
sudo mkdir /opt/cpmd
sudo chown $USER /opt/cpmd
cp cpmd.x /opt/cpmd

And follow everything below 'Done! Almost.' in this post: http://verahill.blogspot.com.au/2012/07/not-solved-compiling-cpmd-on-debian.html

12 September 2012

233. Compiling netlib's lapack and blas on Debian Testing (Wheezy)

In addition to specific BLAS/LAPACK libs such as ACML, MKL, and ATLAS, netlib provides (what I understand to be) reference versions of BLAS and LAPACK. Presumably these are slower than optimised versions of blas/lapack, but it doesn't hurt to be familiar with them.

Here's how to compile those versions.

BLAS

sudo mkdir /opt/netlib
sudo chown $USER /opt/netlib
mkdir /opt/netlib/blas/lib -p
wget http://www.netlib.org/blas/blas.tgz
tar xvf blas.tgz
cd BLAS/

Edit make.inc
OPTS = -O3 -shared -m64 -march=native -fPIC

make all
gfortran -shared -Wl,-soname,libnetblas.so -o libblas.so.1.0.1 *.o -lc
ln -s libblas.so.1.0.1 libnetblas.so
cp lib*blas* /opt/netlib/blas/lib

To see whether everything linked ok:
ldd libnetblas.so

        linux-vdso.so.1 =>  (0x00007ffff1bc6000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00002b42ec030000)
        libgfortran.so.3 => /usr/lib/x86_64-linux-gnu/libgfortran.so.3 (0x00002b42ec3b8000)
        libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00002b42ec6ce000)
        libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00002b42ec950000)
        libquadmath.so.0 => /usr/lib/x86_64-linux-gnu/libquadmath.so.0 (0x00002b42ecb67000)
        /lib64/ld-linux-x86-64.so.2 (0x00002b42ebaf3000)

LAPACK
(inspired by this and this)
mkdir -p /opt/netlib/lapack
sudo apt-get install cmake-curses-gui
cd ~/tmp
wget http://www.netlib.org/lapack/lapack-3.4.1.tgz
tar xvf lapack-3.4.1.tgz
cd lapack-3.4.1/
mkdir build
cd build
ccmake ../

Hit 'c' to generate a configuration. Navigate with arrow keys and hit enter to change values. Change to the values in red:

 
 BUILD_COMPLEX                   *ON
 BUILD_COMPLEX16                 *ON
 BUILD_DOUBLE                    *ON
 BUILD_SHARED_LIBS               *ON
 BUILD_SINGLE                    *ON
 BUILD_STATIC_LIBS               *ON
 BUILD_TESTING                   *ON
 CMAKE_BUILD_TYPE                *     
 CMAKE_INSTALL_PREFIX            */opt/netlib/lapack
 LAPACKE                         *OFF
 LAPACKE_WITH_TMG                *OFF
 USE_OPTIMIZED_BLAS              *ON
 USE_XBLAS                       *OFF

Then hit 'c' which might give you (change the values in red) -- I got some errors about ACML/eula here, but don't worry about that.

NOTE: this will only work if you already have blas installed in a standard location. If you don't get the BLAS_FOUND etc. then you should hit 'c' again and then 'g'. Next edit your CMakeCache.txt and paste the variables (without line numbers) you find below this section, then do ccmake ../ again and make sure everything looks ok, and generate using 'g'.

 BLAS_FOUND                       TRUE
 BLAS_GENERIC_FOUND               ON
 BLAS_GENERIC_blas_LIBRARY        /opt/netlib/blas/lib/libnetblas.so
 BLAS_LIBRARIES                   /opt/netlib/blas/lib/libnetblas.so
 BLAS_LINKER_FLAGS
 BUILD_COMPLEX                   *ON
 BUILD_COMPLEX16                 *ON
 BUILD_DOUBLE                    *ON
 BUILD_SHARED_LIBS               *OFF
 BUILD_SINGLE                    *ON
 BUILD_STATIC_LIBS               *ON
 BUILD_TESTING                   *ON
 CMAKE_BUILD_TYPE                *     
 CMAKE_INSTALL_PREFIX            */usr/local 
 LAPACKE                         *OFF
 LAPACKE_WITH_TMG                *OFF
 USE_OPTIMIZED_BLAS              *ON
 USE_XBLAS                       *OFF

Then hit 'c' again. If there were no issues, hit 'g' which writes the configuration and exits.

make

[100%] Building Fortran object TESTING/EIG/CMakeFiles/xeigtstz.dir/__/__/INSTALL/dsecnd_INT_ETIME.f.o
Linking Fortran executable ../../bin/xeigtstz
[100%] Built target xeigtstz

make install

Install the project...
-- Install configuration: ""
-- Installing: /opt/netlib/lapack/lib/pkgconfig/lapack.pc
-- Installing: /opt/netlib/lapack/lib/cmake/lapack-3.4.1/lapack-config.cmake
-- Installing: /opt/netlib/lapack/lib/cmake/lapack-3.4.1/lapack-config-version.cmake
-- Installing: /opt/netlib/lapack/lib/cmake/lapack-3.4.1/lapack-targets.cmake
-- Installing: /opt/netlib/lapack/lib/cmake/lapack-3.4.1/lapack-targets-noconfig.cmake
-- Installing: /opt/netlib/lapack/lib/liblapack.so
-- Removed runtime path from "/opt/netlib/lapack/lib/liblapack.so"
-- Installing: /opt/netlib/lapack/lib/libtmglib.so
-- Removed runtime path from "/opt/netlib/lapack/lib/libtmglib.so"

tree /opt/netlib/ -d
/opt/netlib/
|-- blas
|   `-- lib
`-- lapack
    `-- lib
        |-- cmake
        |   `-- lapack-3.4.1
        `-- pkgconfig

7 directories

CMakeCache.txt variables:

 16 
 17 BLAS_FOUND:STRING=TRUE
 18 
 19 //Whether not the GENERIC library was found and is usable
 20 BLAS_GENERIC_FOUND:BOOL=TRUE
 21 
 22 //Path to a library.
 23 BLAS_GENERIC_blas_LIBRARY:FILEPATH=/opt/netlib/blas/lib/libnetblas.so
 24 
 25 BLAS_LIBRARIES:PATH=/opt/netlib/blas/lib/libnetblas.so
 26

Testing the libraries:
I built gromacs against the new libs to make sure they 'worked'

sudo mkdir /opt/gromacs
sudo chown ${USER} /opt/gromacs
cd ~/tmp
wget ftp://ftp.gromacs.org/pub/gromacs/gromacs-4.5.5.tar.gz
tar xvf gromacs-4.5.5.tar.gz
cd gromacs-4.5.5/
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/netlib/blas/lib:/opt/netlib/lapack/lib
export LDFLAGS="-l:/opt/netlib/blas/lib/libnetblas.so -l:/opt/netlib/lapack/lib/liblapack.so"
./configure --disable-mpi --enable-float --with-external-blas --with-external-lapack --program-suffix=_netlib --prefix=/opt/gromacs/gromacs-4.5.5
make

make install

Check that it linked ok:

ldd /opt/gromacs/gromacs-4.5.5/bin/grompp_netlib
        linux-vdso.so.1 =>  (0x00007fffb83f2000)
        libgmxpreprocess.so.6 => /opt/gromacs/gromacs-4.5.5/lib/libgmxpreprocess.so.6 (0x00002b6411cfa000)
        libmd.so.6 => /opt/gromacs/gromacs-4.5.5/lib/libmd.so.6 (0x00002b6411fcd000)
        libfftw3f.so.3 => /usr/lib/x86_64-linux-gnu/libfftw3f.so.3 (0x00002b64123ad000)
        libxml2.so.2 => /usr/lib/x86_64-linux-gnu/libxml2.so.2 (0x00002b64127b0000)
        libgmx.so.6 => /opt/gromacs/gromacs-4.5.5/lib/libgmx.so.6 (0x00002b6412b10000)
        libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00002b6412fe5000)
        libnetblas.so => /opt/netlib/blas/lib/libnetblas.so (0x00002b64131e9000)
        liblapack.so => /opt/netlib/lapack/lib/liblapack.so (0x00002b64134cc000)
        libnsl.so.1 => /lib/x86_64-linux-gnu/libnsl.so.1 (0x00002b6413ece000)
        libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00002b64140e6000)
        libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00002b6414369000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00002b6414585000)
        libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00002b641490c000)
        liblzma.so.5 => /lib/x86_64-linux-gnu/liblzma.so.5 (0x00002b6414b24000)
        /lib64/ld-linux-x86-64.so.2 (0x00002b6411ad8000)
        libgfortran.so.3 => /usr/lib/x86_64-linux-gnu/libgfortran.so.3 (0x00002b6414d47000)
        libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00002b641505d000)
        libquadmath.so.0 => /usr/lib/x86_64-linux-gnu/libquadmath.so.0 (0x00002b6415274000)
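If you'd rather not eyeball the ldd output, something like the following sketch can do the check. The function name `resolved_paths` is made up, and the sample text is just three lines from the output above; on a real system you would feed it the text from `ldd /opt/gromacs/gromacs-4.5.5/bin/grompp_netlib` instead.

```python
def resolved_paths(ldd_output):
    """Map library name -> resolved path for every 'name => path' line."""
    libs = {}
    for line in ldd_output.splitlines():
        if "=>" not in line:
            continue
        name, _, rest = line.partition("=>")
        parts = rest.split()
        # Entries like linux-vdso have no path, only an address in parentheses.
        if parts and not parts[0].startswith("("):
            libs[name.strip()] = parts[0]
    return libs


sample = """\
        libnetblas.so => /opt/netlib/blas/lib/libnetblas.so (0x00002b64131e9000)
        liblapack.so => /opt/netlib/lapack/lib/liblapack.so (0x00002b64134cc000)
        libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00002b64140e6000)
"""

libs = resolved_paths(sample)
for wanted in ("libnetblas.so", "liblapack.so"):
    ok = libs.get(wanted, "").startswith("/opt/netlib/")
    print(f"{wanted}: {'ok' if ok else 'NOT from /opt/netlib'}")
```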

Here are some input files (it's not a 'real' md run -- I just needed something small and quick to run):
step1.top:

#include "/opt/gromacs/gromacs-4.5.5/share/gromacs/top/ffoplsaa.itp"
#include "/opt/gromacs/gromacs-4.5.5/share/gromacs/top/oplsaa.ff/tip4p.itp"

[system]
test 

[molecules]

step1.mdp:

integrator = md
define      = -DFLEXIBLE
emtol      = 1000.0
emstep     = 0.001
nsteps     = 5000
nstlist    = 1
ns_type    = grid 
rlist      = 0.9
coulombtype= PME  
rcoulomb   = 0.9  
rvdw       = 1.0  
pbc        =  xyz

genbox_netlib -o step1.gro -cs /opt/gromacs/gromacs-4.5.5/share/gromacs/top/tip4p.gro -box 4x4x4 -p step1.top

grompp_netlib -f step1.mdp -po step2.mdp -p step1.top -pp step2.top -c step1.gro -o step2.tpr

mdrun_netlib -v -s step2.tpr -o step3.trr -x step3.xtc -cpo step3.cpt -c step3.gro -e step3.edr -g step3.log

On my old AMD II X3 I got about 7.7 GFLOPS with Openblas and 7.8 GFLOPS with the above libs. Note that the run is shorter than a minute so it's pretty useless for benchmarking. However, there's no obvious MAJOR penalty.

If you don't have cmake:
cp INSTALL/make.inc.gfortran make.inc

Edit make.inc

15 FORTRAN = gfortran
16 OPTS = -O2 -fPIC -m64
17 DRVOPTS = $(OPTS)
18 NOOPT = -O0 -fPIC -m64
19 LOADER = gfortran
20 LOADOPTS =

Edit Makefile

11 #lib: lapacklib tmglib
12 lib: blaslib variants lapacklib tmglib

Run make

make

-->  Tests passed: 13176


   -->   LAPACK TESTING SUMMARY  <--
  Processing LAPACK Testing output found in the TESTING direcory
SUMMARY              nb test run  numerical error    other error  
================    =========== ================= ================  
REAL              1077227  0 (0.000%) 0 (0.000%) 
DOUBLE PRECISION 1078039  0 (0.000%) 0 (0.000%) 
COMPLEX           522814  0 (0.000%) 0 (0.000%) 
COMPLEX16          552410  0 (0.000%) 0 (0.000%) 

--> ALL PRECISIONS 3230490  0 (0.000%) 0 (0.000%)

Older version:
In the oldest version of this post I did the blas compilation by hand:

gfortran -O2 -fPIC -m64 -march=native -funroll-all-loops -c *.f

To build a static library:
ar rvs libblas.a *.o

To build a shared/dynamic library:
gfortran -shared -Wl,-soname,libnetblas.so -o libblas.so.1.0.1 *.o -lc

ldd libblas.so.1.0.1

linux-vdso.so.1 => (0x00007fff301af000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00002aeeac390000)
libgfortran.so.3 => /usr/lib/x86_64-linux-gnu/libgfortran.so.3 (0x00002aeeac718000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00002aeeaca2e000)
libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00002aeeaccb0000)
libquadmath.so.0 => /usr/lib/x86_64-linux-gnu/libquadmath.so.0 (0x00002aeeacec7000)
/lib64/ld-linux-x86-64.so.2 (0x00002aeeabedd000)

Either way:
cp libblas* /opt/netlib/blas/lib

To test:
wget http://www.netlib.org/blas/sblat1
mv sblat1 sblat1.f

And EITHER
gfortran sblat1.f -l:libblas.a

OR
ln -s libblas.so.1.0.1 libnetblas.so
gfortran sblat1.f -l:libnetblas.so

THEN
./a.out

 
Real BLAS Test Program Results
Test of subprogram number  1             SDOT 
                                    ----- PASS -----

 Test of subprogram number  2            SAXPY 
                                    ----- PASS -----

 Test of subprogram number  3            SROTG 
                                    ----- PASS -----

 Test of subprogram number  4             SROT 
                                    ----- PASS -----

 Test of subprogram number  5            SCOPY 
                                    ----- PASS -----

 Test of subprogram number  6            SSWAP 
                                    ----- PASS -----

 Test of subprogram number  7            SNRM2 
                                    ----- PASS -----

 Test of subprogram number  8            SASUM 
                                    ----- PASS -----

 Test of subprogram number  9            SSCAL 
                                    ----- PASS -----

 Test of subprogram number 10            ISAMAX
                                    ----- PASS -----

11 September 2012

232. Compile parallel (threaded) povray 3.7-rc6 on Debian Wheezy

Update 13 May 2013: This build won't work with v3.7-rc7 on debian wheezy if you have libjpeg62 installed. See http://verahill.blogspot.com.au/2013/05/413-povray-37-rc7-on-debian-wheezy.html.

Remove libjpeg62 and it works fine though.

Original post
Expanding my little cluster has got me thinking about additional uses for it. The primary purpose is obviously work i.e. MD simulations using gromacs and ab initio calcs using NWChem and Gaussian. I'm also testing it with John the Ripper to see how well the users of the linux box in the lab are choosing their passwords.

At that point I realised that it'd be sweet to have at least an OMP capable version of povray to speed things up when polishing figures for those elusive journal covers.

Debian testing currently uses v. 3.6.1 of povray but

POV-Ray 3.6 does not support multithreaded rendering. POV-Ray 3.7 does.

So compile we will, although v. 3.7 is still in beta, so be aware.

sudo mkdir /opt/povray
sudo chown $USER /opt/povray

wget http://povray.org/redirect/www.povray.org/beta/source/povray-3.7.0.RC6.tar.gz
tar xvf povray-3.7.0.RC6.tar.gz
cd povray-3.7.0.RC6/
sudo apt-get install libboost-all-dev libpng-dev libjpeg-dev libtiff-dev build-essential libsdl-dev

Note: libboost-all-dev is big. It might be enough with libboost-thread-dev

./configure --prefix=/opt/povray --program-suffix=_3.7 COMPILED_BY="me@here"

===============================================================================
POV-Ray 3.7.0.RC5 has been configured.

Built-in features:
  I/O restrictions:          enabled
  X Window display:          disabled
  Supported image formats:   gif tga iff ppm pgm hdr png jpeg tiff
  Unsupported image formats: openexr

Compilation settings:
  Build architecture:  x86_64-unknown-linux-gnu
  Built/Optimized for: x86_64-unknown-linux-gnu (using -march=native)
  Compiler vendor:     gnu
  Compiler version:    g++ 4.7
  Compiler flags:      -pipe -Wno-multichar -Wno-write-strings -fno-enforce-eh-specs -s -O3 -ffast-math -march=native -pthread

Type 'make check' to build the program and run a test render.
Type 'make install' to install POV-Ray on your system.

The POV-Ray components will be installed in the following directories:
  Program (executable):       /opt/povray/bin
  System configuration files: /opt/povray/etc/povray/3.7
  User configuration files:   $HOME/.povray/3.7
  Standard include files:     /opt/povray/share/povray-3.7/include
  Standard INI files:         /opt/povray/share/povray-3.7/ini
  Standard demo scene files:  /opt/povray/share/povray-3.7/scenes
  Documentation (text, HTML): /opt/povray/share/doc/povray-3.7
  Unix man page:              /opt/povray/share/man
===============================================================================

The way it is configured we can keep our debian version of povray and install the newer version (povray_3.7)

make
make install

Seems like -geometry 1000x1000 doesn't work anymore. Instead use -H1000 -W1000

I've played around with it a little bit and it does parallel (threaded) execution nicely.

wget http://www.ms.uky.edu/~lee/visual05/povray/fourcube7.pov
./povray_3.7 -H1000 -W1000 fourcube7.pov +A0.1
takes 9 seconds on an AMD II X3. The standard, serial Debian version takes 21 seconds.
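For what it's worth, here is that comparison as a speedup figure. This is a rough sketch: it assumes the AMD II X3 has three cores, and since it compares two different povray versions it isn't a clean measure of parallel scaling.

```python
serial_s = 21.0     # Debian's serial povray
threaded_s = 9.0    # povray_3.7 built as above
cores = 3           # assumption: the X3 is a three-core CPU

speedup = serial_s / threaded_s
efficiency = speedup / cores

print(f"speedup:    {speedup:.2f}x")
print(f"efficiency: {efficiency:.0%} per core")
```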

231. Compiling john the ripper: single/serial, parallel/OMP and MPI

Update: updated for v1.7.9-jumbo-7 since hccap2john in 1.7.9-jumbo-6 was broken

For no particular reason at all, here's how to compile John the Ripper on Debian Testing (Wheezy). It's very easy, and this post is probably a bit superfluous. The standard version only supports serial and parallel (OMP). See below for MPI.

The regular version:

mkdir ~/tmp
cd ~/tmp
wget http://www.openwall.com/john/g/john-1.7.9.tar.gz
tar xvf john-1.7.9.tar.gz
cd john-1.7.9/src

If you don't edit the Makefile you build a serial/single-threaded binary.
If you want to build a threaded version for a single node with a multicore processor (OMP) do:
Edit Makefile and uncomment row 19 or 20

18 # gcc with OpenMP
19 OMPFLAGS = -fopenmp
20 OMPFLAGS = -fopenmp -msse2
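
If you'd rather not open an editor, the same change can probably be made with sed. This is only a sketch: it assumes the line is commented out as "#OMPFLAGS = -fopenmp" (one leading '#', no trailing whitespace), and enable_omp is a made-up helper name, not something that ships with john.

```shell
# enable_omp MAKEFILE: uncomment the plain "-fopenmp" OMPFLAGS line in place.
# Assumption: the line reads exactly "#OMPFLAGS = -fopenmp" before editing.
# The trailing $ in the pattern leaves the "-fopenmp -msse2" variant untouched.
enable_omp() {
    sed -i 's/^#\(OMPFLAGS = -fopenmp\)$/\1/' "$1"
}

# In john-1.7.9/src you would run:
#   enable_omp Makefile
```

Check the result with grep OMPFLAGS Makefile before building; if nothing changed, the commented line looks different in your version and you're back to editing by hand.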

make clean linux-x86-64

cd ../run

You now have a binary called john in your ../run folder.

The Jumbo version:
If you want to build a distributed version with MPI (can split jobs across several nodes) you need the enhanced, community version:

sudo apt-get install openmpi-bin libopenmpi-dev

cd ~/tmp

wget http://www.openwall.com/john/g/john-1.7.9-jumbo-7.tar.gz

tar xvf john-1.7.9-jumbo-7.tar.gz

cd john-1.7.9-jumbo-7/src

Edit the Makefile and uncomment the two MPI lines (rows 23 and 24):

20 ## Uncomment the TWO lines below for MPI (can be used together with OMP as well)
21 ## For experimental MPI_Barrier support, add -DJOHN_MPI_BARRIER too.
22 ## For experimental MPI_Abort support, add -DJOHN_MPI_ABORT too.
23 CC = mpicc -DHAVE_MPI
24 MPIOBJ = john-mpi.o

and do

make clean linux-x86-64-native
cd ../run

I had a look at the passwords on one of our lab boxes -- it immediately discovered that someone had used 'password' as the password...

These tests were run on my old AMD II X3 445. Some formats don't speed up with OMP at all (compare Kerberos AFS DES, Tripcode DES and dummy in the serial and OMP runs below). LM DES is borderline -- it's faster, but doesn't scale well.

Here's the single thread/serial version:
./john --test

Benchmarking: Traditional DES [128/128 BS SSE2-16]... DONE
Many salts: 2906K c/s real, 2918K c/s virtual
Only one salt: 2796K c/s real, 2807K c/s virtual
Benchmarking: BSDI DES (x725) [128/128 BS SSE2-16]... DONE
Many salts: 95564 c/s real, 95948 c/s virtual
Only one salt: 93593 c/s real, 93781 c/s virtual
Benchmarking: FreeBSD MD5 [32/64 X2]... DONE
Raw: 14094 c/s real, 14122 c/s virtual
Benchmarking: OpenBSD Blowfish (x32) [32/64 X2]... DONE
Raw: 918 c/s real, 919 c/s virtual
Benchmarking: Kerberos AFS DES [48/64 4K]... DONE
Short: 474316 c/s real, 475267 c/s virtual
Long: 1350K c/s real, 1356K c/s virtual
Benchmarking: LM DES [128/128 BS SSE2-16]... DONE
Raw: 39843K c/s real, 39923K c/s virtual
Benchmarking: generic crypt(3) [?/64]... DONE
Many salts: 262867 c/s real, 263393 c/s virtual
Only one salt: 260121 c/s real, 260642 c/s virtual
Benchmarking: Tripcode DES [48/64 4K]... DONE
Raw: 369843 c/s real, 370584 c/s virtual
Benchmarking: dummy [N/A]... DONE
Raw: 99512K c/s real, 99712K c/s virtual

Here's the OMP version:

Benchmarking: Traditional DES [128/128 BS SSE2-16]... DONE
Many salts: 6706K c/s real, 2555K c/s virtual
Only one salt: 5015K c/s real, 2091K c/s virtual
Benchmarking: BSDI DES (x725) [128/128 BS SSE2-16]... DONE
Many salts: 205670 c/s real, 85411 c/s virtual
Only one salt: 238524 c/s real, 86720 c/s virtual
Benchmarking: FreeBSD MD5 [32/64 X2]... DONE
Raw: 38400 c/s real, 13812 c/s virtual
Benchmarking: OpenBSD Blowfish (x32) [32/64 X2]... DONE
Raw: 2306 c/s real, 845 c/s virtual
Benchmarking: Kerberos AFS DES [48/64 4K]... DONE
Short: 474675 c/s real, 476581 c/s virtual
Long: 1332K c/s real, 1335K c/s virtual
Benchmarking: LM DES [128/128 BS SSE2-16]... DONE
Raw: 49046K c/s real, 16785K c/s virtual
Benchmarking: generic crypt(3) [?/64]... DONE
Many salts: 721670 c/s real, 246640 c/s virtual
Only one salt: 699168 c/s real, 239605 c/s virtual
Benchmarking: Tripcode DES [48/64 4K]... DONE
Raw: 367444 c/s real, 369657 c/s virtual
Benchmarking: dummy [N/A]... DONE
Raw: 100351K c/s real, 100552K c/s virtual

And here's the MPI version:
mpirun -n 3 ./john --test
(note that this includes a great many more tests than the default version)

Benchmarking: Traditional DES [128/128 BS SSE2-16]... (3xMPI) DONE
Many salts: 8533K c/s real, 8707K c/s virtual
Only one salt: 7705K c/s real, 8110K c/s virtual
Benchmarking: BSDI DES (x725) [128/128 BS SSE2-16]... (3xMPI) DONE
Many salts: 279808 c/s real, 282634 c/s virtual
Only one salt: 273362 c/s real, 276096 c/s virtual
Benchmarking: FreeBSD MD5 [128/128 SSE2 intrinsics 12x]... (3xMPI) DONE
Raw: 65124 c/s real, 65781 c/s virtual
Benchmarking: OpenBSD Blowfish (x32) [32/64 X2]... (3xMPI) DONE
Raw: 2722 c/s real, 2749 c/s virtual
Benchmarking: Kerberos AFS DES [48/64 4K]... (3xMPI) DONE
Short: 1387K c/s real, 1415K c/s virtual
Long: 3880K c/s real, 3959K c/s virtual
Benchmarking: LM DES [128/128 BS SSE2-16]... (3xMPI) DONE
Raw: 114781K c/s real, 115940K c/s virtual

I don't quite understand the Kerberos results.
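
To put a number on how well something scales, divide the 'real' c/s figure of the parallel run by that of the serial run. Here's a small sketch using awk; speedup is a made-up helper name, and the inputs are the K c/s values from the three runs above.

```shell
# speedup BASE NEW: print NEW/BASE rounded to two decimals.
speedup() {
    awk -v b="$1" -v n="$2" 'BEGIN { printf "%.2f\n", n / b }'
}

# Traditional DES, many salts (K c/s real)
speedup 2906 6706      # OMP vs serial
speedup 2906 8533      # 3x MPI vs serial

# LM DES, raw (K c/s real)
speedup 39843 49046    # OMP vs serial
speedup 39843 114781   # 3x MPI vs serial
```

On a three-core CPU the ideal value is 3. Traditional DES gets 2.31 with OMP and 2.94 with MPI, while LM DES only manages 1.23 with OMP but 2.88 with MPI.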

Other targets of interest are:

linux-x86-64-avx Linux, x86-64 with AVX (2011+ Intel CPUs)
linux-x86-64-xop Linux, x86-64 with AVX and XOP (2011+ AMD CPUs)
linux-x86-64 Linux, x86-64 with SSE2 (most common)
linux-x86-avx Linux, x86 32-bit with AVX (2011+ Intel CPUs)
linux-x86-xop Linux, x86 32-bit with AVX and XOP (2011+ AMD CPUs)
linux-x86-sse2 Linux, x86 32-bit with SSE2 (most common, if 32-bit)
linux-x86-mmx Linux, x86 32-bit with MMX (for old computers)
linux-x86-any Linux, x86 32-bit (for truly ancient computers)

The FX 8150 does AVX and XOP, while my 1055T doesn't.

The community version has more options:

linux-x86-64-native Linux, x86-64 'native' (all CPU features you've got)
linux-x86-64-gpu Linux, x86-64 'native', CUDA and OpenCL (experimental)
linux-x86-64-opencl Linux, x86-64 'native', OpenCL (experimental)
linux-x86-64-cuda Linux, x86-64 'native', CUDA (experimental)
linux-x86-64-avx Linux, x86-64 with AVX (2011+ Intel CPUs)
linux-x86-64-xop Linux, x86-64 with AVX and XOP (2011+ AMD CPUs)
linux-x86-64[i] Linux, x86-64 with SSE2 (most common)
linux-x86-64-icc Linux, x86-64 compiled with icc
linux-x86-64-clang Linux, x86-64 compiled with clang
linux-x86-gpu Linux, x86 32-bit with SSE2, CUDA and OpenCL (experimental)
linux-x86-opencl Linux, x86 32-bit with SSE2 and OpenCL (experimental)
linux-x86-cuda Linux, x86 32-bit with SSE2 and CUDA (experimental)
linux-x86-sse2[i] Linux, x86 32-bit with SSE2 (most common, 32-bit)
linux-x86-native Linux, x86 32-bit, with all CPU features you've got (not necessarily best)
linux-x86-mmx Linux, x86 32-bit with MMX (for old computers)
linux-x86-any Linux, x86 32-bit (for truly ancient computers)
linux-x86-clang Linux, x86 32-bit with SSE2, compiled with clang
linux-alpha Linux, Alpha
linux-sparc Linux, SPARC 32-bit
linux-ppc32-altivec Linux, PowerPC w/AltiVec (best)
linux-ppc32 Linux, PowerPC 32-bit
linux-ppc64 Linux, PowerPC 64-bit
linux-ia64 Linux, IA-64

10 September 2012

230. ROCKS 5.4.3, ATLAS and Gromacs on Xeon X3480

After doing another round of 'benchmarks' (there are so many factors that differ between the systems that it's difficult to tell exactly what I'm measuring) I'm back to looking at my BLAS/LAPACK.

So here's how to compile ATLAS on a cluster made up of six dual-socket mobos with 2x quadcore Xeon X3480 CPUs and 8 GB RAM. The cluster is running ROCKS 5.4.3, which is a spin based on CentOS 5.6. We then compile GROMACS using ATLAS and compare it with Openblas. Please note that I am not an expert on optimisations (or computers or anything) so comparing Openblas vs ATLAS won't tell you which one is 'better'. They are just numbers based on what someone once observed on a particular system under a particular set of circumstances.

Hurdles: I first had to deal with the lapack + bad symbols + recompile with -fPIC problem (solved by using netlib lapack and building shared libraries), then encountered the 'libgmx.so: undefined reference to _gfortran_' issue (solved by adding -lgfortran to LDFLAGS).

ATLAS
sudo mkdir /share/apps/ATLAS
sudo chown $USER /share/apps/ATLAS
cd ~/tmp
wget http://www.netlib.org/lapack/lapack-3.4.1.tgz
wget http://downloads.sourceforge.net/project/math-atlas/Developer%20%28unstable%29/3.9.72/atlas3.9.72.tar.bz2
tar xvf atlas3.9.72.tar.bz2
cd ATLAS/
mkdir build
cd build
.././configure --prefix=/share/apps/ATLAS -Fa alg '-fPIC' --with-netlib-lapack-tarfile=$HOME/tmp/lapack-3.4.1.tgz --shared

OS configured as Linux (1)
Assembly configured as GAS_x8664 (2)
Vector ISA Extension configured as SSE3 (6,448)
Architecture configured as Corei1 (25)
Clock rate configured as 3059Mhz

make

DONE STAGE 5-1-0 at 15:23
ATLAS install complete. Examine
ATLAS/bin/<arch>/INSTALL_LOG/SUMMARY.LOG for details.

ls lib/

libatlas.a libcblas.a libf77blas.a libf77refblas.a liblapack.a libptcblas.a libptf77blas.a libptlapack.a libsatlas.so libtatlas.so libtstatlas.a Makefile Make.inc

make install

In addition to successful copying you'll also get errors along the lines of

cp: cannot stat `/home/me/tmp/ATLAS/build/lib/libsatlas.dylib': No such file or directory
make[1]: [install_lib] Error 1 (ignored)
cp /home/me/tmp/ATLAS/build/lib/libtatlas.dylib /share/apps/ATLAS/lib/.
cp: cannot stat `/home/me/tmp/ATLAS/build/lib/libtatlas.dylib': No such file or directory
make[1]: [install_lib] Error 1 (ignored)
cp /home/me/tmp/ATLAS/build/lib/libsatlas.dll /share/apps/ATLAS/lib/.
cp: cannot stat `/home/me/tmp/ATLAS/build/lib/libsatlas.dll': No such file or directory
make[1]: [install_lib] Error 1 (ignored)
cp /home/me/tmp/ATLAS/build/lib/libtatlas.dll /share/apps/ATLAS/lib/.
cp: cannot stat `/home/me/tmp/ATLAS/build/lib/libtatlas.dll': No such file or directory
make[1]: [install_lib] Error 1 (ignored)
cp /home/me/tmp/ATLAS/build/lib/libsatlas.so /share/apps/ATLAS/lib/.
cp /home/me/tmp/ATLAS/build/lib/libtatlas.so /share/apps/ATLAS/lib/.

because those files don't exist.

Gromacs

FFTW3 was first built according to this. The only difference is the install targets (--prefix) -- I put things in /share/apps/gromacs/.fftwsingle and /share/apps/gromacs/.fftwdouble. Gromacs was downloaded and extracted as shown in that post, and /share/apps/gromacs was created.

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/openmpi/lib:/share/apps/ATLAS/lib
#single precision
export LDFLAGS="-L/share/apps/gromacs/.fftwsingle/lib -L/share/apps/ATLAS/lib -latlas -llapack -lf77blas -lcblas -lgfortran"
export CPPFLAGS="-I/share/apps/gromacs/.fftwsingle/include -I/share/apps/ATLAS/include/atlas"
./configure --disable-mpi --enable-float --with-fft=fftw3 --with-external-blas --with-external-lapack --program-suffix=_spa --prefix=/share/apps/gromacs
make -j3
make install
#double precision
make distclean
export LDFLAGS="-L/share/apps/gromacs/.fftwdouble/lib -L/share/apps/ATLAS/lib -latlas -llapack -lf77blas -lcblas -lgfortran"
export CPPFLAGS="-I/share/apps/gromacs/.fftwdouble/include -I/share/apps/ATLAS/include/atlas"
./configure --disable-mpi --disable-float --with-fft=fftw3 --with-external-blas --with-external-lapack --program-suffix=_dpa --prefix=/share/apps/gromacs
make -j3
make install
#single + mpi
make distclean
export LDFLAGS="-L/share/apps/gromacs/.fftwsingle/lib -L/share/apps/ATLAS/lib -latlas -llapack -lf77blas -lcblas -lgfortran"
export CPPFLAGS="-I/share/apps/gromacs/.fftwsingle/include -I/share/apps/ATLAS/include/atlas"
./configure --enable-mpi --enable-float --with-fft=fftw3 --with-external-blas --with-external-lapack --program-suffix=_spampi --prefix=/share/apps/gromacs
make -j3
make install
#double + mpi
make distclean
export LDFLAGS="-L/share/apps/gromacs/.fftwdouble/lib -L/share/apps/ATLAS/lib -latlas -llapack -lf77blas -lcblas -lgfortran"
export CPPFLAGS="-I/share/apps/gromacs/.fftwdouble/include -I/share/apps/ATLAS/include/atlas"
./configure --enable-mpi --disable-float --with-fft=fftw3 --with-external-blas --with-external-lapack --program-suffix=_dpampi --prefix=/share/apps/gromacs
make -j3
make install

The -lgfortran is IMPORTANT, or you'll end up with
libgmx.so: 'undefined reference to _gfortran_' type errors.
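
The four builds above only differ in two configure switches and the program suffix, so the configure arguments can be generated instead of retyped. A sketch, using only the flags from the commands above; gmx_configure_args is a made-up helper name.

```shell
# gmx_configure_args PRECISION MPI: print the configure arguments for one build.
# PRECISION is "single" or "double", MPI is "yes" or "no".
gmx_configure_args() {
    local float suffix mpiflag
    case "$1" in
        single) float=--enable-float;  suffix=_spa ;;
        double) float=--disable-float; suffix=_dpa ;;
        *) return 1 ;;
    esac
    case "$2" in
        yes) mpiflag=--enable-mpi; suffix="${suffix}mpi" ;;
        no)  mpiflag=--disable-mpi ;;
        *) return 1 ;;
    esac
    echo "$mpiflag $float --with-fft=fftw3 --with-external-blas --with-external-lapack --program-suffix=$suffix --prefix=/share/apps/gromacs"
}

# Example: ./configure $(gmx_configure_args double yes)
```

LDFLAGS and CPPFLAGS still have to point at the matching .fftwsingle or .fftwdouble directory before each configure run, and you still need make distclean between builds.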

Performance
I ran a 6x6x6 nm box of water for 5 million steps (10 ns) to get a rough idea of the performance.
Make sure to put

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/share/apps/ATLAS/lib

in your ~/.bashrc, and to include it in your SGE jobs files (if that's what you use).
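
If you script the setup of several accounts or nodes, it's easy to end up with the same export line many times over. A minimal sketch that only appends the line when it isn't there yet; add_line_once is a made-up helper name.

```shell
# add_line_once FILE LINE: append LINE to FILE unless an identical line already exists.
# grep -x matches whole lines only, -F treats LINE as a fixed string, -q keeps it quiet.
add_line_once() {
    grep -qxF -- "$2" "$1" 2>/dev/null || printf '%s\n' "$2" >> "$1"
}

# Example (single quotes keep $LD_LIBRARY_PATH unexpanded in the file):
#   add_line_once ~/.bashrc 'export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/share/apps/ATLAS/lib'
```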

I allocated 8 GB RAM and 8 cores for each run.

Double precision:
Openblas: 10.560 ns/day (11.7 GFLOPS, runtime 8182 seconds)
ATLAS : 10.544 ns/day (11.6 GFLOPS, runtime 8194 seconds)

Single precision:
Openblas: 17.297 ns/day (19.1 GFLOPS, runtime 4995 seconds)
ATLAS: 17.351 ns/day (19.2 GFLOPS, runtime 4980 seconds)
That's 15 seconds difference on a 1h 20 min run. I'd say they are identical.

07 September 2012

229. Compile ATLAS (+ gromacs, nwchem) on AMD FX 8150 on Debian Testing (Wheezy)

Xianyi's openblas doesn't seem to be ready for AMD FX 8150 yet. I've played with ATLAS in the past, but for some reason didn't see the same performance with NWChem and ATLAS as I saw with NWChem and Openblas, so I never ended up using it.

I'm also having issues using openblas with CPMD and quantum espresso, and ATLAS is a well-established, respectable project, so it's time to give it another shot. As in most cases in these situations, it's probably a matter of PEBKAC.

Building ATLAS
Anyway. On we go...

sudo mkdir /opt/ATLAS
sudo chown ${USER} /opt/ATLAS
mkdir ~/tmp
cd ~/tmp
wget http://www.netlib.org/lapack/lapack-3.4.1.tgz
wget http://downloads.sourceforge.net/project/math-atlas/Developer%20%28unstable%29/3.9.72/atlas3.9.72.tar.bz2
tar xvf atlas3.9.72.tar.bz2
cd ATLAS/

Edit ATLAS/Make.top and change the V on line 6 to lowercase, i.e. from
- $(ICC) -V 2>&1 >> bin/INSTALL_LOG/ERROR.LOG
to
- $(ICC) -v 2>&1 >> bin/INSTALL_LOG/ERROR.LOG
mkdir build/
cd build/

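ATLAS tunes itself by timing its own kernels, so CPU frequency scaling (throttling) has to be switched off before building:

sudo apt-get install cpufreq-utils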

cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor

ondemand

sudo cpufreq-set -g performance

Unfortunately that only takes care of cpu0:

cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor

performance

but

cat /sys/devices/system/cpu/cpu1/cpufreq/scaling_governor

ondemand

So...since we have 8 cores (cpu0-cpu7):

sudo cp /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor /sys/devices/system/cpu/cpu1/cpufreq/scaling_governor
sudo cp /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor /sys/devices/system/cpu/cpu2/cpufreq/scaling_governor
sudo cp /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor /sys/devices/system/cpu/cpu3/cpufreq/scaling_governor
sudo cp /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor /sys/devices/system/cpu/cpu4/cpufreq/scaling_governor
sudo cp /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor /sys/devices/system/cpu/cpu5/cpufreq/scaling_governor
sudo cp /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor /sys/devices/system/cpu/cpu6/cpufreq/scaling_governor
sudo cp /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor /sys/devices/system/cpu/cpu7/cpufreq/scaling_governor
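
The seven cp lines can also be written as a loop over every cpuN directory. This is a sketch rather than a tested recipe for your box: set_governor_all is a made-up helper name, and the optional second argument (the sysfs root) is only there so the function can be tried on a scratch directory first.

```shell
# set_governor_all GOVERNOR [CPU_ROOT]: write GOVERNOR to every
# cpuN/cpufreq/scaling_governor file below CPU_ROOT.
# CPU_ROOT defaults to the real sysfs tree, which needs root to write to.
set_governor_all() {
    local gov="$1"
    local root="${2:-/sys/devices/system/cpu}"
    local f
    for f in "$root"/cpu[0-9]*/cpufreq/scaling_governor; do
        [ -e "$f" ] || continue        # the glob matched nothing
        printf '%s\n' "$gov" > "$f"
    done
}

# As root (e.g. after sudo su):
#   set_governor_all performance
```

Afterwards, cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor should print performance once per core.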

OK, we're ready to compile:
.././configure --prefix=/opt/ATLAS -Fa alg '-fPIC' --with-netlib-lapack-tarfile=$HOME/tmp/lapack-3.4.1.tgz --shared

Some of the info that's important is:

OS configured as Linux (1)
Assembly configured as GAS_x8664 (2)
Vector ISA Extension configured as AVXFMA4 (4,496)
Architecture configured as AMDDOZER (34)
Clock rate configured as 3600Mhz

If that checks out you don't need to manually set your architecture. To get a list of the options, do
make xprint_enums ; ./xprint_enums

If all is well,

make
make install

You should now be done.

Linking Gromacs against ATLAS

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/lib/openmpi/lib:/opt/ATLAS/lib
#single precision
export LDFLAGS="-L/opt/fftw/fftw-3.3.2/single/lib -L/opt/ATLAS/lib -lsatlas -ltatlas -lgfortran"
export CPPFLAGS="-I/opt/fftw/fftw-3.3.2/single/include -I/opt/ATLAS/include"
./configure --disable-mpi --enable-float --with-fft=fftw3 --with-external-blas --with-external-lapack --program-suffix=_spatlas --prefix=/opt/gromacs/gromacs-4.5.5
make -j6 2>make.err 1>make.log
make install

#double precision
make distclean
export LDFLAGS="-L/opt/fftw/fftw-3.3.2/double/lib -L/opt/ATLAS/lib -lsatlas -ltatlas -lgfortran"
export CPPFLAGS="-I/opt/fftw/fftw-3.3.2/double/include -I/opt/ATLAS/include"
./configure --disable-mpi --disable-float --with-fft=fftw3 --with-external-blas --with-external-lapack --program-suffix=_dpatlas --prefix=/opt/gromacs/gromacs-4.5.5
make -j6 2>make2.err 1>make2.log
make install

#single + mpi
make distclean
export LDFLAGS="-L/opt/fftw/fftw-3.3.2/single/lib -L/opt/ATLAS/lib -lsatlas -ltatlas -lgfortran"
export CPPFLAGS="-I/opt/fftw/fftw-3.3.2/single/include -I/opt/ATLAS/include"
./configure --enable-mpi --enable-float --with-fft=fftw3 --with-external-blas --with-external-lapack --program-suffix=_spmpiatlas --prefix=/opt/gromacs/gromacs-4.5.5
make -j6 2>make3.err 1>make3.log
make install

#double + mpi
make distclean
export LDFLAGS="-L/opt/fftw/fftw-3.3.2/double/lib -L/opt/ATLAS/lib -lsatlas -ltatlas -lgfortran"
export CPPFLAGS="-I/opt/fftw/fftw-3.3.2/double/include -I/opt/ATLAS/include"
./configure --enable-mpi --disable-float --with-fft=fftw3 --with-external-blas --with-external-lapack --program-suffix=_dpmpiatlas --prefix=/opt/gromacs/gromacs-4.5.5
make -j6 2>make4.err 1>make4.log
make install

Linking NWChem against ATLAS

export LARGE_FILES=TRUE
export TCGRSH=/usr/bin/ssh
export NWCHEM_TOP=`pwd`
export NWCHEM_TARGET=LINUX64
export NWCHEM_MODULES="all"
export BLASOPT="-L/opt/ATLAS/lib -lsatlas -ltatlas"
export USE_MPI=y
export USE_MPIF=y
export USE_MPIF4=y
export MPI_LOC=/usr/lib/openmpi/lib
export MPI_INCLUDE=/usr/lib/openmpi/include
export LIBRARY_PATH="$LIBRARY_PATH:/usr/lib/openmpi/lib:/opt/ATLAS/lib"
export LIBMPI="-lmpi -lopen-rte -lopen-pal -ldl -lmpi_f77 -lpthread"
export LDFLAGS="-I/opt/ATLAS/include"
cd $NWCHEM_TOP/src
make clean
make nwchem_config
make FC=gfortran 2> make.err 1>make.log
export FC=gfortran
cd $NWCHEM_TOP/contrib
./getmem.nwchem

228. Setting up Asus (nvidia) GF 210 on Debian Testing

NOTE: Unless I remove the legacy driver, *DM will not start. Instead I only get a blank screen with a blinking cursor. See below for solution.

Here's how to get ASUS (nvidia) GF210 up and running in debian testing (wheezy)

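First create a file such as /etc/modprobe.d/blacklist-nouveau.conf (blacklist lines belong in /etc/modprobe.d/, not in /etc/modules), and add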
blacklist nouveau

You can either reboot at this point or try
sudo rmmod nouveau

To see whether nouveau got unloaded, do
lsmod |grep nouv

If nothing is returned, then you're good.
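
In a script, the same check is easier to act on as an exit status than as grep output. A small sketch that reads lsmod-style text on stdin and compares the first column exactly, so nouveau won't be confused with another module that merely contains the string; module_loaded is a made-up helper name.

```shell
# module_loaded NAME: succeed (exit 0) if NAME appears in the first column
# of the lsmod-style listing read from stdin, fail (exit 1) otherwise.
module_loaded() {
    awk -v m="$1" '$1 == m { found = 1 } END { exit !found }'
}

# Example:
#   lsmod | module_loaded nouveau && echo "nouveau is still loaded"
```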

Make sure that your card got recognised:

lspci

01:00.0 VGA compatible controller: NVIDIA Corporation GT218 [GeForce 210] (rev a2)

I like smxi, so here's how to get the drivers up and running using smxi, which is a fancy shell script.

sudo su
cd /usr/local/bin
wget -Nc smxi.org/smxi.zip
unzip smxi.zip
smxi

The first time you run smxi you have a couple of things to sort out -- lots of little questions to answer. If you don't feel comfortable yet with linux, avoid liquorix since it'll make your debian box deviate more from the standard setup (the liquorix kernel is fine and safe and I've used it in the past before I started rolling my own kernel, but it's more difficult for someone to troubleshoot your system the more it deviates from their own). Other than that most questions aren't that important. I enable non-free immediately after setting up a new box, and while there are sound political reasons for NOT doing it, there are plenty of practical reasons in favour of it.

Anyway, eventually you're done with the setup, and with making sure that your system is up to date. Select Continue to Graphics, then select debian-nvidia

If all goes well you'll get the dkms install of the nvidia driver. You'll probably be asked whether to generate a new xorg.conf, which you should. You may also get a message about the nvidia driver needing to be added to your xorg.conf.

Once you're done installing the driver, you're asked whether to start your desktop or to quit. While it's fine to start your desktop at this point, why not select quit and check that all went well?

lsmod|grep nvi
nvidia 8028141 0
i2c_core 24002 2 i2c_piix4,nvidia

cat /etc/X11/xorg.conf|grep nvi
Driver "nvidia"
Driver "nvidia"
Driver "nvidia"

Looks fine.

I often have problems with the legacy drivers (blank screen with blinking cursor), so
sudo apt-get purge nvidia-*-legacy-173xx-*

Do
aptitude search 173
to make sure all the legacy drivers are gone.

Purge if there's still something around.

Framebuffer
If possible I like to enable framebuffer (it gives you fancier graphics capabilities in terminal mode e.g. browsing with images using w3m). I've had all manner of headaches doing so with the newer nvidia drivers though, so don't be too surprised if it doesn't pan out.

Edit the following line in your /etc/default/grub

GRUB_CMDLINE_LINUX_DEFAULT="quiet text vga=0x0318 nomodeset"

To see what code to use, look here.
This method is supposed to be deprecated, but I don't have any experience using vbetool.

As for the other options, only 'text' is important -- it will make you boot into the terminal and you will have to start your (default) desktop by doing startx. IF you want to boot into e.g. gdm3, kdm or another *dm, then DO NOT ADD text.

Reboot. To see whether your framebuffer is active do
ls /dev/fb*

/dev/fb0
