Lindqvist -- a blog about Linux and Science. Mostly.

18 September 2012

239. Sun GridEngine: resetting queue status on node

I occasionally run into problems with disk space during a run on my cluster, which causes the job to fail and the queue on that node to be put in an error state (E):

qstat -f
queuename                      qtype resv/used/tot. load_avg arch          states
---------------------------------------------------------------------------------
eight.q@neon                   BIP   0/0/8          0.45     lx26-amd64    
---------------------------------------------------------------------------------
five.q@boron                   BIP   0/0/5          6.01     lx26-amd64    
---------------------------------------------------------------------------------
six.q@boron                    BIP   0/6/6          6.01     lx26-amd64    
    788 0.75000 submit__la user         r     09/07/2012 18:36:56     6        
---------------------------------------------------------------------------------
two.q@beryllium                BIP   0/0/2          0.24     lx26-amd64    
---------------------------------------------------------------------------------
four.q@tantalum                BIP   0/0/4          0.05     lx26-amd64    E
---------------------------------------------------------------------------------
three.q@beryllium              BIP   0/0/3          0.24     lx26-amd64    
---------------------------------------------------------------------------------
main.q@beryllium               BIP   0/0/1          0.24     lx26-amd64    
---------------------------------------------------------------------------------
main.q@boron                   BIP   0/0/1          6.01     lx26-amd64    
---------------------------------------------------------------------------------
main.q@tantalum                BIP   0/0/1          0.05     lx26-amd64    

############################################################################
 - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS
############################################################################
    789 0.67310 zoli.qsub  user         qw    09/09/2012 10:00:35     6        
    802 0.60527 submit__bi user         qw    09/10/2012 20:45:24     6        
    803 0.60525 submit__bi user         qw    09/10/2012 20:46:00     6        
    927 0.25071 submit__ac user         qw    09/18/2012 08:24:00     4        
    928 0.25000 submit__ac user         qw    09/18/2012 08:45:52     4

Before you do anything else, free up space and consider moving your scratch dir to a different/separate disk.

Since I keep forgetting how to reset it, here it is -- as an SGE admin, do:
/usr/bin/qmod -c four.q@tantalum

me@beryllium changed state of "four.q@tantalum" (no error)

And now everything is good:

qstat -f
queuename                      qtype resv/used/tot. load_avg arch          states
---------------------------------------------------------------------------------
eight.q@neon                   BIP   0/0/8          0.25     lx26-amd64    
---------------------------------------------------------------------------------
five.q@boron                   BIP   0/0/5          5.91     lx26-amd64    
---------------------------------------------------------------------------------
six.q@boron                    BIP   0/6/6          5.91     lx26-amd64    
    788 0.75000 submit__la user         r     09/07/2012 18:36:56     6        
---------------------------------------------------------------------------------
two.q@beryllium                BIP   0/0/2          0.44     lx26-amd64    
---------------------------------------------------------------------------------
four.q@tantalum                BIP   0/4/4          0.17     lx26-amd64    
    927 0.25071 submit__ac user         r     09/18/2012 11:01:26     4        
---------------------------------------------------------------------------------
three.q@beryllium              BIP   0/0/3          0.44     lx26-amd64    
---------------------------------------------------------------------------------
main.q@beryllium               BIP   0/0/1          0.44     lx26-amd64    
---------------------------------------------------------------------------------
main.q@boron                   BIP   0/0/1          5.91     lx26-amd64    
---------------------------------------------------------------------------------
main.q@tantalum                BIP   0/0/1          0.17     lx26-amd64    

############################################################################
 - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS
############################################################################
    789 0.67310 zoli.qsub  user         qw    09/09/2012 10:00:35     6        
    802 0.60527 submit__bi user         qw    09/10/2012 20:45:24     6        
    803 0.60525 submit__bi user         qw    09/10/2012 20:46:00     6        
    928 0.25000 submit__ac user         qw    09/18/2012 08:45:52     4
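Spotting the queue instance that needs resetting can be scripted. What follows is a minimal sketch, assuming the `qstat -f` layout shown above (six whitespace-separated columns on a queue row when a state is set, five otherwise). The function name `queues_in_error` and the shortened sample are my own; the script only prints the `qmod -c` commands and does not run them.

```python
def queues_in_error(qstat_f_output):
    """Return the queue instances whose states column contains 'E'."""
    errored = []
    for line in qstat_f_output.splitlines():
        fields = line.split()
        # Queue rows: name, qtype, resv/used/tot., load_avg, arch and,
        # only when a state is set, a sixth states field.
        if len(fields) == 6 and "@" in fields[0] and "E" in fields[5]:
            errored.append(fields[0])
    return errored


# Shortened copy of the qstat -f output from the post
sample = """\
queuename                      qtype resv/used/tot. load_avg arch          states
---------------------------------------------------------------------------------
two.q@beryllium                BIP   0/0/2          0.24     lx26-amd64
---------------------------------------------------------------------------------
four.q@tantalum                BIP   0/0/4          0.05     lx26-amd64    E
"""

for queue in queues_in_error(sample):
    # Print the reset command instead of executing it
    print("qmod -c " + queue)
```

Free up the disk space first; clearing the error state without doing so will most likely just put the queue back in the E state at the next job.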

17 September 2012

238. Calculating pKa, part 2: CBS extrapolation basics

I'm typing this up as I'm learning. Be prepared that there may be outright errors, fuzzy thinking and misunderstood concepts in this post. Use it as an inspiration for learning about Complete Basis Set extrapolation -- not as an authoritative guide. What follows is just what I think I've understood. It's a short post since I'll just put it here in order that I can refer to it later.

The general idea
The basic idea behind Complete Basis Set extrapolation seems to be that as you make your basis sets larger and larger by adding zeta functions to your valence orbitals (making your valence orbitals more 'flexible') the energies you get start to approach those you would get with an infinitely large or complete (and thus 'true') basis set. Exactly how quickly the energies converge (exponentially?) is a matter for debate, so there are a number of ways of doing the extrapolation.

In practical terms, what seems to be done is:
1. The structure of a molecule is first optimised using a specific level of theory (e.g. MP2/aug-cc-pVDZ) in the gas phase. This structure is used for all subsequent calculations.
2. A single point calculation at MP2/aug-cc-pVDZ is done. The electronic energy is recorded.
3. A single point calculation at MP2/aug-cc-pVTZ is done. The electronic energy is recorded.
4. A single point calculation at MP2/aug-cc-pVQZ is done. The electronic energy is recorded.
5. A single point calculation at MP2/aug-cc-pV5Z is done. The electronic energy is recorded.
etc.

where D=double(2), T=triple (3), Q=Quadruple (4)

Using MP2 and the Dunning series of correlation consistent basis sets, for formate I got
2 -188.772348824917 (MP2/aug-cc-pVDZ)
3 -188.930859478806 (MP2/aug-cc-pVTZ)
4 -188.984671946038 (MP2/aug-cc-pVQZ)

Which looks like this:

Not everyone will appreciate the default choice of green and red by gnuplot...

The fitted line is MP2energy=1*exp(-0.599028936687644*Zetafunctions)-189.082170648532

Fitting an exponential with three unknowns to only three points is obviously poor form, but the point is fairly clear.

The extrapolated energy is -189.082170648532 hartree. For practical use you will need to obtain enthalpy/entropy corrections and solvation energies using a set method. Also, there are normally corrections for unpaired electrons etc.
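With three unknowns and three points, the simple exponential A*exp(-B*x)+C can also be solved exactly instead of fitted. Here is a minimal sketch in Python (the function name `three_point_exponential` is my own). For the three MP2 energies above it gives C of about -189.012 hartree, which differs from the fitted value quoted above: a fit in which the prefactor stays at 1 cannot pass through all three points.

```python
import math


def three_point_exponential(e2, e3, e4):
    """Solve E(x) = A*exp(-B*x) + C through x = 2, 3, 4 and return (A, B, C)."""
    d1 = e3 - e2
    d2 = e4 - e3
    r = d2 / d1                      # ratio of successive steps, equals exp(-B)
    b = -math.log(r)
    c = e2 + d1 / (1.0 - r)          # limit of the geometric series of steps
    a = (e2 - c) / math.exp(-2.0 * b)
    return a, b, c


# MP2/aug-cc-pVXZ energies for formate from the post (X = D, T, Q)
e2 = -188.772348824917
e3 = -188.930859478806
e4 = -188.984671946038

a, b, c = three_point_exponential(e2, e3, e4)
print("A = %.4f, B = %.4f, C (CBS estimate) = %.4f hartree" % (a, b, c))
```

This only works when the steps between successive energies shrink monotonically, so that the ratio r lies between 0 and 1.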

The Peterson Scheme:
(Peterson, K. A.; Woon, D. E.; Dunning, T. H., Jr. J. Chem. Phys. 1994, 100, 7410)

Instead of a straight A*exp(-B*x)+C fit you do E(x)=A+B*exp(-(x-1))+C*exp(-(x-1)**2), where A is the CBS energy.

So, using gnuplot we can put our energies in 'cbs.dat':

2 -188.772348824917
3 -188.930859478806
4 -188.984671946038

and fit it in gnuplot using:

set xrange [0:5]
f(x)=A+B*exp(-(x-1))+C*exp(-(x-1)**2)
fit f(x) 'cbs.dat' u 1:2 via A,B,C
plot 'cbs.dat' u 1:2, f(x) lc 3

which gives us a CBS energy of -189.016 hartree (cf. -189.082 using a simple exponential).
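Since the Peterson expression is linear in A, B and C, three points determine it exactly and the gnuplot fit can be cross-checked by solving a small linear system. This is a minimal sketch using only the standard library (the function name `peterson_cbs` is my own); it reproduces the value of about -189.016 hartree for the three energies in 'cbs.dat'.

```python
import math


def peterson_cbs(points):
    """Solve E(x) = A + B*exp(-(x-1)) + C*exp(-(x-1)**2) through three (x, E) points."""
    (x1, e1), (x2, e2), (x3, e3) = sorted(points)

    def g(x):
        return math.exp(-(x - 1))

    def h(x):
        return math.exp(-(x - 1) ** 2)

    # Subtract neighbouring equations to eliminate A, leaving a 2x2 system in B and C
    a11, a12, r1 = g(x1) - g(x2), h(x1) - h(x2), e1 - e2
    a21, a22, r2 = g(x2) - g(x3), h(x2) - h(x3), e2 - e3
    det = a11 * a22 - a12 * a21
    b = (r1 * a22 - a12 * r2) / det
    c = (a11 * r2 - a21 * r1) / det
    a = e1 - b * g(x1) - c * h(x1)
    return a, b, c


# Same data as in cbs.dat
cbs_points = [
    (2, -188.772348824917),
    (3, -188.930859478806),
    (4, -188.984671946038),
]

cbs_a, cbs_b, cbs_c = peterson_cbs(cbs_points)
print("CBS energy A = %.3f hartree" % cbs_a)
```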

The basic underlying principle behind CBS is thus pretty clear even to someone who's not skilled in the art of computational chemistry.

The difficulties seem to be:
1. What basis set family to use (cc-pVXZ ?)
2. Whether to use diffuse/polarisation/extra orbital functions (aug, p, d+)?
3. What level to do solvation calc on (DFT, MP2; COSMO, CPCM)?
4. What level to do structural optimisation at?
5. What level to do frequency calculation at?

Remember that a great many basis sets haven't been parametrised for elements beyond Ar.

Also, as far as solvation calculations are concerned, the correct way to calculate the solvation energy seems to be to use gas phase structures -- and NOT structures optimised using a solvation model. This can cause issues if the gas phase and solution phase conformations are considerably different.

16 September 2012

237. Briefly: Packet corrupt during ssh sessions

Whenever I'm using two subnets for my cluster I seem to have problems with:

Corrupted MAC on input.
Disconnecting: Packet corrupt

It particularly happens when there's a lot of information being passed to the screen. It's a right killer when you're compiling on a remote system. While I've been able to get around that by running a GNU Screen session on the remote box, it was time to solve it.

I googled and found:
https://bugs.launchpad.net/ubuntu/+source/linux-source-2.6.17/+bug/60764

The subnet I'm having issues with operates across a switch. One of the computers on the network is defined as the gateway and I tend to have problems when connecting from it to the other computers on the network.
The gateway server has two interfaces, eth0 which is connected to a router which is connected to the outside world, and eth1 which is connected to the local subnet I'm having problems with.

The fix has been as simple as
sudo apt-get install ethtool
sudo ethtool -K eth1 rx off tx off

I only had to run this on the gateway box and so far I've had no issues. Depending on how you're managing your network interfaces (e.g. wicd, network-manager, /etc/network/*) you may want to add it to the post-up section of your /etc/network/interfaces:
post-up ethtool -K eth1 rx off tx off


14 September 2012

236. Calculating pKa, part 1: Example (attempt) of an isodesmic reaction in NWChem

Back to learning about computational approaches to chemistry. The usual warnings apply: why would you trust anything that I say about anything? I'm writing anonymously, and I may misunderstand things at times. So make sure that you compare what I write with that of other sources and make up your own mind.

Anyway, I found a fairly detailed presentation in which they were using Gaussian 98 here: https://www.uow.edu.au/~adamt/Trevitt_Research/Links_files/pKa%20workshop%20slides.pdf

While that should be ok to reproduce, it's not that straightforward to do even with Gaussian, since G09 and G03 don't report solvation parameters in the same way (or detail) as G98.

I'm also a lot keener on NWChem than Gaussian for various reasons, not least that it's 'free' (both libre and gratis) while Gaussian, Inc. has been accused of doing somewhat unfriendly things in the name of protecting their business interests.

See here and here for an example, and then here for a rebuttal from Gaussian, Inc. I know that EMSL/Pacific Northwest National Lab, which develops NWChem and ECCE, is prohibited from using Gaussian since it is considered a competitor.

Back to science.

Our test example will be acetate, and we'll use formic acid to correct our results.
The fact that this post is very long is due to the amount of detail supplied -- I prefer to show some of the more obvious things so that people can learn from what I post -- and I learn by writing the post.

But first let's just do everything using direct methods.

We work with a thermochemical cycle:

If we can't calculate the DG_solution directly (i.e. it's too expensive) we can optimise our structures in the gas phase, and then calculate the solvation energy for those structures.

Then DG_sol=DG_gas+DG_solvation(B)-DG_solvation(A)
(more generally, DG_sol=DG_gas+sum of DG_solv(products)-sum of DG_solv(reactants)).

1. pKa of Acetic acid using direct methods

We can either do
H3CCOOH -> H3CCOO- + H+
or
H3CCOOH + H2O -> H3CCOO- + H3O+

Steps:
Optimise acetic acid and acetate in the gas phase and do frequency calculations to get the enthalpy and entropy. Then use the gas phase structures and do single point calculations using COSMO to get the electrostatic solvation energies. Finally, use standard state corrections.

A. Optimise acetic acid in the gas phase and do frequency calculation

Title "aceticacid_gas"
Start  aceticacid_gas
echo
charge 0

geometry autosym units angstrom
 C     0.0402340     0.0308110     0.0402340
 H     -0.600803     -0.611482     0.679201
 H     0.679201     -0.611482     -0.600803
 H     -0.607055     0.659335     -0.607055
 C     0.903928     0.890481     0.903928
 O     0.814831     2.23989     0.814831
 O     1.78275     0.299052     1.78275
 H     2.25438     1.03276     2.25438
end

basis "ao basis" spherical print
  H library "6-31++G**"
  O library "6-31++G**"
  C library "6-31++G**"
END

dft
  mult 1
  direct
  noio
  XC b3lyp
  grid fine
  mulliken
end

driver
  default
end

task dft optimize
task dft freq

which gives

         Total DFT energy =     -229.102415550663
      One electron energy =     -550.833155571059
           Coulomb energy =      230.555870626149
    Exchange-Corr. energy =      -29.497734561879
 Nuclear repulsion energy =      120.672603956126

 Numeric. integr. density =       31.999999816803

     Total iterative time =      3.6s

and

 Temperature                      =   298.15K
 frequency scaling parameter      =   1.0000

 Zero-Point correction to Energy  =   38.683 kcal/mol  (  0.061646 au)
 Thermal correction to Energy     =   41.568 kcal/mol  (  0.066243 au)
 Thermal correction to Enthalpy   =   42.160 kcal/mol  (  0.067186 au)

 Total Entropy                    =   69.198 cal/mol-K
   - Translational                =   38.179 cal/mol-K (mol. weight =  60.0211)
   - Rotational                   =   23.855 cal/mol-K (symmetry #  =        1)
   - Vibrational                  =    7.164 cal/mol-K

 Cv (constant volume heat capacity) =   14.327 cal/mol-K
   - Translational                  =    2.979 cal/mol-K
   - Rotational                     =    2.979 cal/mol-K
   - Vibrational                    =    8.368 cal/mol-K

So that G=-229.102415550663*(627.503 kcal/Hartree)+(42.160 kcal/mol-298.15*(69.198 cal/molK)/1000)=-1.4374e+05 kcal/mol
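The arithmetic above is easy to get wrong by hand, so here is a minimal sketch of the same expression in Python. The function name `gibbs_kcal` is my own; the conversion factor and the inputs are the ones quoted in the NWChem output above (total DFT energy, thermal correction to enthalpy, total entropy).

```python
HARTREE_TO_KCAL = 627.503  # conversion factor used in the post


def gibbs_kcal(e_hartree, h_corr_kcal, s_cal_per_mol_k, temp_k=298.15):
    """G = E_elec + H_corr - T*S, returned in kcal/mol."""
    return (e_hartree * HARTREE_TO_KCAL
            + h_corr_kcal
            - temp_k * s_cal_per_mol_k / 1000.0)  # cal -> kcal


# Acetic acid, gas phase, B3LYP/6-31++G**
g_acetic_acid = gibbs_kcal(-229.102415550663, 42.160, 69.198)
print("G(acetic acid) = %.2f kcal/mol" % g_acetic_acid)
```

Only differences between such numbers are meaningful, which is why all the digits are kept until the final subtraction.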

B. Optimise acetate in the gas phase and do frequency calculation

Title "acetate_gas"

Start  acetate_gas

echo

charge -1

geometry autosym units angstrom
 C     0.0405721     0.0285481     0.0405721
 H     -0.601438     -0.613690     0.678620
 H     0.678620     -0.613690     -0.601438
 H     -0.605857     0.658809     -0.605857
 C     0.904975     0.886806     0.904975
 O     0.825186     2.23364     0.825186
 O     1.77103     0.316179     1.77103
end

basis "ao basis" spherical print
  H library "6-31++G**"
  O library "6-31++G**"
  C library "6-31++G**"
END

dft
  mult 1
  direct
  noio
  XC b3lyp
  grid fine
  mulliken 
end

driver
  default
end

task dft optimize
task dft freq

which gives

         Total DFT energy =     -228.540046314754
      One electron energy =     -539.474204198295
           Coulomb energy =      229.063209931484
    Exchange-Corr. energy =      -29.339040486703
 Nuclear repulsion energy =      111.209988438759

 Numeric. integr. density =       32.000000595562

     Total iterative time =      2.9s

and

 Temperature                      =   298.15K
 frequency scaling parameter      =   1.0000

 Zero-Point correction to Energy  =   30.023 kcal/mol  (  0.047845 au)
 Thermal correction to Energy     =   32.271 kcal/mol  (  0.051427 au)
 Thermal correction to Enthalpy   =   32.863 kcal/mol  (  0.052371 au)

 Total Entropy                    =   64.022 cal/mol-K
   - Translational                =   38.129 cal/mol-K (mol. weight =  59.0133)
   - Rotational                   =   23.766 cal/mol-K (symmetry #  =        1)
   - Vibrational                  =    2.127 cal/mol-K

 Cv (constant volume heat capacity) =   11.112 cal/mol-K
   - Translational                  =    2.979 cal/mol-K
   - Rotational                     =    2.979 cal/mol-K
   - Vibrational                    =    5.153 cal/mol-K

so that G=-228.540046314754*(627.503 kcal/Hartree)+(32.271 kcal/mol-298.15*(64.022 cal/molK)/1000)=-1.4340e+05 kcal/mol

Putting A and B together: (-228.540046314754*(627.503)+(32.271-298.15*(64.022/1000)))-(-229.102415550663*(627.503)+(42.160-298.15*(69.198/1000)))=344.54 kcal/mol

We haven't accounted for solvation or the proton yet.

C. Solvation of acetic acid

Title "aceticacid_solvation"
Start  aceticacid_solvation

echo

charge 0

geometry autosym units angstrom
 C     -0.313400     -1.37257     0.00000
 H     -0.932151     -1.56367     -0.882188
 H     -0.932151     -1.56367     0.882188
 H     0.551887     -2.03461     0.00000
 C     0.149260     0.0607660     0.00000
 O     1.30165     0.439500     0.00000
 O     -0.897630     0.927778     0.00000
 H     -0.523912     1.82535     0.00000
end

basis "ao basis" spherical print
  H library "6-31++G**"
  O library "6-31++G**"
  C library "6-31++G**"
END

dft
  mult 1
  direct
  noio
  XC b3lyp
  grid fine
  mulliken
end

cosmo
end

task dft energy

which gives

                  COSMO solvation results
                  -----------------------

                 gas phase energy =      -229.1024156124
                 sol phase energy =      -229.1172649438
 (electrostatic) solvation energy =         0.0148493315 (    9.32 kcal/mol)

D. Solvation of acetate

Title "acetate_solvation"
Start  acetate_solvation

echo

charge -1

geometry autosym units angstrom
 C     -0.0308736     -1.36399     0.00000
 H     0.503418     -1.74042     0.882388
 H     0.503418     -1.74042     -0.882388
 H     -1.05531     -1.75261     0.00000
 C     -0.00485953     0.199667     0.00000
 O     -1.12642     0.778065     0.00000
 O     1.14855     0.713544     0.00000
end

basis "ao basis" spherical print
  H library "6-31++G**"
  O library "6-31++G**"
  C library "6-31++G**"
END

dft
  mult 1
  direct
  noio
  XC b3lyp
  grid fine
  mulliken
end

cosmo
end

task dft energy

which gives

                  COSMO solvation results
                  -----------------------
  
                 gas phase energy =      -228.5400463128
                 sol phase energy =      -228.6567490452
 (electrostatic) solvation energy =         0.1167027324 (   73.23 kcal/mol)

Putting A, B and solvation energies together:
[(-228.540046314754*(627.503)+(32.271-298.15*(64.022/1000)))-(-229.102415550663*(627.503)+(42.160-298.15*(69.198/1000)))]-[73.23-9.32]=344.54-63.910=280.63 kcal/mol
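The bookkeeping of the thermochemical cycle can be written as a small helper. This is a minimal sketch (the function name `dg_solution` is my own), and it assumes that the COSMO solvation energies printed above are stabilising, i.e. that they enter as negative numbers, which is how they are used in the expression above.

```python
def dg_solution(dg_gas, dg_solv_products, dg_solv_reactants):
    """DG_sol = DG_gas + sum DG_solv(products) - sum DG_solv(reactants)."""
    return dg_gas + sum(dg_solv_products) - sum(dg_solv_reactants)


# kcal/mol; gas phase value and COSMO energies as quoted in the post.
# The proton is not included yet.
dg_acetic = dg_solution(
    344.54,      # G(acetate) - G(acetic acid) in the gas phase
    [-73.23],    # solvation of acetate
    [-9.32],     # solvation of acetic acid
)
print("DG_sol without the proton = %.2f kcal/mol" % dg_acetic)
```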

E. The Proton
If you try to do any calculations on an isolated proton you get an SCFE of zero, and you won't do much better in terms of thermochemical data. Yet monoatomic gases obviously still possess entropy and enthalpy. Instead, the document I cite above uses the partition function for an ideal monoatomic gas, which gives a value of -6.28 kcal/mol for the free energy of a proton. The last reference states that the free energy for a proton in the gas phase is experimentally determined to be -6.28 kcal/mol and that the free energy of hydration is -264.61 kcal/mol (experimental).
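The ideal gas value can be reproduced from the translational partition function (the Sackur-Tetrode entropy plus an enthalpy of 5/2 RT). This is a minimal sketch under my own assumptions: a standard state of 1 atm and 298.15 K, a proton mass of 1.007276 u, and no electronic or nuclear spin contributions. The function name `monatomic_gibbs_kcal` is my own.

```python
import math

K_B = 1.380649e-23        # Boltzmann constant, J/K
H_PLANCK = 6.62607015e-34  # Planck constant, J s
N_A = 6.02214076e23       # Avogadro constant, 1/mol
AMU = 1.66053906660e-27   # atomic mass unit, kg
J_PER_KCAL = 4184.0


def monatomic_gibbs_kcal(mass_amu, temp_k=298.15, pressure_pa=101325.0):
    """Translational G = H - T*S of an ideal monoatomic gas, in kcal/mol."""
    m = mass_amu * AMU
    kt = K_B * temp_k
    # Translational partition function per particle, q/N, at the given pressure
    q_per_n = (2.0 * math.pi * m * kt / H_PLANCK ** 2) ** 1.5 * kt / pressure_pa
    r_gas = K_B * N_A
    entropy = r_gas * (math.log(q_per_n) + 2.5)   # Sackur-Tetrode, J/(mol K)
    enthalpy = 2.5 * r_gas * temp_k               # 3/2 RT + RT, J/mol
    return (enthalpy - temp_k * entropy) / J_PER_KCAL


g_proton = monatomic_gibbs_kcal(1.007276)
print("G(H+, gas) = %.2f kcal/mol" % g_proton)
```

The result lands within about 0.01 kcal/mol of the -6.28 kcal/mol quoted above; the last digit depends on the constants and the standard pressure used.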

[(-228.540046314754*(627.503)+(32.271-298.15*(64.022/1000)))-6.28-(-229.102415550663*(627.503)+(42.160-298.15*(69.198/1000)))]-[73.23+264.61-9.32]=338.26-328.52=9.74 kcal/mol =40.752 kJ/mol
We're not done yet though -- make sure to continue to 'F. Standard States'

F. Standard States

We know that

DG=DG^0+RT ln (Q).

We have a bit of a problem. We're doing calculations in the gas phase (pressure) but looking at predicting solution values (concentration). Also, I don't fully get it yet, so my explanation is probably a bit fuzzy.

So if A + B -> C + D we get Q=(C*D)/(A*B), which is dimensionless, but for A -> C + D we get Q=(C*D)/(A), or (1 atm x 1 atm/1 atm) = 1 atm = 101325 Pa => V=nRT/P=1*8.314*298.15/101325=0.024464 m3 = 24.46 L. The concentration of each species at 1 atm is thus 1 mol per 24.46 L, i.e. 1/24.46 mol/L.

And the A -> C + D situation is what we have if we look at
HA -> H+ + A-

So,
Q=((1/24.46)*(1/24.46)/(1/24.46))=1/24.46 which gives
DG=DG^0-RTln(24.46)=DG^0-7924.9 J/mol

Thus, we need to correct for the standard state: 40.752 -7924.9/1000=32.827 kJ/mol
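The two numbers that go into the standard state correction (the molar volume of an ideal gas and RT ln of that volume) can be checked with a few lines of Python. This is a minimal sketch assuming a gas phase standard state of 1 atm = 101325 Pa; it only computes the magnitude of the term, which the post then applies as DG^0 - RT ln(24.46).

```python
import math

R = 8.314          # J/(mol K), value used in the post
T = 298.15         # K
P_ATM = 101325.0   # Pa, assumed gas phase standard state

# Volume occupied by 1 mol of ideal gas, converted from m3 to litres
molar_volume_l = R * T / P_ATM * 1000.0

# Magnitude of the 1 atm <-> 1 mol/L conversion term per gas phase species
correction_j = R * T * math.log(molar_volume_l)

print("V = %.2f L/mol" % molar_volume_l)
print("RT ln(V) = %.0f J/mol = %.2f kcal/mol" % (correction_j, correction_j / 4184.0))
```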

G. Calculating pKa

Since (in solution, so that the standard concentrations are 1 M)
DG=DG^0+RT ln(Q), where Q=([H3CCOO-][H+])/([H3CCOOH])
DG=DG^0+RT ln([H3CCOO-]/[H3CCOOH])+RT ln(10^(-pH)), where RT ln([H3CCOO-]/[H3CCOOH])=RT ln(1/1)=0, so
DG=DG^0+RT ln(10^(-pH))=DG^0+RT log(10^(-pH))/log(e)=DG^0-(RT/log(e))*pH
Which for equilibrium, where pH=pKa and DG=0, turns into
pKa=DG^0*log(e)/(RT)

pKa= DG*log(e)/(RT)=(32.827*1000)*log(e)/(8.314*298.15)=5.75

Not that great as predictions go (exp: pKa=4.75). Looking at some of the literature, one source of error lies in the size of the solvation energies. Possibly one should tune the parameters used in the COSMO calculation.
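The conversion between DG^0 and pKa from section G goes both ways, and it is handy to have it as two small functions. This is a minimal sketch (the function names are my own) using the same R and T as in the post.

```python
import math

R = 8.314    # J/(mol K)
T = 298.15   # K


def pka_from_dg(dg_kj_per_mol, temp_k=T):
    """pKa = DG^0 * log10(e) / (RT), with DG^0 given in kJ/mol."""
    return dg_kj_per_mol * 1000.0 * math.log10(math.e) / (R * temp_k)


def dg_from_pka(pka, temp_k=T):
    """DG^0 in kJ/mol for a given pKa; the inverse of pka_from_dg."""
    return pka * R * temp_k / math.log10(math.e) / 1000.0


print("pKa for 32.827 kJ/mol: %.2f" % pka_from_dg(32.827))
print("DG^0 for pKa 4.75: %.2f kJ/mol" % dg_from_pka(4.75))
print("One pKa unit: %.2f kcal/mol" % (dg_from_pka(1.0) / 4.184))
```

The last line shows why pKa predictions are so demanding: an error of roughly 1.4 kcal/mol in DG^0 already shifts the pKa by a full unit.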

2. Isodesmic reaction/correction

Using formic acid
This approach is based on 1) the similarity between two compounds and 2) us knowing the DG_solution parameter for one of them.

Assuming that we know that formic acid has a pKa of 3.75, then DG_solution=pKa*RT/log(e)=3.75*8.314*298.15/log10(e)/1000=21.404 kJ/mol. The reverse reaction is -21.404 kJ/mol.

We skip a few steps.
Here are the calculated parameters for formic acid (using the same method as above):

Formic acid

SCFE: -189.772804709496 Hartree
Enthalpy correction: 23.773 kcal/mol
Entropy: 59.339 cal/mol-K
Solvation energy: 9.99 kcal/mol

Formate
SCFE: -189.217943605798 Hartree
Enthalpy correction: 15.016 kcal/mol
Entropy: 56.992 cal/mol-K
Solvation energy: 72.47 kcal/mol

[Just for kicks we quickly look at what the prediction is:
accounting for everything (solvation, proton etc.)
((-189.217943605798*627.503+(15.016-298.15*56.992/1000)-72.47) +(-6.28-264.61))-(-189.772804709496*627.503+(23.773-298.15*59.339/1000)-9.99)=6.7498 kcal/mol
6.7498*4.184-7924.9/1000=20.316 kJ/mol <=> pKa=1000*20.316*log10(e)/(8.314*298.15)=3.56]

The isodesmic approach:

Here we look at
H3CCOOH + -OOCH -> H3CCOO- +HOOCH

This combined reaction has a
DG_solution=DG_solution(acetic acid/acetate)+DG_solution(formate/formic acid)
<=>
DG_solution(acetic acid/acetate)= DG_solution-DG_solution(formic acid/formate)
=DG_solution-(-21.404 kJ/mol)
=DG_gas+[DG_sol(acetate)+DG_sol(formic acid)-DG_sol(acetic acid)-DG_sol(formate)]+21.404 kJ/mol
=
(-228.540046314754*(627.503)+(32.271-298.15*(64.022/1000)))+(-189.772804709496*627.503+(23.773-298.15*(59.339/1000)))-(-229.102415550663*(627.503)+(42.160-298.15*(69.198/1000)))-(-189.217943605798*627.503+(15.016-298.15*(56.992/1000)))+(-73.23-9.99+9.32+72.47)+5.116 kcal/mol= 8.11 kcal/mol= 33.93 kJ/mol

Here we don't need to fiddle with standard states or experimental values for solvation of the proton.

pKa= DG*log(e)/(RT)=(33.93*1000)*log(e)/(8.314*298.15)=5.94
which is even worse than before...
We want ca 27 kJ/mol = 6.48 kcal/mol. Paradoxically this may be due to the ab initio approach to the pKa of formic acid actually giving a very reasonable value.

Using Propanoic acid
So let's try the isodesmic approach using propanoic acid as our reference instead.

Propanoic acid

SCFE: -268.419515785389 Hartree
Enthalpy correction: 60.894 kcal/mol
Entropy: 75.184 cal/mol-K
Solvation energy: 8.15 kcal/mol

Propanoate
SCFE: -267.857478200414 Hartree
Enthalpy correction: 52.096 kcal/mol
Entropy: 75.392 cal/mol-K
Solvation energy: 72.14 kcal/mol

pKa=4.86 <=> 27.739 kJ/mol= 6.63 kcal/mol

(-228.540046314754*(627.503)+(32.271-298.15*(64.022/1000)))+(-268.419515785389*627.503+(60.894-298.15*(75.184/1000)))-(-229.102415550663*(627.503)+(42.160-298.15*(69.198/1000)))-(-267.857478200414*627.503+(52.096-298.15*(75.392/1000)))+(-73.23-8.15+9.32+72.14)+6.63 kcal/mol=7.4324 kcal/mol=31.097 kJ/mol

pKa= DG*log(e)/(RT)=(31.097*1000)*log(e)/(8.314*298.15)=5.45

It's a bit better, but still a bit off.

3. Conclusion:
The isodesmic approach is not magic and it relies on the similarity of two compounds, one for which there are experimental data, causing similar computational issues. Under the right conditions it's a useful approach, whereas under other conditions -- where a body of experimental data exists -- it might just be easier to determine the correlation between experimental and calculated data via fitting.

The approach worked better for the acetate/propanoate pair than the formate/acetate pair -- and one would consider acetic acid and propanoic acid to be more similar than formic acid and the higher acids. We're still far off from obtaining a perfect result though.

An additional problem is obviously the sensitivity of pKa to the DG -- one pH unit is about 1.36 kcal/mol, which is very small given the usual errors in DFT level calculations. I've seen indications online (google!) that the accuracy of b3lyp is about 3 kcal/mol, and one can always debate the accuracy of a highly empirical method like COSMO.

13 September 2012

235. CPMD with Netlib's lapack, blas and your own fftw3 on ROCKS 5.4.3/CentOS 5.6

Update 8 Feb 2013:
I had somehow forgotten to include some of the instructions for the BLAS part. Fixed now.

Post:
This is done pretty much like how it's done on Debian (-march=native didn't work in the BLAS compilation though, nor was -fno-whole-file accepted when compiling cpmd)

1. Compile cmake according to this post:
http://verahill.blogspot.com.au/2012/05/compiling-openbabel-231-and-cmake-on.html

2. Compile BLAS
sudo mkdir /share/apps/tools/netlib/blas/lib -p
sudo chown $USER /share/apps/tools/netlib -R

mkdir ~/tmp
cd ~/tmp
wget http://www.netlib.org/blas/blas.tgz
tar xvf blas.tgz
cd BLAS/

Edit make.inc

OPTS = -O3 -shared -m64 -fPIC

make all

gfortran -shared -Wl,-soname,libnetblas.so -o libblas.so.1.0.1 *.o -lc
ln -s libblas.so.1.0.1 libnetblas.so
cp lib*blas* /share/apps/tools/netlib/blas/lib

3. Compile LAPACK
sudo mkdir /share/apps/tools/netlib/lapack -p
sudo chown $USER /share/apps/tools/netlib -R

wget http://www.netlib.org/lapack/lapack-3.4.1.tgz

tar xvf lapack-3.4.1.tgz
cd lapack-3.4.1/
mkdir build
cd build
ccmake ../

Hit 'c' and edit the values:

 BUILD_COMPLEX                    ON
 BUILD_COMPLEX16                  ON
 BUILD_DOUBLE                     ON
 BUILD_SHARED_LIBS                ON
 BUILD_SINGLE                     ON
 BUILD_STATIC_LIBS                ON
 BUILD_TESTING                    ON
 CMAKE_BUILD_TYPE                    
 CMAKE_INSTALL_PREFIX             /share/apps/tools/netlib/lapack
 LAPACKE                          OFF
 LAPACKE_WITH_TMG                 OFF
 USE_OPTIMIZED_BLAS               ON
 USE_XBLAS                        OFF

Hit 'c' again, then hit 'g'.

Edit CMakeCache.txt and add the following lines at the beginning:

########################
# EXTERNAL cache entries
########################
BLAS_FOUND:STRING=TRUE
BLAS_GENERIC_FOUND:BOOL=TRUE
BLAS_GENERIC_blas_LIBRARY:FILEPATH=/share/apps/tools/netlib/blas/lib/libnetblas.so
BLAS_LIBRARIES:PATH=/share/apps/tools/netlib/blas/lib/libnetblas.so

Do
ccmake ../
again, hit 'c', then 'g'.

Now,
make
make install

4. Compile FFTW3

sudo mkdir /share/apps/tools/fftw3
sudo chown $USER /share/apps/tools/fftw3
cd ~/tmp
wget http://www.fftw.org/fftw-3.3.1.tar.gz
tar -xvf fftw-3.3.1.tar.gz
cd fftw-3.3.1
make distclean
./configure --enable-float --enable-mpi --enable-threads --with-pic --prefix=/share/apps/tools/fftw3/single
make
make install
make distclean
./configure --disable-float --enable-mpi --enable-threads --with-pic --prefix=/share/apps/tools/fftw3/double
make 
make install

5. Compile CPMD
I downloaded the cpmd file to a client computer, then uploaded it to the ROCKS front node:
sftp me@rocks:/home/me/tmp

Connected to rocks.
Changing to: /home/me/tmp
sftp> put cpmd-v3_15_3.tar.gz
Uploading cpmd-v3_15_3.tar.gz to /home/me/tmp/cpmd-v3_15_3.tar.gz
cpmd-v3_15_3.tar.gz                100% 2937KB 587.4KB/s   00:05
sftp> exit

I then logged in via ssh as normal.

cd ~/tmp
tar xvf cpmd-v3_15_3.tar.gz
cd CPMD/CONFIGURE

Create a new file LINUX-x86_64-ROCKS

     IRAT=2
     CFLAGS='-c -O2 -Wall'
     CPP='/lib/cpp -P -C -traditional'
     CPPFLAGS='-D__Linux -D__PGI -D__GNU -DFFT_FFTW3 -DPARALLEL -DPOINTER8'
     FFLAGS='-c -O2 -fcray-pointer -fsecond-underscore'
     LFLAGS='-L/share/apps/tools/fftw3/double/lib -lfftw3 -lfftw3_mpi -lfftw3_threads -I/usr/include -L/share/apps/tools/netlib/blas/lib -lnetblas -L/share/apps/tools/netlib/lapack/lib -llapack -L/opt/openmpi/lib -lpthread -lmpi'
     FFLAGS_GROMOS='  $(FFLAGS)' 
      FC='mpif77 -fbounds-check'
      CC='mpicc'
      LD='mpif77 -fbounds-check'

NOTE: I don't think the -I belongs in the LFLAGS statement, but I'm presuming that I put it there for a reason back when I did it the first time.

Go to ~/tmp/CPMD, and edit wfnio.F (basically replace 3 with 2 and remove 'L'):

 15       CHARACTER(len=*) TAG
 63         IF(TAG(1:2).EQ.'NI') THEN
201       IF(TAG(1:2).NE.'NI') THEN
271         IF(TAG(1:2).EQ.'NI') THEN

Finally, edit Makefile and change

  23 LD = f95 -O

to

  23 LD = mpif77 -fbounds-check

Time to compile

./mkconfig.sh LINUX-x86_64-ROCKS > Makefile
make
sudo mkdir /share/apps/cpmd
sudo chown $USER /share/apps/cpmd
cp cpmd.x /share/apps/cpmd

echo 'export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/share/apps/tools/netlib/blas/lib:/share/apps/tools/netlib/lapack/lib:/share/apps/tools/fftw3/double/lib' >>~/.bashrc
echo 'export PATH=$PATH:/share/apps/cpmd' >> ~/.bashrc
echo "export PP_LIBRARY_PATH=/share/apps/cpmd/PP_LIBRARY" >>~/.bashrc

You're now done compiling. To test, you need to get some pseudopotential files -- look at e.g. the end of http://verahill.blogspot.com.au/2012/07/not-solved-compiling-cpmd-on-debian.html for instructions.

234. CPMD with netlib lapack, blas and your own fftw on debian testing

This is a minor update to my previous post on CPMD. Back in the day I had issues linking to my Openblas libs (I got a binary which would not run properly) but I've since had success with the netlib lapack and blas libs.

1. Compile the netlib lapack and blas libraries according to this post: http://verahill.blogspot.com.au/2012/09/compiling-netlibs-lapack-and-blas-on.html

2. Compile the fftw libraries according to this post (ignore the sections on Openblas and Gromacs):
http://verahill.blogspot.com.au/2012/05/gromacs-with-external-fftw3-and-blas-on.html

3. Compile CPMD. We'll be following this post in large part.
Register with cpmd.org. Once you're approved download the cpmd source to ~/tmp.

sudo apt-get install libopenmpi-dev openmpi-bin

cd ~/tmp
tar -xvf cpmd-v3_15_3.tar.gz
cd CPMD/CONFIGURE

Create the file LINUX-x86_64-DEBIAN:

   
     IRAT=2
     CFLAGS='-c -O2 -Wall'
     CPP='/lib/cpp -P -C -traditional'
     CPPFLAGS='-D__Linux -D__PGI -D__GNU -DFFT_FFTW3 -DPARALLEL -DPOINTER8'
     FFLAGS='-c -O2 -fcray-pointer -fno-whole-file -fsecond-underscore'
     LFLAGS='-l:/opt/fftw/fftw-3.3.2/double/lib/libfftw3.a -l:/opt/fftw/fftw-3.3.2/double/lib/libfftw3_mpi.a -l:/opt/fftw/fftw-3.3.2/double/lib/libfftw3_threads.a -I/usr/include -l:/opt/netlib/blas/lib/libnetblas.so -l:/opt/netlib/lapack/lib/liblapack.so -lpthread -lmpi'
     FFLAGS_GROMOS='  $(FFLAGS)' 
      FC='mpif77 -fbounds-check'
      CC='mpicc'
      LD='mpif77 -fbounds-check'

Next edit ~/tmp/CPMD/wfnio.F and change the following lines:

 15       CHARACTER(len=*) TAG
 63         IF(TAG(1:2).EQ.'NI') THEN
201       IF(TAG(1:2).NE.'NI') THEN
271         IF(TAG(1:2).EQ.'NI') THEN

Now, in ~/tmp/CPMD, run

./mkconfig.sh LINUX-x86_64-DEBIAN > Makefile
make
sudo mkdir /opt/cpmd
sudo chown $USER /opt/cpmd
cp cpmd.x /opt/cpmd

And follow everything below 'Done! Almost.' in this post: http://verahill.blogspot.com.au/2012/07/not-solved-compiling-cpmd-on-debian.html

12 September 2012

233. Compiling netlib's lapack and blas on Debian Testing (Wheezy)

In addition to specific BLAS/LAPACK libs such as ACML, MKL, and ATLAS, netlib provides (what I understand to be) reference versions of BLAS and LAPACK. Presumably these are slower than optimised versions of blas/lapack, but it doesn't hurt being familiar with them.

Here's how to compile those versions.

BLAS

sudo mkdir /opt/netlib
sudo chown $USER /opt/netlib
mkdir /opt/netlib/blas/lib -p
wget http://www.netlib.org/blas/blas.tgz
tar xvf blas.tgz
cd BLAS/

Edit make.inc
OPTS = -O3 -shared -m64 -march=native -fPIC

make all
gfortran -shared -Wl,-soname,libnetblas.so -o libblas.so.1.0.1 *.o -lc
ln -s libblas.so.1.0.1 libnetblas.so
cp lib*blas* /opt/netlib/blas/lib

To see whether everything linked ok:
ldd libnetblas.so

        linux-vdso.so.1 =>  (0x00007ffff1bc6000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00002b42ec030000)
        libgfortran.so.3 => /usr/lib/x86_64-linux-gnu/libgfortran.so.3 (0x00002b42ec3b8000)
        libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00002b42ec6ce000)
        libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00002b42ec950000)
        libquadmath.so.0 => /usr/lib/x86_64-linux-gnu/libquadmath.so.0 (0x00002b42ecb67000)
        /lib64/ld-linux-x86-64.so.2 (0x00002b42ebaf3000)
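Reading ldd output by eye works for seven lines, but a broken link shows up as a line ending in "not found", and that's easy to filter for. A small sketch (ldd_unresolved is a made-up helper name, not part of any package):

```shell
# Print any "not found" lines from ldd-style output read on stdin.
# Returns 0 if every dependency resolved, 1 otherwise.
# (ldd_unresolved is a hypothetical helper name.)
ldd_unresolved() {
    if grep 'not found'; then
        return 1
    fi
    return 0
}

# Usage (the library path is the one built above):
#   ldd /opt/netlib/blas/lib/libnetblas.so | ldd_unresolved && echo "all resolved"
```

The same filter should work on the ldd output of any binary you later link against these libraries.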

LAPACK
(inspired by this and this)
mkdir -p /opt/netlib/lapack
sudo apt-get install cmake-curses-gui
cd ~/tmp
wget http://www.netlib.org/lapack/lapack-3.4.1.tgz
tar xvf lapack-3.4.1.tgz
cd lapack-3.4.1/
mkdir build
cd build
ccmake ../

Hit 'c' to generate a configuration. Navigate with arrow keys and hit enter to change values. Change to the values in red:

 
 BUILD_COMPLEX                   *ON
 BUILD_COMPLEX16                 *ON
 BUILD_DOUBLE                    *ON
 BUILD_SHARED_LIBS               *ON
 BUILD_SINGLE                    *ON
 BUILD_STATIC_LIBS               *ON
 BUILD_TESTING                   *ON
 CMAKE_BUILD_TYPE                *     
 CMAKE_INSTALL_PREFIX            */opt/netlib/lapack
 LAPACKE                         *OFF
 LAPACKE_WITH_TMG                *OFF
 USE_OPTIMIZED_BLAS              *ON
 USE_XBLAS                       *OFF

Then hit 'c' which might give you (change the values in red) -- I got some errors about ACML/eula here, but don't worry about that.

NOTE: this will only work if you already have blas installed in a standard location. If you don't get the BLAS_FOUND etc. then you should hit 'c' again and then 'g'. Next edit your CMakeCache.txt and paste the variables (without line numbers) you find below this section, then do ccmake ../ again and make sure everything looks ok, and generate using 'g'.

 BLAS_FOUND                       TRUE
 BLAS_GENERIC_FOUND               ON
 BLAS_GENERIC_blas_LIBRARY        /opt/netlib/blas/lib/libnetblas.so
 BLAS_LIBRARIES                   /opt/netlib/blas/lib/libnetblas.so
 BLAS_LINKER_FLAGS
 BUILD_COMPLEX                   *ON
 BUILD_COMPLEX16                 *ON
 BUILD_DOUBLE                    *ON
 BUILD_SHARED_LIBS               *OFF
 BUILD_SINGLE                    *ON
 BUILD_STATIC_LIBS               *ON
 BUILD_TESTING                   *ON
 CMAKE_BUILD_TYPE                *     
 CMAKE_INSTALL_PREFIX            */usr/local 
 LAPACKE                         *OFF
 LAPACKE_WITH_TMG                *OFF
 USE_OPTIMIZED_BLAS              *ON
 USE_XBLAS                       *OFF

Then hit 'c' again. If there were no issues, hit 'g', which writes the configuration and exits.

make

[100%] Building Fortran object TESTING/EIG/CMakeFiles/xeigtstz.dir/__/__/INSTALL/dsecnd_INT_ETIME.f.o
Linking Fortran executable ../../bin/xeigtstz
[100%] Built target xeigtstz

make install

Install the project...
-- Install configuration: ""
-- Installing: /opt/netlib/lapack/lib/pkgconfig/lapack.pc
-- Installing: /opt/netlib/lapack/lib/cmake/lapack-3.4.1/lapack-config.cmake
-- Installing: /opt/netlib/lapack/lib/cmake/lapack-3.4.1/lapack-config-version.cmake
-- Installing: /opt/netlib/lapack/lib/cmake/lapack-3.4.1/lapack-targets.cmake
-- Installing: /opt/netlib/lapack/lib/cmake/lapack-3.4.1/lapack-targets-noconfig.cmake
-- Installing: /opt/netlib/lapack/lib/liblapack.so
-- Removed runtime path from "/opt/netlib/lapack/lib/liblapack.so"
-- Installing: /opt/netlib/lapack/lib/libtmglib.so
-- Removed runtime path from "/opt/netlib/lapack/lib/libtmglib.so"

tree /opt/netlib/ -d
/opt/netlib/
|-- blas
|   `-- lib
`-- lapack
    `-- lib
        |-- cmake
        |   `-- lapack-3.4.1
        `-- pkgconfig

7 directories

CMakeCache.txt variables:

 16 
 17 BLAS_FOUND:STRING=TRUE
 18 
 19 //Whether not the GENERIC library was found and is usable
 20 BLAS_GENERIC_FOUND:BOOL=TRUE
 21 
 22 //Path to a library.
 23 BLAS_GENERIC_blas_LIBRARY:FILEPATH=/opt/netlib/blas/lib/libnetblas.so
 24 
 25 BLAS_LIBRARIES:PATH=/opt/netlib/blas/lib/libnetblas.so
 26
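The cache entries listed above can in principle also be set on the cmake command line with -D, which avoids the ccmake round trip. This is only a sketch: the variable names and values are the ones from this post, but I haven't verified that LAPACK 3.4.1's CMake scripts treat entries passed this way exactly like the hand-edited cache. The function name lapack_cmake_cmd is hypothetical, and it only prints the command so that you can inspect it first.

```shell
# Build the cmake command line instead of editing CMakeCache.txt by hand.
# Variable names and values are the ones listed in this post; whether the
# BLAS_* entries are honoured the same way as a hand-edited cache is an
# assumption.
BLAS_LIB=/opt/netlib/blas/lib/libnetblas.so
PREFIX=/opt/netlib/lapack

lapack_cmake_cmd() {
    echo cmake \
        -DCMAKE_INSTALL_PREFIX="$PREFIX" \
        -DBUILD_SHARED_LIBS=ON \
        -DUSE_OPTIMIZED_BLAS=ON \
        -DBLAS_FOUND=TRUE \
        -DBLAS_GENERIC_FOUND=TRUE \
        -DBLAS_GENERIC_blas_LIBRARY="$BLAS_LIB" \
        -DBLAS_LIBRARIES="$BLAS_LIB" \
        ..
}

# Print the command for inspection; run it from lapack-3.4.1/build with:
#   eval "$(lapack_cmake_cmd)"
lapack_cmake_cmd
```

Afterwards it's still worth opening ccmake ../ once to check that the BLAS entries ended up the way you expect.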

Testing the libraries:
I built gromacs against the new libs to make sure they 'worked'

sudo mkdir /opt/gromacs
sudo chown ${USER} /opt/gromacs
cd ~/tmp
wget ftp://ftp.gromacs.org/pub/gromacs/gromacs-4.5.5.tar.gz
tar xvf gromacs-4.5.5.tar.gz
cd gromacs-4.5.5/
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/netlib/blas/lib:/opt/netlib/lapack/lib
export LDFLAGS="-l:/opt/netlib/blas/lib/libnetblas.so -l:/opt/netlib/lapack/lib/liblapack.so"
./configure --disable-mpi --enable-float --with-external-blas --with-external-lapack --program-suffix=_netlib --prefix=/opt/gromacs/gromacs-4.5.5
make

make install

Check that it linked ok:

ldd /opt/gromacs/gromacs-4.5.5/bin/grompp_netlib
        linux-vdso.so.1 =>  (0x00007fffb83f2000)
        libgmxpreprocess.so.6 => /opt/gromacs/gromacs-4.5.5/lib/libgmxpreprocess.so.6 (0x00002b6411cfa000)
        libmd.so.6 => /opt/gromacs/gromacs-4.5.5/lib/libmd.so.6 (0x00002b6411fcd000)
        libfftw3f.so.3 => /usr/lib/x86_64-linux-gnu/libfftw3f.so.3 (0x00002b64123ad000)
        libxml2.so.2 => /usr/lib/x86_64-linux-gnu/libxml2.so.2 (0x00002b64127b0000)
        libgmx.so.6 => /opt/gromacs/gromacs-4.5.5/lib/libgmx.so.6 (0x00002b6412b10000)
        libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00002b6412fe5000)
        libnetblas.so => /opt/netlib/blas/lib/libnetblas.so (0x00002b64131e9000)
        liblapack.so => /opt/netlib/lapack/lib/liblapack.so (0x00002b64134cc000)
        libnsl.so.1 => /lib/x86_64-linux-gnu/libnsl.so.1 (0x00002b6413ece000)
        libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00002b64140e6000)
        libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00002b6414369000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00002b6414585000)
        libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00002b641490c000)
        liblzma.so.5 => /lib/x86_64-linux-gnu/liblzma.so.5 (0x00002b6414b24000)
        /lib64/ld-linux-x86-64.so.2 (0x00002b6411ad8000)
        libgfortran.so.3 => /usr/lib/x86_64-linux-gnu/libgfortran.so.3 (0x00002b6414d47000)
        libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00002b641505d000)
        libquadmath.so.0 => /usr/lib/x86_64-linux-gnu/libquadmath.so.0 (0x00002b6415274000)

Here are some input files (it's not a 'real' md run -- I just needed something small and quick to run):
step1.top:

#include "/opt/gromacs/gromacs-4.5.5/share/gromacs/top/ffoplsaa.itp"
#include "/opt/gromacs/gromacs-4.5.5/share/gromacs/top/oplsaa.ff/tip4p.itp"

[system]
test 

[molecules]

step1.mdp:

integrator = md
define      = -DFLEXIBLE
emtol      = 1000.0
emstep     = 0.001
nsteps     = 5000
nstlist    = 1
ns_type    = grid 
rlist      = 0.9
coulombtype= PME  
rcoulomb   = 0.9  
rvdw       = 1.0  
pbc        =  xyz

genbox_netlib -o step1.gro -cs /opt/gromacs/gromacs-4.5.5/share/gromacs/top/tip4p.gro -box 4x4x4 -p step1.top

grompp_netlib -f step1.mdp -po step2.mdp -p step1.top -pp step2.top -c step1.gro -o step2.tpr

mdrun_netlib -v -s step2.tpr -o step3.trr -x step3.xtc -cpo step3.cpt -c step3.gro -e step3.edr -g step3.log

On my old AMD II X3 I got about 7.7 GFLOPS with Openblas and 7.8 GFLOPS with the above libs. Note that the run is shorter than a minute so it's pretty useless for benchmarking. However, there's no obvious MAJOR penalty.

If you don't have cmake:
cp INSTALL/make.inc.gfortran make.inc

Edit make.inc

15 FORTRAN = gfortran
16 OPTS = -O2 -fPIC -m64
17 DRVOPTS = $(OPTS)
18 NOOPT = -O0 -fPIC -m64
19 LOADER = gfortran
20 LOADOPTS =

Edit Makefile

11 #lib: lapacklib tmglib
12 lib: blaslib variants lapacklib tmglib

Run make

make

-->  Tests passed: 13176


   -->   LAPACK TESTING SUMMARY  <--
  Processing LAPACK Testing output found in the TESTING direcory
SUMMARY              nb test run  numerical error    other error
================    =========== ================= ================
REAL                    1077227       0 (0.000%)       0 (0.000%)
DOUBLE PRECISION        1078039       0 (0.000%)       0 (0.000%)
COMPLEX                  522814       0 (0.000%)       0 (0.000%)
COMPLEX16                552410       0 (0.000%)       0 (0.000%)

--> ALL PRECISIONS      3230490       0 (0.000%)       0 (0.000%)

Older version:
In the oldest version of this post I did the blas compilation by hand:

gfortran -O2 -fPIC -m64 -march=native -funroll-all-loops -c *.f

To build a static library:
ar rvs libblas.a *.o

To build a shared/dynamic library:
gfortran -shared -Wl,-soname,libnetblas.so -o libblas.so.1.0.1 *.o -lc

ldd libblas.so.1.0.1

linux-vdso.so.1 => (0x00007fff301af000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00002aeeac390000)
libgfortran.so.3 => /usr/lib/x86_64-linux-gnu/libgfortran.so.3 (0x00002aeeac718000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00002aeeaca2e000)
libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00002aeeaccb0000)
libquadmath.so.0 => /usr/lib/x86_64-linux-gnu/libquadmath.so.0 (0x00002aeeacec7000)
/lib64/ld-linux-x86-64.so.2 (0x00002aeeabedd000)

Either way:
cp libblas* /opt/netlib/blas/lib

To test:
wget http://www.netlib.org/blas/sblat1
mv sblat1 sblat1.f

And EITHER
gfortran sblat1.f -l:libblas.a

OR
ln -s libblas.so.1.0.1 libnetblas.so
gfortran sblat1.f -l:libnetblas.so

THEN
./a.out

 
Real BLAS Test Program Results
Test of subprogram number  1             SDOT 
                                    ----- PASS -----

 Test of subprogram number  2            SAXPY 
                                    ----- PASS -----

 Test of subprogram number  3            SROTG 
                                    ----- PASS -----

 Test of subprogram number  4             SROT 
                                    ----- PASS -----

 Test of subprogram number  5            SCOPY 
                                    ----- PASS -----

 Test of subprogram number  6            SSWAP 
                                    ----- PASS -----

 Test of subprogram number  7            SNRM2 
                                    ----- PASS -----

 Test of subprogram number  8            SASUM 
                                    ----- PASS -----

 Test of subprogram number  9            SSCAL 
                                    ----- PASS -----

 Test of subprogram number 10            ISAMAX
                                    ----- PASS -----
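If you'd rather not scroll through the test output, the PASS markers can be counted instead. A rough sketch follows; blas_test_summary is a name I made up, and counting any line containing FAIL as a failure is my assumption about what a failing run prints, since I only have passing output to go by.

```shell
# Summarise BLAS test output read on stdin: count PASS markers and any
# line containing FAIL. (The FAIL pattern is an assumption; only passing
# output is shown in this post.)
blas_test_summary() {
    awk '/----- PASS -----/ { pass++ }
         /FAIL/             { fail++ }
         END { printf "pass=%d fail=%d\n", pass, fail }'
}

# Usage: ./a.out | blas_test_summary
```

For the run shown above this should report ten passes, one per subprogram.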

11 September 2012

232. Compile parallel (threaded) povray 3.7-rc6 on Debian Wheezy

Update 13 May 2013: This build won't work with v3.7-rc7 on debian wheezy if you have libjpeg62 installed. See http://verahill.blogspot.com.au/2013/05/413-povray-37-rc7-on-debian-wheezy.html.

Remove libjpeg62 and it works fine though.

Original post
Expanding my little cluster has got me thinking about additional uses for it. The primary purpose is obviously work i.e. MD simulations using gromacs and ab initio calcs using NWChem and Gaussian. I'm also testing it with John the Ripper to see how well the users of the linux box in the lab are choosing their passwords.

At that point I realised that it'd be sweet to have at least an OMP capable version of povray to speed things up when polishing figures for those elusive journal covers.

Debian testing currently uses v. 3.6.1 of povray but

POV-Ray 3.6 does not support multithreaded rendering. POV-Ray 3.7 does.

So compile we will, although v 3.7 is still in beta, so be aware.

sudo mkdir /opt/povray
sudo chown $USER /opt/povray

wget http://povray.org/redirect/www.povray.org/beta/source/povray-3.7.0.RC6.tar.gz
tar xvf povray-3.7.0.RC6.tar.gz
cd povray-3.7.0.RC6/
sudo apt-get install libboost-all-dev libpng-dev libjpeg-dev libtiff-dev build-essential libsdl-dev

Note: libboost-all-dev is big. It might be enough with libboost-thread-dev

./configure --prefix=/opt/povray --program-suffix=_3.7 COMPILED_BY="me@here"

===============================================================================
POV-Ray 3.7.0.RC5 has been configured.

Built-in features:
  I/O restrictions:          enabled
  X Window display:          disabled
  Supported image formats:   gif tga iff ppm pgm hdr png jpeg tiff
  Unsupported image formats: openexr

Compilation settings:
  Build architecture:  x86_64-unknown-linux-gnu
  Built/Optimized for: x86_64-unknown-linux-gnu (using -march=native)
  Compiler vendor:     gnu
  Compiler version:    g++ 4.7
  Compiler flags:      -pipe -Wno-multichar -Wno-write-strings -fno-enforce-eh-specs -s -O3 -ffast-math -march=native -pthread

Type 'make check' to build the program and run a test render.
Type 'make install' to install POV-Ray on your system.

The POV-Ray components will be installed in the following directories:
  Program (executable):       /opt/povray/bin
  System configuration files: /opt/povray/etc/povray/3.7
  User configuration files:   $HOME/.povray/3.7
  Standard include files:     /opt/povray/share/povray-3.7/include
  Standard INI files:         /opt/povray/share/povray-3.7/ini
  Standard demo scene files:  /opt/povray/share/povray-3.7/scenes
  Documentation (text, HTML): /opt/povray/share/doc/povray-3.7
  Unix man page:              /opt/povray/share/man
===============================================================================

The way it is configured we can keep our debian version of povray and install the newer version (povray_3.7)

make
make install

Seems like -geometry 1000x1000 doesn't work anymore. Instead use -H1000 -W1000

I've played around with it a little bit and it does parallel (threaded) execution nicely.

wget http://www.ms.uky.edu/~lee/visual05/povray/fourcube7.pov
./povray_3.7 -H1000 -W1000 fourcube7.pov +A0.1
takes 9 seconds on an AMD II X3. The standard, serial Debian version takes 21 seconds.
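To put those two timings in perspective, here's the speedup and the parallel efficiency worked out with awk. It's a sketch that assumes the X3 has three cores and that both renders did the same amount of work. The function name speedup is just for illustration.

```shell
# Speedup and parallel efficiency from two wall-clock times.
# Arguments: serial seconds, parallel seconds, number of cores.
speedup() {
    awk -v s="$1" -v p="$2" -v n="$3" \
        'BEGIN { printf "speedup=%.2f efficiency=%.2f\n", s/p, s/p/n }'
}

# Timings quoted above: 21 s serial, 9 s threaded; three cores is an assumption.
speedup 21 9 3
```

That is a bit over twice as fast, which is decent for a single short render where start-up and parsing aren't threaded.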

231. Compiling john the ripper: single/serial, parallel/OMP and MPI

Update: updated for v1.7.9-jumbo-7 since hccap2john in 1.7.9-jumbo-6 was broken

For no particular reason at all, here's how to compile John the Ripper on Debian Testing (Wheezy). It's very easy, and this post is probably a bit superfluous. The standard version only supports serial and parallel (OMP). See below for MPI.

The regular version:

mkdir ~/tmp
cd ~/tmp
wget http://www.openwall.com/john/g/john-1.7.9.tar.gz
tar xvf john-1.7.9.tar.gz
cd john-1.7.9/src

If you don't edit the Makefile you build a serial/single-threaded binary.
If you want to build a threaded version for a single node with a multicore processor (OMP) do:
Edit Makefile and uncomment row 19 or 20

18 # gcc with OpenMP
19 OMPFLAGS = -fopenmp
20 OMPFLAGS = -fopenmp -msse2

make clean linux-x86-64

cd ../run

You now have a binary called john in your ../run folder.
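The Makefile edit described above can also be scripted. This sketch assumes the line is commented out exactly as "#OMPFLAGS = -fopenmp" in your copy of the Makefile, so check that before trusting it. enable_omp is a hypothetical helper name.

```shell
# Uncomment the plain "-fopenmp" OMPFLAGS line in a John the Ripper Makefile.
# Assumes the line is commented out exactly as "#OMPFLAGS = -fopenmp";
# the "-msse2" variant is left untouched.
enable_omp() {
    makefile="$1"
    sed 's/^#OMPFLAGS = -fopenmp$/OMPFLAGS = -fopenmp/' "$makefile" > "$makefile.tmp" \
        && mv "$makefile.tmp" "$makefile"
}

# Usage, from john-1.7.9/src:
#   cp Makefile Makefile.orig && enable_omp Makefile && make clean linux-x86-64
```

Keeping Makefile.orig around makes it easy to go back to the serial build.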

The Jumbo version:
If you want to build a distributed version with MPI (can split jobs across several nodes) you need the enhanced, community version:

sudo apt-get install openmpi-bin libopenmpi-dev

cd ~/tmp

wget http://www.openwall.com/john/g/john-1.7.9-jumbo-7.tar.gz

tar xvf john-1.7.9-jumbo-7.tar.gz

cd john-1.7.9-jumbo-7/src

Edit the Makefile

20 ## Uncomment the TWO lines below for MPI (can be used together with OMP as well)

21 ## For experimental MPI_Barrier support, add -DJOHN_MPI_BARRIER too.

22 ## For experimental MPI_Abort support, add -DJOHN_MPI_ABORT too.

23 CC = mpicc -DHAVE_MPI

24 MPIOBJ = john-mpi.o

and do

make clean linux-x86-64-native
cd ../run

I had a look at the passwords on one of our lab boxes -- it immediately discovered that someone had used 'password' as the password...

These tests were run on my old AMD II X3 445. Going by the numbers below, Kerberos AFS DES, Tripcode DES and dummy don't speed up with OMP. LM DES is borderline -- it's faster, but doesn't scale well.

Here's the single thread/serial version:
./john --test

Benchmarking: Traditional DES [128/128 BS SSE2-16]... DONE
Many salts: 2906K c/s real, 2918K c/s virtual
Only one salt: 2796K c/s real, 2807K c/s virtual
Benchmarking: BSDI DES (x725) [128/128 BS SSE2-16]... DONE
Many salts: 95564 c/s real, 95948 c/s virtual
Only one salt: 93593 c/s real, 93781 c/s virtual
Benchmarking: FreeBSD MD5 [32/64 X2]... DONE
Raw: 14094 c/s real, 14122 c/s virtual
Benchmarking: OpenBSD Blowfish (x32) [32/64 X2]... DONE
Raw: 918 c/s real, 919 c/s virtual
Benchmarking: Kerberos AFS DES [48/64 4K]... DONE
Short: 474316 c/s real, 475267 c/s virtual
Long: 1350K c/s real, 1356K c/s virtual
Benchmarking: LM DES [128/128 BS SSE2-16]... DONE
Raw: 39843K c/s real, 39923K c/s virtual
Benchmarking: generic crypt(3) [?/64]... DONE
Many salts: 262867 c/s real, 263393 c/s virtual
Only one salt: 260121 c/s real, 260642 c/s virtual
Benchmarking: Tripcode DES [48/64 4K]... DONE
Raw: 369843 c/s real, 370584 c/s virtual
Benchmarking: dummy [N/A]... DONE
Raw: 99512K c/s real, 99712K c/s virtual

Here's the OMP version:

Benchmarking: Traditional DES [128/128 BS SSE2-16]... DONE
Many salts: 6706K c/s real, 2555K c/s virtual
Only one salt: 5015K c/s real, 2091K c/s virtual
Benchmarking: BSDI DES (x725) [128/128 BS SSE2-16]... DONE
Many salts: 205670 c/s real, 85411 c/s virtual
Only one salt: 238524 c/s real, 86720 c/s virtual
Benchmarking: FreeBSD MD5 [32/64 X2]... DONE
Raw: 38400 c/s real, 13812 c/s virtual
Benchmarking: OpenBSD Blowfish (x32) [32/64 X2]... DONE
Raw: 2306 c/s real, 845 c/s virtual
Benchmarking: Kerberos AFS DES [48/64 4K]... DONE
Short: 474675 c/s real, 476581 c/s virtual
Long: 1332K c/s real, 1335K c/s virtual
Benchmarking: LM DES [128/128 BS SSE2-16]... DONE
Raw: 49046K c/s real, 16785K c/s virtual
Benchmarking: generic crypt(3) [?/64]... DONE
Many salts: 721670 c/s real, 246640 c/s virtual
Only one salt: 699168 c/s real, 239605 c/s virtual
Benchmarking: Tripcode DES [48/64 4K]... DONE
Raw: 367444 c/s real, 369657 c/s virtual
Benchmarking: dummy [N/A]... DONE
Raw: 100351K c/s real, 100552K c/s virtual
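To put numbers on the OMP scaling, here's a small sketch that divides the OMP "real" rate by the serial one. The rates are copied from the two runs above. The helper rate_ratio is my own, and it only understands plain numbers and the K suffix.

```shell
# Convert a john --test rate such as "2906K" or "474316" to a plain number
# and print the ratio second/first. Only the K suffix is handled.
rate_ratio() {
    awk -v a="$1" -v b="$2" '
        function num(v) { return (v ~ /K$/) ? substr(v, 1, length(v) - 1) * 1000 : v + 0 }
        BEGIN { printf "%.2f\n", num(b) / num(a) }'
}

# "real" c/s values copied from the serial and OMP runs above (serial first).
echo "Traditional DES, many salts: $(rate_ratio 2906K 6706K)x"
echo "LM DES:                      $(rate_ratio 39843K 49046K)x"
echo "Kerberos AFS DES, short:     $(rate_ratio 474316 474675)x"
```

A ratio near 1 means the OMP build gained nothing for that hash, while something approaching 3 would be ideal scaling on three cores.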

And here's the MPI version:
mpirun -n 3 ./john --test
(note that this includes a great many more tests than the default version)

Benchmarking: Traditional DES [128/128 BS SSE2-16]... (3xMPI) DONE
Many salts: 8533K c/s real, 8707K c/s virtual
Only one salt: 7705K c/s real, 8110K c/s virtual
Benchmarking: BSDI DES (x725) [128/128 BS SSE2-16]... (3xMPI) DONE
Many salts: 279808 c/s real, 282634 c/s virtual
Only one salt: 273362 c/s real, 276096 c/s virtual
Benchmarking: FreeBSD MD5 [128/128 SSE2 intrinsics 12x]... (3xMPI) DONE
Raw: 65124 c/s real, 65781 c/s virtual
Benchmarking: OpenBSD Blowfish (x32) [32/64 X2]... (3xMPI) DONE
Raw: 2722 c/s real, 2749 c/s virtual
Benchmarking: Kerberos AFS DES [48/64 4K]... (3xMPI) DONE
Short: 1387K c/s real, 1415K c/s virtual
Long: 3880K c/s real, 3959K c/s virtual
Benchmarking: LM DES [128/128 BS SSE2-16]... (3xMPI) DONE
Raw: 114781K c/s real, 115940K c/s virtual

I don't quite understand the Kerberos results.

Other targets of interest are:

linux-x86-64-avx Linux, x86-64 with AVX (2011+ Intel CPUs)
linux-x86-64-xop Linux, x86-64 with AVX and XOP (2011+ AMD CPUs)
linux-x86-64 Linux, x86-64 with SSE2 (most common)
linux-x86-avx Linux, x86 32-bit with AVX (2011+ Intel CPUs)
linux-x86-xop Linux, x86 32-bit with AVX and XOP (2011+ AMD CPUs)
linux-x86-sse2 Linux, x86 32-bit with SSE2 (most common, if 32-bit)
linux-x86-mmx Linux, x86 32-bit with MMX (for old computers)
linux-x86-any Linux, x86 32-bit (for truly ancient computers)

The FX 8150 does AVX and XOP, while my 1055T doesn't.

The community version has more options:

linux-x86-64-native Linux, x86-64 'native' (all CPU features you've got)
linux-x86-64-gpu Linux, x86-64 'native', CUDA and OpenCL (experimental)
linux-x86-64-opencl Linux, x86-64 'native', OpenCL (experimental)
linux-x86-64-cuda Linux, x86-64 'native', CUDA (experimental)
linux-x86-64-avx Linux, x86-64 with AVX (2011+ Intel CPUs)
linux-x86-64-xop Linux, x86-64 with AVX and XOP (2011+ AMD CPUs)
linux-x86-64[i] Linux, x86-64 with SSE2 (most common)
linux-x86-64-icc Linux, x86-64 compiled with icc
linux-x86-64-clang Linux, x86-64 compiled with clang
linux-x86-gpu Linux, x86 32-bit with SSE2, CUDA and OpenCL (experimental)
linux-x86-opencl Linux, x86 32-bit with SSE2 and OpenCL (experimental)
linux-x86-cuda Linux, x86 32-bit with SSE2 and CUDA (experimental)
linux-x86-sse2[i] Linux, x86 32-bit with SSE2 (most common, 32-bit)
linux-x86-native Linux, x86 32-bit, with all CPU features you've got (not necessarily best)
linux-x86-mmx Linux, x86 32-bit with MMX (for old computers)
linux-x86-any Linux, x86 32-bit (for truly ancient computers)
linux-x86-clang Linux, x86 32-bit with SSE2, compiled with clang
linux-alpha Linux, Alpha
linux-sparc Linux, SPARC 32-bit
linux-ppc32-altivec Linux, PowerPC w/AltiVec (best)
linux-ppc32 Linux, PowerPC 32-bit
linux-ppc64 Linux, PowerPC 64-bit
linux-ia64 Linux, IA-64

10 September 2012

230. ROCKS 5.4.3, ATLAS and Gromacs on Xeon X3480

After doing another round of 'benchmarks' (there are so many factors that differ between the systems that it's difficult to tell exactly what I'm measuring) I'm back to looking at my BLAS/LAPACK.

So here's compiling ATLAS on a cluster made up of six dual-socket mobos with 2x quadcore Xeon X3480 CPUs and 8 GB RAM. The cluster is running ROCKS 5.4.3, which is a spin based on CentOS 5.6. We then compile GROMACS using ATLAS and compare it with Openblas. Please note that I am not an expert on optimisations (or computers or anything) so comparing Openblas vs ATLAS won't tell you which one is 'better'. They are just numbers based on what someone once observed on a particular system under a particular set of circumstances.

Hurdles: I first had to deal with the lapack + bad symbols + recompile with -fPIC problem (solved by using netlib lapack and building shared libraries), then encountered the 'libgmx.so: undefined reference to _gfortran_' issue (solved by adding -lgfortran to LDFLAGS).

ATLAS
sudo mkdir /share/apps/ATLAS
sudo chown $USER /share/apps/ATLAS
cd ~/tmp
wget http://www.netlib.org/lapack/lapack-3.4.1.tgz
wget http://downloads.sourceforge.net/project/math-atlas/Developer%20%28unstable%29/3.9.72/atlas3.9.72.tar.bz2
tar xvf atlas3.9.72.tar.bz2
cd ATLAS/
mkdir build
cd build
.././configure --prefix=/share/apps/ATLAS -Fa alg '-fPIC' --with-netlib-lapack-tarfile=$HOME/tmp/lapack-3.4.1.tgz --shared

OS configured as Linux (1)
Assembly configured as GAS_x8664 (2)
Vector ISA Extension configured as SSE3 (6,448)
Architecture configured as Corei1 (25)
Clock rate configured as 3059Mhz

make

DONE STAGE 5-1-0 at 15:23
ATLAS install complete. Examine
ATLAS/bin/<arch>/INSTALL_LOG/SUMMARY.LOG for details.

ls lib/

libatlas.a libcblas.a libf77blas.a libf77refblas.a liblapack.a libptcblas.a libptf77blas.a libptlapack.a libsatlas.so libtatlas.so libtstatlas.a Makefile Make.inc

make install

In addition to successful copying you'll also get errors along the lines of

cp: cannot stat `/home/me/tmp/ATLAS/build/lib/libsatlas.dylib': No such file or directory
make[1]: [install_lib] Error 1 (ignored)
cp /home/me/tmp/ATLAS/build/lib/libtatlas.dylib /share/apps/ATLAS/lib/.
cp: cannot stat `/home/me/tmp/ATLAS/build/lib/libtatlas.dylib': No such file or directory
make[1]: [install_lib] Error 1 (ignored)
cp /home/me/tmp/ATLAS/build/lib/libsatlas.dll /share/apps/ATLAS/lib/.
cp: cannot stat `/home/me/tmp/ATLAS/build/lib/libsatlas.dll': No such file or directory
make[1]: [install_lib] Error 1 (ignored)
cp /home/me/tmp/ATLAS/build/lib/libtatlas.dll /share/apps/ATLAS/lib/.
cp: cannot stat `/home/me/tmp/ATLAS/build/lib/libtatlas.dll': No such file or directory
make[1]: [install_lib] Error 1 (ignored)
cp /home/me/tmp/ATLAS/build/lib/libsatlas.so /share/apps/ATLAS/lib/.
cp /home/me/tmp/ATLAS/build/lib/libtatlas.so /share/apps/ATLAS/lib/.

because those files don't exist.

Gromacs

FFTW3 was first built according to this. The only difference is the install targets (--prefix) -- I put things in /share/apps/gromacs/.fftwsingle and /share/apps/gromacs/.fftwdouble. Gromacs was downloaded and extracted as shown in that post, and /share/apps/gromacs was created.

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/openmpi/lib:/share/apps/ATLAS/lib
#single precision
export LDFLAGS="-L/share/apps/gromacs/.fftwsingle/lib -L/share/apps/ATLAS/lib -latlas -llapack -lf77blas -lcblas -lgfortran"
export CPPFLAGS="-I/share/apps/gromacs/.fftwsingle/include -I/share/apps/ATLAS/include/atlas"
./configure --disable-mpi --enable-float --with-fft=fftw3 --with-external-blas --with-external-lapack --program-suffix=_spa --prefix=/share/apps/gromacs
make -j3
make install
#double precision
make distclean
export LDFLAGS="-L/share/apps/gromacs/.fftwdouble/lib -L/share/apps/ATLAS/lib -latlas -llapack -lf77blas -lcblas -lgfortran"
export CPPFLAGS="-I/share/apps/gromacs/.fftwdouble/include -I/share/apps/ATLAS/include/atlas"
./configure --disable-mpi --disable-float --with-fft=fftw3 --with-external-blas --with-external-lapack --program-suffix=_dpa --prefix=/share/apps/gromacs
make -j3
make install
#single + mpi
make distclean
export LDFLAGS="-L/share/apps/gromacs/.fftwsingle/lib -L/share/apps/ATLAS/lib -latlas -llapack -lf77blas -lcblas -lgfortran"
export CPPFLAGS="-I/share/apps/gromacs/.fftwsingle/include -I/share/apps/ATLAS/include/atlas"
./configure --enable-mpi --enable-float --with-fft=fftw3 --with-external-blas --with-external-lapack --program-suffix=_spampi --prefix=/share/apps/gromacs
make -j3
make install
#double + mpi
make distclean
export LDFLAGS="-L/share/apps/gromacs/.fftwdouble/lib -L/share/apps/ATLAS/lib -latlas -llapack -lf77blas -lcblas -lgfortran"
export CPPFLAGS="-I/share/apps/gromacs/.fftwdouble/include -I/share/apps/ATLAS/include/atlas"
./configure --enable-mpi --disable-float --with-fft=fftw3 --with-external-blas --with-external-lapack --program-suffix=_dpampi --prefix=/share/apps/gromacs
make -j3
make install

The -lgfortran is IMPORTANT, or you'll end up with
libgmx.so: 'undefined reference to _gfortran_' type errors.
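The four builds above differ only in the MPI switch, the precision switch and the program suffix, so the configure lines can be generated rather than typed. A sketch, which only prints the lines and runs nothing; gromacs_configure_lines is a name I made up. You'd still need to export the matching LDFLAGS and CPPFLAGS and run make distclean between builds, as shown above.

```shell
# Print the configure line for each precision/MPI combination used above.
# Nothing is executed; LDFLAGS/CPPFLAGS must still be exported per build.
gromacs_configure_lines() {
    for mpi in disable enable; do
        for prec in single double; do
            if [ "$prec" = single ]; then float=enable; tag=sp; else float=disable; tag=dp; fi
            if [ "$mpi" = enable ]; then suffix="_${tag}ampi"; else suffix="_${tag}a"; fi
            echo "./configure --$mpi-mpi --$float-float --with-fft=fftw3" \
                 "--with-external-blas --with-external-lapack" \
                 "--program-suffix=$suffix --prefix=/share/apps/gromacs"
        done
    done
}

gromacs_configure_lines
```

The output should match the four configure commands above, in the same order.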

Performance
I ran a 6x6x6 nm box of water for 5 million steps (10 ns) to get a rough idea of the performance.
Make sure to put

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/share/apps/ATLAS/lib

in your ~/.bashrc, and to include it in your SGE jobs files (if that's what you use).

I allocated 8 Gb RAM and 8 cores for each run.

Double precision:
Openblas: 10.560 ns/day (11.7 GFLOPS, runtime 8182 seconds)
ATLAS : 10.544 ns/day (11.6 GFLOPS, runtime 8194 seconds)

Single precision:
Openblas: 17.297 ns/day (19.1 GFLOPS, runtime 4995 seconds)
ATLAS: 17.351 ns/day (19.2 GFLOPS, runtime 4980 seconds)
That's 15 seconds difference on a 1h 20 min run. I'd say they are identical.
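For what it's worth, here's the same comparison as a percentage of the slower run, using the runtimes quoted above. It's just arithmetic, and runtime_diff_pct is a hypothetical helper name.

```shell
# Relative difference between two runtimes, as a percentage of the slower one.
runtime_diff_pct() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        slow = (a > b) ? a : b
        fast = (a > b) ? b : a
        printf "%.2f\n", 100 * (slow - fast) / slow
    }'
}

# Runtimes in seconds, copied from the results above.
echo "double precision: $(runtime_diff_pct 8182 8194) %"
echo "single precision: $(runtime_diff_pct 4995 4980) %"
```

Both differences are a small fraction of a percent, which a single run per library can't distinguish from noise.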

07 September 2012

229. Compile ATLAS (+ gromacs, nwchem) on AMD FX 8150 on Debian Testing (Wheezy)

Xianyi's openblas doesn't seem to be ready for AMD FX 8150 yet. I've played with ATLAS in the past, but for some reason didn't see the same performance with NWChem and ATLAS as I saw with NWChem and Openblas, so I never ended up using it.

I'm also having issues using openblas with CPMD and quantum espresso, and ATLAS is a well-established, respectable project, so it's time to give it another shot. As in most cases in these situations, it's probably a matter of PEBKAC.

Building ATLAS
Anyway. On we go...

sudo mkdir /opt/ATLAS
sudo chown ${USER} /opt/ATLAS
mkdir ~/tmp
cd ~/tmp
wget http://www.netlib.org/lapack/lapack-3.4.1.tgz
wget http://downloads.sourceforge.net/project/math-atlas/Developer%20%28unstable%29/3.9.72/atlas3.9.72.tar.bz2
tar xvf atlas3.9.72.tar.bz2
cd ATLAS/

Edit ATLAS/Make.top
change the V on line 6 to lowercase, i.e. from
- $(ICC) -V 2>&1 >> bin/INSTALL_LOG/ERROR.LOG
to
- $(ICC) -v 2>&1 >> bin/INSTALL_LOG/ERROR.LOG
mkdir build/
cd build/

sudo apt-get install cpufreq-utils

cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor

ondemand

sudo cpufreq-set -g performance

Unfortunately that only takes care of cpu0:

cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor

performance

but

cat /sys/devices/system/cpu/cpu1/cpufreq/scaling_governor

ondemand

So...since we have 8 cores (cpu0-cpu7):

sudo cp /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor /sys/devices/system/cpu/cpu1/cpufreq/scaling_governor
sudo cp /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor /sys/devices/system/cpu/cpu2/cpufreq/scaling_governor
sudo cp /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor /sys/devices/system/cpu/cpu3/cpufreq/scaling_governor
sudo cp /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor /sys/devices/system/cpu/cpu4/cpufreq/scaling_governor
sudo cp /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor /sys/devices/system/cpu/cpu5/cpufreq/scaling_governor
sudo cp /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor /sys/devices/system/cpu/cpu6/cpufreq/scaling_governor
sudo cp /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor /sys/devices/system/cpu/cpu7/cpufreq/scaling_governor
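The seven cp lines above can probably be replaced by a loop, since cpufreq-set takes a -c option to select the core. Here's a sketch that prints the commands as a dry run; governor_cmds is my own helper name, and the eight cores are those of the FX 8150 used here.

```shell
# Print one cpufreq-set command per core (cpu0 .. cpuN-1).
# -c selects the core, -g the governor.
governor_cmds() {
    ncores="$1"
    governor="$2"
    i=0
    while [ "$i" -lt "$ncores" ]; do
        echo "cpufreq-set -c $i -g $governor"
        i=$((i + 1))
    done
}

# Dry run for the eight cores of the FX 8150:
governor_cmds 8 performance
# To apply for real: governor_cmds 8 performance | sudo sh
```

Afterwards, cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor should show performance for every core.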

OK, we're ready to compile:
.././configure --prefix=/opt/ATLAS -Fa alg '-fPIC' --with-netlib-lapack-tarfile=$HOME/tmp/lapack-3.4.1.tgz --shared

Some of the info that's important is:

OS configured as Linux (1)
Assembly configured as GAS_x8664 (2)
Vector ISA Extension configured as AVXFMA4 (4,496)
Architecture configured as AMDDOZER (34)
Clock rate configured as 3600Mhz

If that checks out you don't need to manually set your architecture. To get a list of the options, do
make xprint_enums ; ./xprint_enums

If all is well,

make
make install

You should now be done.

Linking Gromacs against ATLAS

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/lib/openmpi/lib:/opt/ATLAS/lib
#single precision
export LDFLAGS="-L/opt/fftw/fftw-3.3.2/single/lib -L/opt/ATLAS/lib -lsatlas -ltatlas -lgfortran"
export CPPFLAGS="-I/opt/fftw/fftw-3.3.2/single/include -I/opt/ATLAS/include"
./configure --disable-mpi --enable-float --with-fft=fftw3 --with-external-blas --with-external-lapack --program-suffix=_spatlas --prefix=/opt/gromacs/gromacs-4.5.5
make -j6 2>make.err 1>make.log
make install

#double precision
make distclean
export LDFLAGS="-L/opt/fftw/fftw-3.3.2/double/lib -L/opt/ATLAS/lib -lsatlas -ltatlas -lgfortran"
export CPPFLAGS="-I/opt/fftw/fftw-3.3.2/double/include -I/opt/ATLAS/include"
./configure --disable-mpi --disable-float --with-fft=fftw3 --with-external-blas --with-external-lapack --program-suffix=_dpatlas --prefix=/opt/gromacs/gromacs-4.5.5
make -j6 2>make2.err 1>make2.log
make install

#single + mpi
make distclean
export LDFLAGS="-L/opt/fftw/fftw-3.3.2/single/lib -L/opt/ATLAS/lib -lsatlas -ltatlas -lgfortran"
export CPPFLAGS="-I/opt/fftw/fftw-3.3.2/single/include -I/opt/ATLAS/include"
./configure --enable-mpi --enable-float --with-fft=fftw3 --with-external-blas --with-external-lapack --program-suffix=_spmpiatlas --prefix=/opt/gromacs/gromacs-4.5.5
make -j6 2>make3.err 1>make3.log
make install

#double + mpi
make distclean
export LDFLAGS="-L/opt/fftw/fftw-3.3.2/double/lib -L/opt/ATLAS/lib -lsatlas -ltatlas -lgfortran"
export CPPFLAGS="-I/opt/fftw/fftw-3.3.2/double/include -I/opt/ATLAS/include"
./configure --enable-mpi --disable-float --with-fft=fftw3 --with-external-blas --with-external-lapack --program-suffix=_dpmpiatlas --prefix=/opt/gromacs/gromacs-4.5.5
make -j6 2>make4.err 1>make4.log
make install

Linking NWChem against ATLAS

export LARGE_FILES=TRUE
export TCGRSH=/usr/bin/ssh
export NWCHEM_TOP=`pwd`
export NWCHEM_TARGET=LINUX64
export NWCHEM_MODULES="all"
export BLASOPT="-L/opt/ATLAS/lib -lsatlas -ltatlas"
export USE_MPI=y
export USE_MPIF=y
export USE_MPIF4=y
export MPI_LOC=/usr/lib/openmpi/lib
export MPI_INCLUDE=/usr/lib/openmpi/include
export LIBRARY_PATH="$LIBRARY_PATH:/usr/lib/openmpi/lib:/opt/ATLAS/lib"
export LIBMPI="-lmpi -lopen-rte -lopen-pal -ldl -lmpi_f77 -lpthread"
export LDFLAGS="-I/opt/ATLAS/include"
cd $NWCHEM_TOP/src
make clean
make nwchem_config
make FC=gfortran 2> make.err 1>make.log
export FC=gfortran
cd $NWCHEM_TOP/contrib
./getmem.nwchem

228. Setting up Asus (nvidia) GF 210 on Debian Testing

NOTE: Unless I remove the legacy driver, *DM will not start. Instead I only get a blank screen with a blinking cursor. See below for solution.

Here's how to get ASUS (nvidia) GF210 up and running in debian testing (wheezy)

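First create a file under /etc/modprobe.d/ (e.g. /etc/modprobe.d/blacklist-nouveau.conf; blacklist lines belong there rather than in /etc/modules), and add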
blacklist nouveau

You can either reboot at this point or try
sudo rmmod nouveau

To see whether nouveau got unloaded, do
lsmod |grep nouv

If nothing is returned, then you're good.
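
If you want the same check in a script, grepping for a fragment like 'nouv' can be replaced by an exact match on the module name. This is only a sketch; module_loaded is a hypothetical helper that reads lsmod-style text from standard input.

```shell
#!/bin/sh
# Return 0 if module $1 is listed in lsmod-style text read from stdin.
# (module_loaded is a made-up helper name, not part of any package.)
module_loaded() {
    awk -v mod="$1" '$1 == mod { found = 1 } END { exit !found }'
}

# Only query the running kernel if lsmod is available on this machine.
if command -v lsmod >/dev/null 2>&1; then
    if lsmod | module_loaded nouveau; then
        echo "nouveau is still loaded"
    else
        echo "nouveau is not loaded"
    fi
fi
```

The exact match on the first column avoids false positives from other modules whose names merely contain the same letters.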

Make sure that your card got recognised:

lspci

01:00.0 VGA compatible controller: NVIDIA Corporation GT218 [GeForce 210] (rev a2)

I like smxi, so here's how to get the drivers up and running using smxi, which is a fancy shell script.

sudo su
cd /usr/local/bin
wget -Nc smxi.org/smxi.zip
unzip smxi.zip
smxi

The first time you run smxi you have a couple of things to sort out -- lots of little questions to answer. If you don't feel comfortable yet with linux, avoid liquorix since it'll make your debian box deviate more from the standard setup (the liquorix kernel is fine and safe and I've used it in the past before I started rolling my own kernel, but it's more difficult for someone to troubleshoot your system the more it deviates from their own). Other than that most questions aren't that important. I enable non-free immediately after setting up a new box, and while there are sound political reasons for NOT doing it, there are plenty of practical reasons in favour of it.

Anyway, eventually you're done with the setup, and with making sure that your system is up to date. Select Continue to Graphics, then select debian-nvidia.

If all goes well you'll get the dkms install of the nvidia driver. You'll probably be asked whether to generate a new xorg.conf, which you should. You may also get a message about the nvidia driver needing to be added to your xorg.conf.

Once you're done installing the driver, you'll be asked whether to start your desktop or to quit. While it's fine to start your desktop at this point, why not select quit and check that all went well?

lsmod|grep nvi
nvidia 8028141 0
i2c_core 24002 2 i2c_piix4,nvidia

cat /etc/X11/xorg.conf|grep nvi
Driver "nvidia"
Driver "nvidia"
Driver "nvidia"

Looks fine.

I often have problems with the legacy drivers (blank screen with blinking cursor), so
sudo apt-get purge nvidia-*-legacy-173xx-*

Do
aptitude search 173
to make sure all the legacy drivers are gone.

Purge if there's still something around.

Framebuffer
If possible I like to enable framebuffer (it gives you fancier graphics capabilities in terminal mode e.g. browsing with images using w3m). I've had all manner of headaches doing so with the newer nvidia drivers though, so don't be too surprised if it doesn't pan out.

Edit the following line in your /etc/default/grub

GRUB_CMDLINE_LINUX_DEFAULT="quiet text vga=0x0318 nomodeset"

To see what code to use, look here.
This method is supposed to be deprecated, but I don't have any experience using vbetool.

As for the other options, only 'text' is important -- it will make you boot into the terminal and you will have to start your (default) desktop by doing startx. IF you want to boot into e.g. gdm3, kdm or another *dm, then DO NOT ADD text.

Run sudo update-grub so that the change is written to the grub configuration, then reboot. To see whether your framebuffer is active do
ls /dev/fb*

/dev/fb0

04 September 2012

227. New compute node using AMD FX-8150. Gromacs, nwchem performance/benchmarks

Update: reconfiguring your nwchem binary using getmem.nwchem can speed things up considerably. Most of the runtimes are obtained without using getmem.nwchem and are thus all using the same amount of memory, regardless of what is available. Binaries which have been reconfigured are shown as such.

The short summary: At first I wasn't that happy with my choice of the AMD FX-8150, but after sorting out the ACML libs and getting things benchmarked I'm much more satisfied. The only situation in which I'm not seeing this processor outclass the other systems seems to be when using the Commercial Ab Initio Package, which arrived as a precompiled binary (Portland Fortran).

In general it seems that the FX 8150 is about 10% faster than the i5-2400 for the computations I've tested here -- but beware that the AMD processor is using the machine vendor math libs, while the intel unit is using openblas.

Note that the AMD Phenom II X6 1055T is SLOWER with the ACML libs than with Openblas.

The Lengthy Preamble
I seem to remember promising myself not to get another AMD since, while they may or may not be 'the good guys' (e.g. the Intel/Dell thingy), empirically I keep on seeing my Quadcore Intel i5-2400 3.1 GHz sweeping the floor with my Phenom II X6 1055T 2.8 GHz. Sure, part of the issue is the clock frequency, but the difference seems to be a lot bigger than that.

At any rate, I ended up building a new node for my little cluster. Remember that these are Australian prices. Oh, how I miss you, Newegg -- not just because of the price, but because of the choice.

Luckily it seems like my choice of the FX-8150 has paid off. Also, at the moment of writing the intel i5-2400 and the AMD fx-8150 sell for the same price locally.

The Setup

It's basically an eight-core 3.6 GHz box with 16 GB RAM (expandable to 32 GB, 4 slots) and a 7200 rpm HDD. I've heard the eight-core FX 8150 uncharitably described as a quad-core with advanced hyper-threading, but I wouldn't be qualified to comment. Interestingly, sinfo registers it as a quad core, while htop and all other programs consider it an 8-core. Finally, looking at this image it looks like the whole 8 core thing is a bit of a cheat -- there are 4 floating point units vs 8 integer processing units.

Gigabyte GA-990FXA-D3 AM3 990FX DDR3 Motherboard AU$ 128 Link
AMD AM3+ x8 FX-8150 3.6 GHz Boxed CPU AU$ 209 Link
PV316G186C0K 16G Kit (8Gx2) DDR3 1866 AU$ 129 Link
Hitachi 3.5" Deskstar 1TB SATA3 HDD 7200rpm AU$ 83 Link
Corsair GS800 V2 ATX Power Supply Unit AU$ 138 Link
TP-LINK TG-3269 PCI Gigabit PCI Network Card AU$ 8 Link
ASUS Vento TA-U11 without PSU AU$ 99 Link
ASUS 1GB GF210 PCI-E VGA Card Link

NOTE that the mobo does NOT have onboard video. I didn't pick up on that before buying the parts, but luckily had an old ATI card floating around.

The fan on the PSU is a bit annoying. It stays off for the most part (some posts say it should never be completely off, one post said it should be) but starts up in a weird way -- basically the electricity is given in small jolts. Or it's just broken. Other than that it works fine.

Preparation
It's for reasons like these I write this blog. After having installed debian testing I set up NFS, added the box as a node under Sun Grid Engine (Link), set up Gaussian (Link), and compiled Gromacs.

I encountered separate issues trying to compile Openblas (Bulldozer cores aren't supported) and Nwchem with internal libs (odd stuff). I've given up on Openblas and managed to compile nwchem against the AMD ACML. Same went for gromacs -- I eventually recompiled gromacs against ACML. Maybe it's unfair to compare ACML vs Openblas on the i5-2400, but ACML is free, MKL isn't.

Performance -- setup
Note that while I do use NFS it's not in the 'traditional way'. Each node exports a local folder to the front node so that SGE can see it. However, when you run your calcs everything is stored in a local folder, and the number crunching software is a locally compiled version. In other words, network performance should not affect the benchmarks.

Neon is NOT using openblas, while Boron and Tantalum are. Xianyi's version of openblas won't compile on Bulldozer at the moment (it seems). I will rebuild gromacs with the ACML libs and do the benchmark again.

Also, please note that these 'benchmarks' aren't absolute -- I'm not an expert on optimising performance. You can probably use them to get an idea of the relative computational grunt of the different hardware combinations though.

FX 8150 is a lot more fun with ACML. The Phenom II 1055T is no fun with ACML.

I recompiled nwchem and gromacs on Boron (see below) to see what ACML vs Openblas would be like. I've yet to run those jobs, but will post the results when I have.

Unlike the FX-8150, the Phenom II X6 1055T does not support AVX, FMA3 or FMA4.

Configuration:
Boron (B): Phenom II X6 2.8 GHz, 8 GB RAM (2.8*6=16.8 GFLOPS predicted)
Neon (Ne): FX-8150 X8 3.6 GHz, 16 GB RAM (3.6*8=28.8 GFLOPS predicted (int), 3.6*4=14.4 GFLOPS (fpu))
Tantalum (Ta): Quadcore i5-2400 3.1 GHz, 8 GB RAM (3.1*4=12.4 GFLOPS predicted)
Vanadium (V): Dual socket 2x Quadcore Xeon X3480 3.06 GHz, 8 GB RAM. CentOS (ROCKS 5.4.3)/openblas.
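
The 'predicted' figures in the list above are just clock speed (GHz) times the number of execution units, i.e. they assume one operation per cycle per unit, which is a crude estimate. A small sketch of that arithmetic (predicted_gflops is a made-up helper name):

```shell
#!/bin/sh
# Crude estimate: clock (GHz) x number of units, assuming one operation
# per cycle per unit. predicted_gflops is a hypothetical helper.
predicted_gflops() {
    awk -v ghz="$1" -v units="$2" 'BEGIN { printf "%.1f\n", ghz * units }'
}

echo "Boron:      $(predicted_gflops 2.8 6) GFLOPS"
echo "Neon (int): $(predicted_gflops 3.6 8) GFLOPS"
echo "Neon (fpu): $(predicted_gflops 3.6 4) GFLOPS"
echo "Tantalum:   $(predicted_gflops 3.1 4) GFLOPS"
```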

Results

Gromacs --double (1 ns 6x6x6 nm tip4p water box; dynamic load balancing, double precision, 500k steps)
B : 10.662 ns/day (11.8 GFLOPS, runtime 8104 seconds)***
B : 9.921 ns/day (10.9 GFLOPS, runtime 8709 seconds)**
Ne: 10.606 ns/day (11.7 GFLOPS, runtime 8146 seconds)*
Ne: 12.375 ns/day (13.7 GFLOPS, runtime 6982 seconds)**
Ne: 12.385 ns/day (13.7 GFLOPS, runtime 6976 seconds)****
Ta: 10.825 ns/day (11.9 GFLOPS, runtime 7981 seconds)***
V : 10.560 ns/day (11.7 GFLOPS, runtime 8182 seconds)***
*no external blas/lapack.
**using ACML libs
*** using openblas
**** using ATLAS

Gromacs --single (1 ns 6x6x6 nm tip4p water box; dynamic load balancing, single precision, 500 k steps)
B : 17.251 ns/day (19.0 GFLOPS, runtime 5008 seconds)***
Ne: 21.874 ns/day (24.2 GFLOPS, runtime 3950 seconds)**
Ne: 21.804 ns/day (24.1 GFLOPS, runtime 3963 seconds)****
Ta: 17.345 ns/day (19.2 GFLOPS, runtime 4982 seconds)***
V : 17.297 ns/day (19.1 GFLOPS, runtime 4995 seconds)***
*no external blas/lapack.
**using ACML libs
*** using openblas
**** using ATLAS
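
The ns/day and runtime columns in the two gromacs tables carry the same information: each run simulates 1 ns, so ns/day is roughly 86400 divided by the runtime in seconds. A sketch of the conversion (ns_per_day is a hypothetical helper; the results differ from the reported values in the last digit, presumably because gromacs uses a slightly different wall time):

```shell
#!/bin/sh
# Convert a simulated length (ns) and a wall-clock runtime (s) to ns/day.
# ns_per_day is a made-up helper name.
ns_per_day() {
    awk -v ns="$1" -v secs="$2" 'BEGIN { printf "%.3f\n", ns * 86400 / secs }'
}

echo "B,  double, openblas: $(ns_per_day 1 8104) ns/day"
echo "Ne, double, ACML:     $(ns_per_day 1 6982) ns/day"
echo "Ne, single, ACML:     $(ns_per_day 1 3950) ns/day"
```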

NWChem (opt biphenyl cation, cp-md/pspw):
B : 5951 seconds**
B : 4084 seconds***
B : 1988 seconds***x
Ne: 3689 seconds**
Ta: 4102 seconds***
V : 5396 seconds***

*no external blas/lapack.
**using ACML libs
*** using openblas
x Reconfigured using getmem.nwchem

NWChem (opt biphenyl cation, geovib, 6-31G**/ub3lyp):
B : 2841 seconds**
B : 2410 seconds***
B : 2101 seconds***x
Ne: 1665 seconds**
Ta: 1785 seconds***
Ta: 1789 seconds***x
V : 2600 seconds***

*no external blas/lapack.
**using ACML libs
*** using openblas
x Reconfigured using getmem.nwchem
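
To put the nwchem runtimes on a common footing you can divide a baseline runtime by the runtime you want to compare it with; values above 1 mean the second run is faster. A minimal sketch (speedup is a hypothetical helper; the inputs are the runtimes listed above):

```shell
#!/bin/sh
# speedup BASELINE_SECONDS OTHER_SECONDS -> ratio, >1 means OTHER is faster.
# speedup is a made-up helper name.
speedup() {
    awk -v base="$1" -v other="$2" 'BEGIN { printf "%.2f\n", base / other }'
}

echo "pspw,   Ne (ACML) vs Ta (openblas):  $(speedup 4102 3689)"
echo "geovib, Ne (ACML) vs Ta (openblas):  $(speedup 1785 1665)"
echo "geovib, B (ACML)  vs B (openblas):   $(speedup 2410 2841)"
```

The first two ratios come out at 1.11 and 1.07, in line with the roughly 10% advantage for the FX 8150 over the i5-2400 mentioned in the summary, while the last one (0.85) shows the Phenom II getting slower with ACML.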

A Certain Commercial Ab Initio Package (Freq calc of pre-optimised H₁₄C₁₉O₃ at 6-31+G*/rb3lyp):
B : 2h 00 min (CPU time 10h 37 min)
Ne: 1h 37 min (CPU time: 11h 13 min)
Ta: 1h 26 min (CPU time: 5h 27 min)
V : 2h 15 min (CPU time 15h 50 min)
Using precompiled binaries.

More:
Since I couldn't use Xianyi's openblas with FX 8150 I downloaded the AMD ACML. I've had issues with that before, which is why I haven't been using that as a rule. This time I was motivated enough to hammer it out though. Anyway, here's the cpuid output from the acml 5.2.0:

./cpuid.exe
Chip manufacturer: AuthenticAMD
AuthenticAMD family 15 extended family 6 model 1
Model Name: AMD FX(tm)-8150 Eight-Core Processor
Chip supports SSE
Chip supports SSE2
Chip supports SSE3
Chip supports AVX
Chip does not support FMA3
Chip supports FMA4

See the other post from today about building nwchem with acml (hint: use the fma4_int64 libs but avoid mp).

Here's 1055T:

Chip manufacturer: AuthenticAMD
AuthenticAMD family 15 extended family 1 model 10
Model Name: AMD Phenom(tm) II X6 1055T Processor
Chip supports SSE
Chip supports SSE2
Chip supports SSE3
Chip does not support AVX
Chip does not support FMA3
Chip does not support FMA4
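
If you don't have the ACML cpuid tool at hand, the same information is available on Linux from the flags line in /proc/cpuinfo (avx, fma and fma4 are the relevant flag names). A sketch, where has_cpu_flag is a hypothetical helper that reads cpuinfo-style text from standard input:

```shell
#!/bin/sh
# Return 0 if flag $1 appears on a "flags" line of cpuinfo-style text on stdin.
# has_cpu_flag is a made-up helper name.
has_cpu_flag() {
    awk -v flag="$1" '
        $1 == "flags" { for (i = 2; i <= NF; i++) if ($i == flag) found = 1 }
        END { exit !found }'
}

# Only inspect the real CPU if /proc/cpuinfo exists (i.e. on Linux).
if [ -r /proc/cpuinfo ]; then
    for f in sse sse2 avx fma fma4; do
        if has_cpu_flag "$f" < /proc/cpuinfo; then
            echo "Chip supports $f"
        else
            echo "Chip does not support $f"
        fi
    done
fi
```

Note that the kernel reports SSE3 under the flag name pni, and FMA3 simply as fma.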

Issues

Openblas:
You will get SGEMM related errors trying to build openblas according to the instructions I've posted on this site before. Apparently it has to do with the way the architecture is autoselected during build. Or something. I couldn't make it work.

NwChem:
I tried building nwchem with the internal libs, but had no luck. See other posts on this blog for general instructions. Building with the AMD ACML worked fine though.

Files:

NWChem (opt biphenyl cation, cp-md/pspw):

Title "Test 1"
Start biphenyl_cation_twisted-1
echo
charge 1
geometry autosym units angstrom
C 0.00000 -3.54034 0.00000
C -1.20296 -2.84049 -0.216000
C -1.20944 -1.46171 -0.206253
C 0.00000 -0.721866 0.00000
C 1.20944 -1.46171 0.206253
C 1.20296 -2.84049 0.216000
C 0.00000 0.721866 0.00000
C 1.20944 1.46171 -0.206253
C 1.20296 2.84049 -0.216000
C -1.20944 1.46171 0.206253
C 0.00000 3.54034 0.00000
C -1.20296 2.84049 0.216000
H 0.00000 -4.62590 0.00000
H -2.12200 -3.38761 -0.395378
H -2.13673 -0.938003 -0.401924
H 2.12200 -3.38761 0.395378
H 2.12200 3.38761 -0.395378
H -2.13673 0.938003 0.401924
H 0.00000 4.62590 0.00000
H -2.12200 3.38761 0.395378
H 2.13673 0.938003 -0.401924
H 2.13673 -0.938003 0.401924
end
nwpw
simulation_cell
lattice_vectors
2.000000e+01 0.000000e+00 0.000000e+00
0.000000e+00 2.000000e+01 0.000000e+00
0.000000e+00 0.000000e+00 2.000000e+01
end
mult 2
np_dimensions -1 -1
tolerances 1e-7 1e-7
end
driver
default
end
task pspw optimize

NWChem (opt biphenyl cation, geovib, 6-31G**/ub3lyp):

Title "Test 2"
Start biphenyl_cation_twisted
echo
charge 1
geometry autosym units angstrom
C 0.00000 -3.56301 0.00000
C -1.13927 -2.85928 -0.393841
C -1.13879 -1.46545 -0.394153
C 0.00000 -0.742814 0.00000
C 1.13879 -1.46545 0.394153
C 1.13927 -2.85928 0.393841
C 0.00000 0.742814 0.00000
C 1.13879 1.46545 -0.394153
C 1.13927 2.85928 -0.393841
C -1.13879 1.46545 0.394153
C 0.00000 3.56301 0.00000
C -1.13927 2.85928 0.393841
H 0.00000 -4.64896 0.00000
H -2.02827 -3.39662 -0.711607
H -2.02148 -0.928265 -0.727933
H 2.02827 -3.39662 0.711607
H 2.02827 3.39662 -0.711607
H -2.02148 0.928265 0.727933
H 0.00000 4.64896 0.00000
H -2.02827 3.39662 0.711607
H 2.02148 0.928265 -0.727933
H 2.02148 -0.928265 0.727933
end
basis "ao basis" cartesian print
H library "6-31G**"
C library "6-31G**"
END
dft
mult 2
XC b3lyp
mulliken
end
driver
end
task dft optimize
task dft freq numerical
