Lindqvist -- a blog about Linux and Science. Mostly.

24 May 2013

432. NWChem 6.3 -- COSMO is now fast(er)!

I probably would've been more excited about this about a year ago when I 'believed' in implicit solvation models (nothing's perfect, and we'll use what is practical so they do fill a strong need. They just aren't very informative for a lot of systems) but it's still a Good Thing.

COSMO has been done using numerical gradients in nwchem 6.1.1 and earlier versions, which has meant that it's been horrendously slow in many cases, in particular if you need to optimise a structure using implicit solvation. COSMO has been -- and still is -- the only implicit solvation model implemented in NWChem, so slow COSMO puts a bit of a spanner in the solvation energy works. Sometimes the calculation even refuses to converge at all.

In contrast, Gaussian has had a number of implicit solvation models implemented, ranging from the quick and dirty PCM, to slower (and better?) C-PCM and I-PCM.

So this is great news.

A quick example:

The test:
Here's a test job (the default cosmo parameters aren't realistic, but this is for testing purposes):


scratch_dir /scratch
start benzene 

geometry units angstroms
C  0.100  1.396  0.000
C  1.209  0.698  0.000
C  1.209 -0.698  0.000
C  0.000 -1.396  0.000
C -1.209 -0.698  0.000
C -1.209  0.698  0.000
H  0.000  2.479  0.000
H  2.147  1.240  0.000
H  2.147 -1.240  0.000
H  0.000 -2.479  0.000
H -2.147 -1.240  0.000
H -2.147  1.240  0.000
end

basis
 H library "6-31+g*" 
 c library "6-31+g*"
end
dft
        direct
end
cosmo
end

scf
        maxiter 999
end

task dft

Note that this is the same test job (plus cosmo, minus optimize) as shown here: http://verahill.blogspot.com.au/2013/05/430-briefly-crude-comparison-of.html

The results:
And here is what I see using nwchem 6.3. (w/ acml 5.3.1, AMD FX 8150/32 gb ram):

6.1.1 19.4 seconds
6.3   14.3 seconds

The difference isn't significant (in the sense that times are too variable so we can't really tell which is faster for such a short job).

But when we change task dft to task dft optimize we get

6.1.1 Fails after 2600 seconds
6.3   128.3 seconds

6.3 churns through the steps pretty efficiently:


@ Step       Energy      Delta E   Gmax     Grms     Xrms     Xmax   Walltime
@ ---- ---------------- -------- -------- -------- -------- -------- --------
@    0    -230.09337488  0.0D+00  0.07376  0.01302  0.00000  0.00000     18.1
@    1    -230.10523734 -1.2D-02  0.00903  0.00231  0.03627  0.10509     45.7
@    2    -230.10619442 -9.6D-04  0.00491  0.00084  0.01898  0.06082     69.1
@    3    -230.10628696 -9.3D-05  0.00176  0.00030  0.00737  0.02428     93.3
@    4    -230.10629787 -1.1D-05  0.00023  0.00005  0.00219  0.00682    115.8
@    5    -230.10629827 -4.0D-07  0.00004  0.00001  0.00047  0.00136    128.2
@    5    -230.10629827 -4.0D-07  0.00004  0.00001  0.00047  0.00136    128.2

while 6.1.1 drags itself along for almost an hour:


@ Step       Energy      Delta E   Gmax     Grms     Xrms     Xmax   Walltime
@ ---- ---------------- -------- -------- -------- -------- -------- --------
@    0    -230.09389924  0.0D+00  0.07389  0.01306  0.00000  0.00000    691.4
@    1    -230.10680306 -1.3D-02  0.01081  0.00197  0.03065  0.10438   1378.3
@    2    -230.10690186 -9.9D-05  0.01000  0.00167  0.00231  0.00803   2092.2

before failing with


6:6:driver: task_gradient failed:: 0
(rank:6 hostname:neon pid:4536):ARMCI DASSERT fail. ../../ga-5-1/armci/src/common/armci.c:ARMCI_Error():208 cond:0
 ------------------------------------------------------------------------
 There is an error related to the specified geometry
 ------------------------------------------------------------------------

Sure, the optimization takes 128 seconds instead of ca 44 seconds, but for anyone who's used NWCHEM with COSMO in the past, that's actually not too bad.

I ran another job to get a better feeling for how much longer COSMO vs no COSMO takes for optimization. Optimization of Arecoline (available in ECCE as a fragment) at rb3lyp/6-31+G* takes 2h 5 min with COSMO (33 optimization steps). Without COSMO it takes 37 minutes and uses 14 steps.

431. Briefly: a crude comparison of performance of NWChem 6.1, 6.1.1 and 6.3.

Just a simple comparison of different versions of nwchem on different hardware. It's mostly interesting to myself as a general guide to how slow my nodes are in relative terms.

I built nwchem as shown here: http://verahill.blogspot.com.au/2013/05/424-nwchem-63-on-debian-wheezy.html
and openblas as shown here: http://verahill.blogspot.com.au/2013/05/423-openblas-on-debian-wheezy.html
and installed acml as shown here: http://verahill.blogspot.com.au/2013/05/422-set-up-acml-on-linux.html

I'm using ECCE: http://verahill.blogspot.com.au/2013/01/325-compiling-ecce-64-on-debian-testing.html
and SGE: http://verahill.blogspot.com.au/2012/06/setting-up-sun-grid-engine-with-three.html
I've set up ECCE similarly to what is shown here: http://verahill.blogspot.com.au/2012/06/ecce-in-virtual-machine-step-by-step.html

Test job:

scratch_dir /scratch
Title "opt freq"

Start  biphenyl_cation_twisted

echo

charge 1

geometry autosym units angstrom
 C     0.00000     -3.56301     0.00000
 C     -1.13927     -2.85928     -0.393841
 C     -1.13879     -1.46545     -0.394153
 C     0.00000     -0.742814     0.00000
 C     1.13879     -1.46545     0.394153
 C     1.13927     -2.85928     0.393841
 C     0.00000     0.742814     0.00000
 C     1.13879     1.46545     -0.394153
 C     1.13927     2.85928     -0.393841
 C     -1.13879     1.46545     0.394153
 C     0.00000     3.56301     0.00000
 C     -1.13927     2.85928     0.393841
 H     0.00000     -4.64896     0.00000
 H     -2.02827     -3.39662     -0.711607
 H     -2.02148     -0.928265     -0.727933
 H     2.02827     -3.39662     0.711607
 H     2.02827     3.39662     -0.711607
 H     -2.02148     0.928265     0.727933
 H     0.00000     4.64896     0.00000
 H     -2.02827     3.39662     0.711607
 H     2.02148     0.928265     -0.727933
 H     2.02148     -0.928265     0.727933
end

ecce_print ecce.out

basis "ao basis" cartesian print
  H library "6-31G**"
  C library "6-31G**"
END

dft
  mult 2
  XC b3lyp
  mulliken
end

driver
end

task dft optimize
task dft freq numerical

Results:
The jobs were run using all cores available.

AMD Phenom II X6 1055T, 8 Gb RAM, Openblas. Six cores.

6.1    2461
6.1.1  2114
6.3    2044
6.3    2048**

**using MKL compiled with ifort (http://verahill.blogspot.com.au/2013/07/469-intel-compiler-on-debian.html).

AMD FX 8150, 32 Gb RAM, acml 5.3.1 (gfortran, int64, fma4) -- earlier versions of nwchem were compiled against different versions of acml. Eight cores.

6.1    1619s
6.1.1  1588s
6.3    1611s
6.3    1507s**

**using MKL compiled with ifort (http://verahill.blogspot.com.au/2013/07/469-intel-compiler-on-debian.html).

Intel i5-2400, 16 Gb RAM, openblas. Four cores.

6.1    1689s
6.1.1  1696s
6.3    1652s
6.3    1550s*
6.3    1498s**

*using Intel MKL (see http://verahill.blogspot.com.au/2013/06/465-intel-mkl-math-kernel-library-on.html)
**using MKL compiled with ifort (http://verahill.blogspot.com.au/2013/07/469-intel-compiler-on-debian.html).

AMD Athlon II X3, 4 gb RAM, acm 5.3.1. Three cores.

6.3    4818s 
6.3    4058s**

**using MKL compiled with ifort (http://verahill.blogspot.com.au/2013/07/469-intel-compiler-on-debian.html).

23 May 2013

430. Strange issue with NWChem, openmpi, SGE and ECCE

This one's a bit odd.

Odd in the sense that

the math libs (acml) I'm using should be suitable for the processors that I'm using them for.
it only happens when I submit with ECCE + SGE. Calcs on the input files are fine if I launch the by hand

The problem:
I'm having issues launching jobs on two nodes where the nwchem 6.3. binaries were compiled against acml 5.3.1 (gfortran, int64). I'm launching the jobs from ECCE and I've got SGE set up and working since a long time. My two other nodes, one i5-2400 linked against openblas, and one AMD FX 8150 linked against acml 5.3.1 (gfortran, fma4, int64) work absolutely fine.

Both binaries were linked with acml using

export BLASOPT="-L/opt/acml/acml5.3.1/gfortran64_int64/lib -lacml"
export LIBRARY_PATH="$LIBRARY_PATH:/usr/lib/openmpi/lib:/opt/acml/acml5.3.1/gfortran64_int64/lib"

The first node is an AMD phenom II X6 1055, while the second one is an ancient, recently-revived AMD Athlon X2 3800+. The acml util cpuid.exe gives


Chip manufacturer: AuthenticAMD
AuthenticAMD family 15 extended family 1 model 10
Model Name: AMD Phenom(tm) II X6 1055T Processor
Chip supports SSE
Chip supports SSE2
Chip supports SSE3
Chip does not support AVX
Chip does not support FMA3
Chip does not support FMA4

and


Model Name: AMD Athlon(tm) 64 X2 Dual Core Processor 3800+
Chip supports SSE
Chip supports SSE2
Chip supports SSE3
Chip does not support AVX
Chip does not support FMA3
Chip does not support FMA4

respectively. On the AMD Phenom II X6 1055T I kept getting


Scaling coordinates for geometry "geometry" by  1.889725989
 (inverse scale =  0.529177249)

0:Illegal Instruction error, status=: 4
(rank:0 hostname:boron pid:12386):ARMCI DASSERT fail. ../../ga-5-2/armci/src/   common/signaltrap.c:SigIllHandler():276 cond:0

. On the Athlon 64 X2 3800+ the job would just exit at


Directory information
           ---------------------

  0 permanent = .
  0 scratch   = /home/me/scratch

There would be no other errors (in e.g. .po or .o files).

If I launch the job by hand, e.g.

mpirun -n 6 nwchem nwch.nw

it works fine.

The Partial solution
The errors for the AMD Phenom II X6 1055T went away when I instead of acml used openblas:

export BLASOPT="-L/opt/openblas/lib -lopenblas"
export LIBRARY_PATH="$LIBRARY_PATH:/usr/lib/openmpi/lib:/opt/openblas/lib"

See e.g. http://verahill.blogspot.com.au/2013/05/424-nwchem-63-on-debian-wheezy.html for general compilation instructions.

The odd thing:
With openblas the AMD Athlon X2 3800+ suddenly gives


Scaling coordinates for geometry "geometry" by  1.889725989
 (inverse scale =  0.529177249)

0:Illegal Instruction error, status=: 4
(rank:0 hostname:beryllium pid:9267):ARMCI DASSERT fail. ../../ga-5-2/armci/src/common/signaltrap.c:SigIllHandler():276 cond:0

Pages

24 May 2013

432. NWChem 6.3 -- COSMO is now fast(er)!

431. Briefly: a crude comparison of performance of NWChem 6.1, 6.1.1 and 6.3.

23 May 2013

430. Strange issue with NWChem, openmpi, SGE and ECCE

Contributors

Statcounter