28 May 2012

167. ECCE/Nwchem on An Australian University computational cluster using qsub with g09/nwchem

EDIT:
I've just learned the First Rule of Remote Computing:
always start by checking the number of concurrent processes you're allowed on the head node, or you can lock yourself out faster than you can say "IT support".

do
ulimit -u
If it's anywhere under 1000, then you need to be careful.
Default ulimit on ROCKS: 73728
Default ulimit on Debian/Wheezy:  63431
Ulimit on the Oz uni cluster: 32

ECCE launches FIVE processes per job.
Each pipe you add to a command launches another proc. Logging in launches a proc -- if you've reached your quota, you can't log in until a process finishes.

cat test.text|sed 's/\,/\t/g'|gawk '{print $2,$3,$4}' 
yields three processes -- ten percent of my entire quota.
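As a rough sketch of how to see how close you are to the limit: the helper names below (procs_left, count_my_procs) are made up for illustration, and the check itself briefly uses a couple of processes (ps and wc), so run it sparingly on a tight quota.

```shell
#!/bin/bash
# Print the remaining headroom given a limit and a current count.
# $1 = process limit (e.g. output of `ulimit -u`), $2 = processes in use.
# Non-numeric limits such as "unlimited" are echoed back unchanged.
procs_left() {
    case "$1" in
        ''|*[!0-9]*) echo "$1" ;;
        *) echo $(( $1 - $2 )) ;;
    esac
}

# Count processes owned by the current user (pid= suppresses the header).
count_my_procs() {
    ps -u "$(id -u)" -o pid= | wc -l
}

limit=$(ulimit -u 2>/dev/null || echo unknown)
echo "limit: $limit, headroom: $(procs_left "$limit" "$(count_my_procs)")"
```

With a limit of 32 and five processes already running, procs_left 32 5 prints 27.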

NOTE:
Running something on a cluster where you have limited access is very different from running it on a cluster you're managing yourself. Apart from knowing the physical layout, you normally have sudo powers on a local cluster.

One potential issue is excessive disk usage -- both in terms of storage space and in terms of raw I/O (writing to an NFS-mounted disk is not efficient anyway).
So in order to cut down on that:
1. Define a scratch directory using e.g. (use the correct path)
scratch_dir /scratch
The point being that /scratch is a local directory on the execution node

2. Make sure that you specify
dft
     direct
     ..
end
or even
dft
    noio
    ...
end
to do as little disk caching as possible.

I accidentally ended up storing 52 GB of aoints files from a single job. It may have been what locked me out of the submit node for three hours...

A good way to check your disk-usage is
ls -d * |xargs du -hs
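If you want the biggest offenders first, something along these lines should work. It is only a sketch: usage_by_size is a made-up helper name, and sizes are reported in kilobytes so that a plain numeric sort is enough.

```shell
#!/bin/bash
# List the entries directly under a directory by disk usage, largest first.
# $1 = directory to inspect (defaults to the current directory).
# Output: one line per entry, size in kB followed by the path.
usage_by_size() {
    du -sk -- "${1:-.}"/* 2>/dev/null | sort -rn
}

# Example: show the five largest entries in the current directory.
usage_by_size . | head -n 5
```

Note that this still costs a few processes (du, sort, head), so the same ulimit caveat applies.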

Now, continue reading:



Setting everything up the first time:
First figure out where the mpi libs are:
qsub.tests:

#!/bin/sh
#$ -S /bin/sh
#$ -cwd
#$ -l h_rt=00:14:00
#$ -l h_vmem=4G
#$ -j y
locate libmpi.so
Assuming that the location is /usr/lib/openmpi/1.3.2-gcc/lib/, put 
export LD_LIBRARY_PATH=/usr/lib/openmpi/1.3.2-gcc/lib/
in your ~/.bashrc
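The plain export above overwrites whatever LD_LIBRARY_PATH already contained. If that matters on your cluster, a slightly more careful variant could look like the sketch below (add_lib_path is a hypothetical helper name; the openmpi path is the one assumed above):

```shell
#!/bin/bash
# Prepend a directory to LD_LIBRARY_PATH unless it is already listed.
# $1 = directory to add
add_lib_path() {
    case ":${LD_LIBRARY_PATH}:" in
        *":$1:"*) ;;    # already present, nothing to do
        *) LD_LIBRARY_PATH="$1${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}" ;;
    esac
    export LD_LIBRARY_PATH
}

add_lib_path /usr/lib/openmpi/1.3.2-gcc/lib/
```

Calling it twice with the same directory leaves the variable unchanged, so it is safe to keep in ~/.bashrc.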


Next, run ls /opt/sw/nwchem-6.1/data -- if there's a default.nwchemrc file, then
ln -s /opt/sw/nwchem-6.1/data/default.nwchemrc ~/.nwchemrc

If not, create ~/.nwchemrc with the locations of the different basis sets, amber files and plane-wave sets listed as follows:

nwchem_basis_library /opt/sw/nwchem-6.1/data/libraries/
nwchem_nwpw_library /opt/sw/nwchem-6.1/data/libraryps/
ffield amber
amber_1 /opt/sw/nwchem-6.1/data/amber_s/
amber_2 /opt/sw/nwchem-6.1/data/amber_q/
amber_3 /opt/sw/nwchem-6.1/data/amber_x/
amber_4 /opt/sw/nwchem-6.1/data/amber_u/
spce /opt/sw/nwchem-6.1/data/solvents/spce.rst
charmm_s /opt/sw/nwchem-6.1/data/charmm_s/
charmm_x /opt/sw/nwchem-6.1/data/charmm_x/
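Since the paths will differ between installations, it can be worth checking that every path in the file actually exists. Here's a minimal sketch that does so without any pipes; check_nwchemrc is a made-up name, and it simply treats any value starting with / as a path:

```shell
#!/bin/bash
# Report entries in an nwchemrc-style file whose value looks like an
# absolute path but does not exist on disk.
# $1 = file to check (defaults to ~/.nwchemrc)
# Returns 0 if all paths exist, 1 otherwise.
check_nwchemrc() {
    rc_file="${1:-$HOME/.nwchemrc}"
    rc_status=0
    while read -r key value; do
        case "$value" in
            /*) [ -e "$value" ] || { echo "missing: $key -> $value"; rc_status=1; } ;;
        esac
    done < "$rc_file"
    return "$rc_status"
}

# Usage: check_nwchemrc ~/.nwchemrc
```

Lines such as "ffield amber" are skipped because their value is not a path.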


Using nwchem:
A simple qsub file would be:

#!/bin/sh
#$ -S /bin/sh
#$ -cwd
#$ -l h_rt=00:14:00
#$ -l h_vmem=4G
#$ -j y
#$ -pe orte 4
module load nwchem/6.1
time mpirun -n 4 nwchem  test.nw > nwchem.out


with test.nw being the actual nwchem input file which is present in your cwd (current working directory).


Using nwchem with ecce:
This is the proper way of using nwchem. If you haven't already, look here: http://verahill.blogspot.com.au/2012/05/setting-up-ecce-with-qsub-on-australian.html

Then edit your  ecce-6.3/apps/siteconfig/CONFIG.msgln4  file:

NWChem: /opt/sw/nwchem-6.1/bin/nwchem
Gaussian-03: /usr/local/bin/G09
perlPath: /usr/bin/perl
qmgrPath: /usr/bin/qsub

SGE {
#$ -S /bin/csh
#$ -cwd
#$ -l h_rt=$wallTime
#$ -l h_vmem=4G
#$ -j y
}

NWChemFilesToDelete{ core *.aoints.* }

NWChemEnvironment{
    LD_LIBRARY_PATH /usr/lib/openmpi/1.3.2-gcc/lib/
}

NWChemCommand {
#$ -pe mpi_smp4  4
module load nwchem/6.1

mpirun -n $totalprocs $nwchem $infile > $outfile
}

Gaussian-03Command {
#$ -pe g03_smp4 4
module load gaussian/g09

time G09< $infile > $outfile }

Gaussian-03FilesToDelete{ core *.rwf }

Wrapup{
find /scratch/* -name "*" -user $USER |xargs -I {} rm {} -rf
}

And you should be good to go. IMPORTANT: don't copy the settings blindly -- what works at your uni might be different from what works at my uni. But use the above as an inspiration and validation of your thought process. The most important thing to look out for in terms of performance is probably your -pe switch.

Since I'm having problems with the low ulimit, I wrote a small bash script which I've set to run every ten minutes as a cron job. Of course, if you've used up your 32 procs you can't run the script... Also, instead of chaining long pipelines (each pipe creates another fork/proc), I've written it so that it dumps intermediate results to disk. That way you also have a list of procs in case you need to kill something manually:

 The script: ~/clean_ps.sh
date 
ps ux>~/.job.list
ps ux|gawk 'END {print NR}'

cat ~/.job.list|grep "\-sh \-i">~/.job2.list
cat ~/.job2.list|gawk '{print$2}'>~/.job3.list
cat ~/.job3.list|xargs -I {} kill -15 {}

cat ~/.job.list|grep "echo">~/.job4.list
cat ~/.job4.list|gawk '{print$2}'>~/.job5.list
cat ~/.job5.list|xargs -I {} kill -15 {}

cat ~/.job.list|grep "notty">~/.job6.list
cat ~/.job6.list|gawk '{print$2}'>~/.job7.list
cat ~/.job7.list|xargs -I {} kill -15 {}

cat ~/.job.list|grep "perl">~/.job8.list
cat ~/.job8.list|gawk '{print$2}'>~/.job9.list
cat ~/.job9.list|xargs -I {} kill -15 {}

qstat -u ${USER} 
ps ux |gawk 'END {print NR}' 
echo "***" 

and the cron job is set up using
crontab -e
 */10 * * * * sh ~/clean_ps.sh>> ~/.cronout

Obviously this kills any job monitoring from the point of view of ecce. However, it keeps you from being locked out. You can manually check the job status using qstat -u ${USER}, then reconnect when a job is ready. Not that convenient, but liveable.
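The grep/gawk steps in the script can probably be collapsed into a single awk call per pattern, which saves a few forks. This is an untested-on-the-cluster sketch: pids_matching is a made-up helper, and it assumes the PID is the second column of the saved ps ux listing, as in the script above.

```shell
#!/bin/bash
# Print the PID column (field 2 of `ps ux` output) for every line in a
# saved listing that matches a pattern.
# $1 = awk regular expression, $2 = file containing `ps ux` output
pids_matching() {
    awk -v pat="$1" '$0 ~ pat { print $2 }' "$2"
}

# Intended usage (commented out so nothing gets killed by accident):
#   ps ux > ~/.job.list
#   for pid in $(pids_matching 'notty' ~/.job.list); do kill -15 "$pid"; done
```

Each call still forks once for awk, so it reduces rather than removes the process overhead.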

166. Briefly: nvidia API mismatch on debian when running ecce

UPDATE: There's a much better way to do this: "One thing that I did notice is your issues with OpenGL where you suggested moving the shared libraries to another directory. While that's perfectly workable, this would be another instance where consulting the $ECCE_HOME/siteconfig/site_runtime file would be useful. There you would learn about the $ECCE_MESA_OPENGL and $ECCE_MESA_EXCEPT variables that control whether to use the ECCE-supplied GL libraries or native ones (e.g. hardware OpenGL card drivers) on your machine." I'll update this post again when I've had time to look into it. Lecture slides and grant rejoinders don't write themselves...

Original post:
If you get an error along the lines of this:
 http://www.linuxquestions.org/questions/debian-26/api-mismatch-nvidia-kernel-module-871115/
only when you're running ECCE, i.e. there's an API mismatch error with a difference in kernel module version vs the nvidia driver component (in my case 295.49 and 290.10, respectively), then you may want to have a look in your apps folder before you launch a major investigation, e.g.
ecce-6.3/apps/rhel5-gcc4.1.2-m64/3rdparty/mesa/lib
libGL.so    libGL.so.295.49  libGLU.so.1           libnvidia-glcore.so.295.49
libGL.so.1  libGLU.so        libGLU.so.1.3.071100  libnvidia-tls.so.295.49
You can symlink the correct drivers, or -- which is even easier -- just move your 3rdparty/mesa library to e.g. 3rdparty/bakmesa and see if that solves it.

27 May 2012

165. Approach to computing reorganisational energies using nwchem

I set out to reproduce Malagoli and Brédas in Chemical Physics Letters, 2000, 327, 13-17 (Link). Essentially it's a paper on calculating reorganisational energies in a few simple organic species, such as biphenyl. I've already covered this work to some extent (here: http://verahill.blogspot.com.au/2012/05/dft-gridsize-ecce-defaults-to-medium.html ) but here's a step-by-step walkthrough using nwchem:

Approach (using biphenyl):
1. Draw biphenyl -- the dihedral angle between the two rings should be about 38 degrees
2. Optimize structure using b3lyp/6-31G** in gas phase (the basis set used in the input files below). Gives you E1.
3. Change the charge and multiplicity to +1 and 2, respectively.
4. Calculate single-point energy using the previously optimised structure (from step 2, i.e. don't optimise). Gives E2.
5. Now, optimise structure. Gives E3.
6. Change the charge and multiplicity to 0 and 1. Calculate single point energy of the optimised structure in step 5. Gives E4.
7. ΔE=(E2-E3)+(E4-E1). Convert energies from Hartree to eV by multiplying by 27.2107


Using nwchem:
title "Calculating E1"
start neutral_ground_state
geometry
 C     3.58691     -1.70661e-06     0.000280946
 C     2.87790     -0.739281     0.946515
 C     1.48493     -0.738173     0.945097
 C     0.747197     0.00000     0.000278762
 C     1.48488     0.738183     -0.944553
 C     2.87785     0.739273     -0.945948
 C     -0.747197     0.00000     0.000278762
 C     -1.48488     0.00000     -1.19873
 C     -2.87785     0.00000     -1.20050
 C     -1.48493     0.00000     1.19927
 C     -3.58691     0.00000     0.000281534
 C     -2.87790     0.00000     1.20107
 H     4.67272     9.42775e-05     0.000158092
 H     3.40921     -1.32260     1.69312
 H     0.975305     -1.32691     1.69865
 H     3.40931     1.32254     -1.69249
 H     -3.40931     0.00000     -2.14788
 H     -0.975305     0.00000     2.15554
 H     -4.67272     0.00000     0.000125630
 H     -3.40921     0.00000     2.14853
 H     -0.975288     0.00000     -2.15502
 H     0.975288     1.32693     -1.69812
end
charge 0
basis "ao basis" cartesian print
  H library "6-31G**"
  C library "6-31G**"
end
dft
    direct
    grid fine
    xc b3lyp
    mult 1
end
task dft optimize
and
title "Calculating E2"
start cation_excited_state
geometry units angstrom
 C     0.00000     -3.56301     0.00000
 C     -1.13927     -2.85928     -0.393841
 C     -1.13879     -1.46545     -0.394153
 C     0.00000     -0.742814     0.00000
 C     1.13879     -1.46545     0.394153
 C     1.13927     -2.85928     0.393841
 C     0.00000     0.742814     0.00000
 C     1.13879     1.46545     -0.394153
 C     1.13927     2.85928     -0.393841
 C     -1.13879     1.46545     0.394153
 C     0.00000     3.56301     0.00000
 C     -1.13927     2.85928     0.393841
 H     0.00000     -4.64896     0.00000
 H     -2.02827     -3.39662     -0.711607
 H     -2.02148     -0.928265     -0.727933
 H     2.02827     -3.39662     0.711607
 H     2.02827     3.39662     -0.711607
 H     -2.02148     0.928265     0.727933
 H     0.00000     4.64896     0.00000
 H     -2.02827     3.39662     0.711607
 H     2.02148     0.928265     -0.727933
 H     2.02148     -0.928265     0.727933
end
charge 1
basis "ao basis" cartesian print
  H library "6-31G**"
  C library "6-31G**"
end
dft
    direct
    grid fine
    xc b3lyp
    mult 2
end
task dft energy
and
title "Calculating E3"
start cation_ground_state
geometry units angstrom
 C     0.00000     -3.56301     0.00000
 C     -1.13927     -2.85928     -0.393841
 C     -1.13879     -1.46545     -0.394153
 C     0.00000     -0.742814     0.00000
 C     1.13879     -1.46545     0.394153
 C     1.13927     -2.85928     0.393841
 C     0.00000     0.742814     0.00000
 C     1.13879     1.46545     -0.394153
 C     1.13927     2.85928     -0.393841
 C     -1.13879     1.46545     0.394153
 C     0.00000     3.56301     0.00000
 C     -1.13927     2.85928     0.393841
 H     0.00000     -4.64896     0.00000
 H     -2.02827     -3.39662     -0.711607
 H     -2.02148     -0.928265     -0.727933
 H     2.02827     -3.39662     0.711607
 H     2.02827     3.39662     -0.711607
 H     -2.02148     0.928265     0.727933
 H     0.00000     4.64896     0.00000
 H     -2.02827     3.39662     0.711607
 H     2.02148     0.928265     -0.727933
 H     2.02148     -0.928265     0.727933
end
charge 1
basis "ao basis" cartesian print
  H library "6-31G**"
  C library "6-31G**"
end
dft
    direct
    grid fine
    xc b3lyp
    mult 2
end
task dft optimize
and
title "Calculating E4"
start neutral_excited_state
geometry
 C     0.00000     -3.54034     0.00000
 C     -1.20296     -2.84049     -0.216000
 C     -1.20944     -1.46171     -0.206253
 C     0.00000     -0.721866     0.00000
 C     1.20944     -1.46171     0.206253
 C     1.20296     -2.84049     0.216000
 C     0.00000     0.721866     0.00000
 C     1.20944     1.46171     -0.206253
 C     1.20296     2.84049     -0.216000
 C     -1.20944     1.46171     0.206253
 C     0.00000     3.54034     0.00000
 C     -1.20296     2.84049     0.216000
 H     0.00000     -4.62590     0.00000
 H     -2.12200     -3.38761     -0.395378
 H     -2.13673     -0.938003     -0.401924
 H     2.12200     -3.38761     0.395378
 H     2.12200     3.38761     -0.395378
 H     -2.13673     0.938003     0.401924
 H     0.00000     4.62590     0.00000
 H     -2.12200     3.38761     0.395378
 H     2.13673     0.938003     -0.401924
 H     2.13673     -0.938003     0.401924
end
charge 0
basis "ao basis" cartesian print
  H library "6-31G**"
  C library "6-31G**"
end
dft
    direct
    grid fine
    xc b3lyp
    mult 1
end
task dft energy

And it all gives:
E1: Total DFT energy =     -463.321927500065
E2: Total DFT energy =     -463.035336642074
E3: Total DFT energy =     -463.042292962541
E4: Total DFT energy =     -463.315725187090
ΔE = (E2 - E3) + (E4 - E1) = (-463.035336642074 - (-463.042292962541)) + (-463.315725187090 - (-463.321927500065)) = 0.013158633442 Hartree = 0.35806 eV ≅ 0.36 eV, i.e. the same as the paper.
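The arithmetic is easy to get wrong by hand, so here's a small sketch that does it with awk. reorg_energy is a made-up helper name; it takes E1 to E4 in Hartree and uses the same conversion factor as above (27.2107 eV per Hartree).

```shell
#!/bin/bash
# Compute the reorganisation energy in eV from the four DFT energies.
# $1 = E1 (neutral, optimised)        $2 = E2 (cation at neutral geometry)
# $3 = E3 (cation, optimised)         $4 = E4 (neutral at cation geometry)
reorg_energy() {
    awk -v e1="$1" -v e2="$2" -v e3="$3" -v e4="$4" -v ev=27.2107 \
        'BEGIN { printf "%.3f\n", ((e2 - e3) + (e4 - e1)) * ev }'
}

# Energies from the four biphenyl runs above.
reorg_energy -463.321927500065 -463.035336642074 -463.042292962541 -463.315725187090
# prints 0.358
```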