31 May 2012

168. Port redirection with ECCE/nwchem

Update 1/6/2012: There may be an alternative, better way of doing this: "Another feature that may or may not be useful to you with this special node that is setup with a higher ulimit for submitting your ECCE jobs is that ECCE has a "hop" feature that lets it go from a main login node on a machine to other nodes before actually running commands (e.g. submitting jobs). If you look at the $ECCE_HOME/siteconfig/CONFIG-Examples/CONFIG.mpp2 file, you'll see this "frontendMachine" directive that is what is used to do this. I'm thinking this might allow you to skip the port redirect options with ssh and just "hop" to your special node from regular login node on the compute host. But, I don't think I'd worry about it if what you have now is working fine."
What I describe below works fine, but it does require that users can set up the port redirect themselves. The approach hinted at above would be a better fit for a larger group of users with varying technical abilities, since it should be enough to copy the CONFIG.<> and <>.Q files.
Here's how to use main-subnode hopping: http://verahill.blogspot.com.au/2012/06/ecce-and-inaccessible-cluster-nodes.html
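For reference, going by CONFIG.mpp2 the directive seems to follow the same key: value syntax as the other entries in a CONFIG.<machine> file, i.e. something along these lines (the host name here is made up, and I haven't tested this myself -- see the link above for a worked-out version):

frontendMachine: login.university.au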

Original post:

If you for some reason can't connect to your computational cluster on port 22, here's how to sort it out.


Setting it up the first time

Adding a node:
I have access to an idle computer off-campus, which sits behind a trusty old linksys router. To access it directly I need to use port forwarding.

Edit  ecce-6.3/apps/siteconfig/remote_shells.site and add

ssh_p9999: ssh -p 9999|scp -P 9999
The minuscule p for ssh and capital P for scp are important. They do the same thing, but the two programmes expect different cases.

In the ECCE gateway (the small floating window with icons that opens when you start ECCE) select Machine Browser. Under Machine, select Register Machine:


The localhost bit is the key here -- since you'll be doing a port redirect, you'll formally connect to a port on localhost. Hit Add/Change.
In the main Machine Browser window, highlight oxygen and click on Setup Remote Access. Your ssh_p9999 thingy should show up there. Don't bother testing anything just yet (e.g. Machine status etc.). Since I'm writing this from memory I don't know whether you need to have the port redirect active at this point or not. If you do, see below under Running.
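If you do want to test it at this point, bring the redirect up in a separate terminal first (same command as under Running below, using my addresses as an example):

ssh -C verahill@110.99.99.99 -L 9999:192.168.1.106:22

and then check that the forwarded port actually answers before letting ECCE loose on it:

ssh -p 9999 verahill@localhost

If that lands you on the hidden node, ECCE should be happy too.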


It really is that simple. See below for how to actually use it.

Adding a remote site:
I work at an Australian university where I appear to be the only person using ECCE. Thus, while the ulimit on the local SGE-managed cluster is a meagre 32 procs, this hasn't been a problem until now. However, ECCE launches five procs per job, so by using up my proc allocation I've been locked out of the cluster on a regular basis.

As a solution, I've been offered my very own shiny submit node with a heftier 37k procs allowed. The downside is that it's only accessible from the standard submit node. Luckily, it's not much more difficult than doing port redirects for a remote node.


Edit  ecce-6.3/apps/siteconfig/remote_shells.site and add

ssh_p5454: ssh -p 5454|scp -P 5454
The minuscule p for ssh and capital P for scp are important.


In the terminal, run
ecce -admin

Most of the fields above should be fairly self-explanatory. A few things to watch out for:

  • ECCE actually looks at the proc-to-node ratio and will impose strict limitations on the number of cores you can use per node. 50/20 means that if you want to use four cores, ECCE forces the use of two nodes. Depending on how you run your jobs (configuration) this may or may not have any real impact. To be safe, pick something like 700 cores and 20 nodes.
  • Path means path. I think ECCE defaults to giving the perl path as /usr/bin/perl, but it should be /usr/bin. Same goes for the qsub path.
  • You need to create a queue. The queue name isn't used anywhere other than in ECCE, so it can be a smart way of setting up defaults. What I'm saying is: it's not that important to get it 'right' since it bears no relation to anything on your cluster.
Click on close.

In 'regular' ecce (i.e. started without -admin) go to the machine browser window, highlight the added site, hit Set up Remote Access, and pick ssh_p5454 as shown below. Don't bother testing anything just yet (e.g. Machine status etc.). Since I'm writing this from memory I don't know whether you need to have the port-redirect active at this point or not. If you do, see below under Running.



As always, setting up a site takes a bit of customisation. Here's my ecce-6.3/apps/siteconfig/CONFIG.gn54 on my ecce workstation.
NWChem: /opt/sw/nwchem-6.1/bin/nwchem
Gaussian-03: /usr/local/bin/G09
perlPath: /usr/bin
qmgrPath: /opt/n1ge62/bin/lx24-amd64
SGE {
#$ -S /bin/csh
#$ -cwd
#$ -l h_rt=$wallTime
#$ -l h_vmem=4G
#$ -j y
}

NWChemEnvironment{
    LD_LIBRARY_PATH /usr/lib/openmpi/1.3.2-gcc/lib/
    PATH /opt/n1ge62/bin/lx24-amd64/
}
NWChemCommand {
#$ -pe mpi_smp$totalprocs  $totalprocs
module load nwchem/6.1
mpirun -n $totalprocs $nwchem $infile > $outfile
}
Gaussian-03Command {
#$ -pe g03_smp4 4
module load gaussian/g09
time G09 < $infile > $outfile
}
and here's my ecce-6.3/apps/siteconfig/gn54.Q  on my ecce workstation.

# Queue details for gn54
Queues:    nwchem squ8
nwchem|minProcessors:       1
nwchem|maxProcessors:       8
nwchem|runLimit:       4320
nwchem|memLimit:       4000
nwchem|scratchLimit:       0
squ8|minProcessors:       1
squ8|maxProcessors:       6
squ8|runLimit:       4320
squ8|memLimit:       4000
squ8|scratchLimit:       0 
Finally, you need to make sure that  nwchem can find everything - put a file called .nwchemrc in your home folder on the remote node with the correct paths in it, e.g.

nwchem_basis_library /opt/sw/nwchem-6.1/data/libraries/
nwchem_nwpw_library /opt/sw/nwchem-6.1/data/libraryps/
ffield amber
amber_1 /opt/sw/nwchem-6.1/data/amber_s/
amber_2 /opt/sw/nwchem-6.1/data/amber_q/
amber_3 /opt/sw/nwchem-6.1/data/amber_x/
amber_4 /opt/sw/nwchem-6.1/data/amber_u/
spce /opt/sw/nwchem-6.1/data/solvents/spce.rst
charmm_s /opt/sw/nwchem-6.1/data/charmm_s/
charmm_x /opt/sw/nwchem-6.1/data/charmm_x/


That's it.

Running

Starting port redirect:
Before you can pipe anything through to your supersecure remote node/remote site, you need to open a terminal window and do
ssh -C username@remoteserver -L 9999:hiddenserver:22
for the remote node above, or
ssh -C username@remoteserver -L 5454:hiddenserver:22
for the remote site above.

Just to make the syntax clear:
Remote node:
My linksys router is at IP address 110.99.99.99, and the remote node behind it is at 192.168.1.106. My username is verahill.
ssh -C verahill@110.99.99.99 -L 9999:192.168.1.106:22

Remote site:
The standard submit node is called msgln4.university.au, and the hidden node is called gn54.university.au. My username is lindqvist.
ssh -C lindqvist@msgln4.university.au -L 5454:gn54.university.au:22
You may need to install and use autossh if you keep on being booted off due to inactivity! The syntax is identical.
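For example, something like this should keep the remote-site tunnel from timing out (the -M 0 and ServerAliveInterval values are my guesses at sensible settings -- check man autossh):

autossh -M 0 -o "ServerAliveInterval 30" -C lindqvist@msgln4.university.au -L 5454:gn54.university.au:22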

In ECCE:
Everything in ECCE now works exactly like before - just select the target computer/site/node and go.





What doesn't work:
So far I haven't been able to sort out the whole 'open remote terminal' business, which means that following the output with tail -f doesn't work either. I'm leaving that for a rainy day with too much time on my hands.

28 May 2012

167. ECCE/Nwchem on An Australian University computational cluster using qsub with g09/nwchem

EDIT:
I've just learned the First Rule of Remote Computing:
always start by checking the number of concurrent processes you're allowed on the head node, or you can lock yourself out faster than you can say 'IT support'.

do
ulimit -u
If it's anywhere under 1000, then you need to be careful.
Default ulimit on ROCKS: 73728
Default ulimit on Debian/Wheezy:  63431
Ulimit on the Oz uni cluster: 32

ECCE launches FIVE processes per job.
Each pipe you add to a command launches another proc. Logging in launches a proc -- if you've reached your quota, you can't log in until a process finishes.

cat test.text|sed 's/\,/\t/g'|gawk '{print $2,$3,$4}' 
yields three processes -- ten percent of my entire quota.
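A pipe-free way of checking how much headroom you have left is to dump ps to disk first (the same trick as in the clean-up script further down):

ulimit -u
ps ux > ~/.proc.list
gawk 'END {print NR-1}' ~/.proc.list

(The NR-1 accounts for the header line that ps prints.)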

NOTE:
Running something on a cluster where you have limited access is very different from a cluster you're managing yourself. Apart from knowing the physical layout, you normally have sudo powers on a local cluster.

One potential issue is excessive disk usage -- both in terms of storage space and in terms of raw I/O (writing to an NFS-mounted disk is not efficient anyway).
So in order to cut down on that:
1. Define a scratch directory in your input file, e.g. (use the correct path for your cluster)
scratch_dir /scratch
The point being that /scratch is a local directory on the execution node.

2. Make sure that you specify
dft
     direct
     ..
end
or even
dft
    noio
    ...
end
to do as little disk caching as possible.
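Putting 1. and 2. together, the top of an input file might look something like this -- a minimal sketch where the water geometry and basis set are just placeholders:

scratch_dir /scratch
start water
geometry
  O  0.000  0.000  0.000
  H  0.757  0.586  0.000
  H -0.757  0.586  0.000
end
basis
  * library "6-31g*"
end
dft
  direct
  xc b3lyp
end
task dft energy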

I accidentally ended up storing 52 GB of aoints files from a single job. It may have been what locked me out of the submit node for three hours...

A good way to check your disk usage is
ls -d * |xargs du -hs

Now, continue reading:



Setting everything up the first time:
First figure out where the mpi libs are:
qsub.tests:

#!/bin/sh
#$ -S /bin/sh
#$ -cwd
#$ -l h_rt=00:14:00
#$ -l h_vmem=4G
#$ -j y
locate libmpi.so
Assuming that the location is /usr/lib/openmpi/1.3.2-gcc/lib/, put 
export LD_LIBRARY_PATH=/usr/lib/openmpi/1.3.2-gcc/lib/
in your ~/.bashrc
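To check that the libraries will actually be found, you can run ldd against the nwchem binary (same path as further down; if the MPI entries come back as 'not found', your LD_LIBRARY_PATH isn't being picked up):

ldd /opt/sw/nwchem-6.1/bin/nwchem | grep mpi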


Next, look at ls /opt/sw/nwchem-6.1/data -- if there's a default.nwchemrc file, then
ln -s /opt/sw/nwchem-6.1/data/default.nwchemrc ~/.nwchemrc

If not, create ~/.nwchemrc with the locations of the different basis sets, amber files and plane-wave sets listed as follows:

nwchem_basis_library /opt/sw/nwchem-6.1/data/libraries/
nwchem_nwpw_library /opt/sw/nwchem-6.1/data/libraryps/
ffield amber
amber_1 /opt/sw/nwchem-6.1/data/amber_s/
amber_2 /opt/sw/nwchem-6.1/data/amber_q/
amber_3 /opt/sw/nwchem-6.1/data/amber_x/
amber_4 /opt/sw/nwchem-6.1/data/amber_u/
spce /opt/sw/nwchem-6.1/data/solvents/spce.rst
charmm_s /opt/sw/nwchem-6.1/data/charmm_s/
charmm_x /opt/sw/nwchem-6.1/data/charmm_x/


Using nwchem:
A simple qsub file would be:

#!/bin/sh
#$ -S /bin/sh
#$ -cwd
#$ -l h_rt=00:14:00
#$ -l h_vmem=4G
#$ -j y
#$ -pe orte 4
module load nwchem/6.1
time mpirun -n 4 nwchem  test.nw > nwchem.out


with test.nw being the actual nwchem input file which is present in your cwd (current working directory).
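Assuming you saved the script above as, say, nwchem.qsub (the file name is arbitrary), submit it and keep an eye on it with:

qsub nwchem.qsub
qstat -u ${USER}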


Using nwchem with ecce:
This is the proper way of using nwchem. If you haven't already, look here: http://verahill.blogspot.com.au/2012/05/setting-up-ecce-with-qsub-on-australian.html

Then edit your  ecce-6.3/apps/siteconfig/CONFIG.msgln4  file:

NWChem: /opt/sw/nwchem-6.1/bin/nwchem
Gaussian-03: /usr/local/bin/G09
perlPath: /usr/bin
qmgrPath: /usr/bin

SGE {
#$ -S /bin/csh
#$ -cwd
#$ -l h_rt=$wallTime
#$ -l h_vmem=4G
#$ -j y
}

NWChemFilesToDelete{ core *.aoints.* }

NWChemEnvironment{
    LD_LIBRARY_PATH /usr/lib/openmpi/1.3.2-gcc/lib/
}

NWChemCommand {
#$ -pe mpi_smp4  4
module load nwchem/6.1

mpirun -n $totalprocs $nwchem $infile > $outfile
}

Gaussian-03Command {
#$ -pe g03_smp4 4
module load gaussian/g09

time G09 < $infile > $outfile
}

Gaussian-03FilesToDelete{ core *.rwf }

Wrapup{
find /scratch/* -name "*" -user $USER |xargs -I {} rm {} -rf
}

And you should be good to go. IMPORTANT: don't copy the settings blindly -- what works at your uni might be different from what works at my uni. But use the above as an inspiration and validation of your thought process. The most important thing to look out for in terms of performance is probably your -pe switch.

Since I'm having problems with the low ulimit, I wrote a small bash script which I've set to run every ten minutes as a cronjob. Of course, if you've used up your 32 procs you can't run the script... Also, instead of piping stuff right and left (each pipe creates another fork/proc) I've written it so that it dumps stuff to disk. That way you have a list of procs in case you need to kill something manually:

 The script: ~/clean_ps.sh

#!/bin/sh
# dump ps to disk instead of piping (each pipe costs a proc),
# then kill off stray ECCE-related processes: interactive
# shells, echo, notty and perl instances.
date
ps ux > ~/.job.list
gawk 'END {print NR}' ~/.job.list

grep "\-sh \-i" ~/.job.list > ~/.job2.list
gawk '{print $2}' ~/.job2.list > ~/.job3.list
xargs -I {} kill -15 {} < ~/.job3.list

grep "echo" ~/.job.list > ~/.job4.list
gawk '{print $2}' ~/.job4.list > ~/.job5.list
xargs -I {} kill -15 {} < ~/.job5.list

grep "notty" ~/.job.list > ~/.job6.list
gawk '{print $2}' ~/.job6.list > ~/.job7.list
xargs -I {} kill -15 {} < ~/.job7.list

grep "perl" ~/.job.list > ~/.job8.list
gawk '{print $2}' ~/.job8.list > ~/.job9.list
xargs -I {} kill -15 {} < ~/.job9.list

# show what's queued and how many procs are left running
qstat -u ${USER}
ps ux > ~/.job.list
gawk 'END {print NR}' ~/.job.list
echo "***"

and the cron job is set up using
crontab -e
 */10 * * * * sh ~/clean_ps.sh >> ~/.cronout

Obviously this kills any job monitoring from the point of view of ecce. However, it keeps you from being locked out. You can manually check the job status using qstat -u ${USER}, then reconnect when a job is ready. Not that convenient, but liveable.

166. Briefly: nvidia API mismatch on debian when running ecce

UPDATE: There's a much better way to do this: "One thing that I did notice is your issues with OpenGL where you suggested moving the shared libraries to another directory. While that's perfectly workable, this would be another instance where consulting the $ECCE_HOME/siteconfig/site_runtime file would be useful. There you would learn about the $ECCE_MESA_OPENGL and $ECCE_MESA_EXCEPT variables that control whether to use the ECCE-supplied GL libraries or native ones (e.g. hardware OpenGL card drivers) on your machine." I'll update this post again when I've had time to look into it. Lecture slides and grant rejoinders don't write themselves...
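Going by the quote, the switch lives in $ECCE_HOME/siteconfig/site_runtime; I haven't tested it, but presumably telling ECCE to use the native drivers would look something like this (the variable name is from the quote above -- the value and its exact effect are my guess, so check the comments in site_runtime itself):

ECCE_MESA_OPENGL false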

Original post:
If you get an error along the lines of this:
 http://www.linuxquestions.org/questions/debian-26/api-mismatch-nvidia-kernel-module-871115/
only when you're running ECCE, i.e. there's an API mismatch error with a difference in kernel module version vs the nvidia driver component (in my case 295.49 and 290.10, respectively), then you may want to have a look in your apps folder before you launch a major investigation, e.g.
ecce-6.3/apps/rhel5-gcc4.1.2-m64/3rdparty/mesa/lib:
libGL.so    libGL.so.295.49  libGLU.so.1           libnvidia-glcore.so.295.49
libGL.so.1  libGLU.so        libGLU.so.1.3.071100  libnvidia-tls.so.295.49
You can symlink the correct drivers, or -- which is even easier -- just move your 3rdparty/mesa library to e.g. 3rdparty/bakmesa and see if that solves it.
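That is, something like:

cd ecce-6.3/apps/rhel5-gcc4.1.2-m64/3rdparty
mv mesa bakmesa

Then restart ECCE and check whether the API mismatch error is gone.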