
09 April 2014

570. Briefly: restarting a g09 frequency job with SGE, using same queue

I've had g09 frequency jobs die on me, and in g09 an analytical frequency job can only be restarted from its .rwf file. Because the .rwf files are 160 GB each, I don't want to be copying them back and forth between nodes. It's easier, then, to simply make sure that the restarted job runs on the same node as the original job.

A good resource for SGE-related stuff is http://rous.mit.edu/index.php/SGE_Instructions_and_Tips#Submitting_jobs_to_specific_queues

Either way, first figure out what node the job ran on. Assuming that the job number was 445:
qacct -j 445|grep hostname
hostname compute-0-6.local

Next figure out the PID, as this is used to name the Gau-[PID].rwf file:
grep PID g03.g03out
Entering Link 1 = /share/apps/gaussian/g09/l1.exe PID= 24286.
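To grab the PID programmatically, something along these lines works (a sketch; the tail -1 guards against there being more than one Entering Link 1 line in the file):

pid=$(grep 'PID=' g03.g03out | tail -1 | awk '{print $NF}' | tr -d '.')
echo $pid
24286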

You can now craft your restart file, g09_freq.restart -- you'll need to make sure that the paths are appropriate for your system:
%nprocshared=8
%Mem=900000000
%rwf=/scratch/Gau-24286.rwf
%Chk=/home/me/jobs/testing/delta_631gplusstar-freq/delta_631gplusstar-freq.chk
#P restart
(having empty lines at the end of the file is important) and a qsub file, g09_freq.qsub:
#$ -S /bin/sh
#$ -cwd
#$ -l h_rt=999:30:00
#$ -l h_vmem=8G
#$ -j y
#$ -pe orte 8
export GAUSS_SCRDIR=/tmp
export GAUSS_EXEDIR=/share/apps/gaussian/g09/bsd:/share/apps/gaussian/g09/local:/share/apps/gaussian/g09/extras:/share/apps/gaussian/g09
/share/apps/gaussian/g09/g09 g09_freq.restart > g09_freq.out
Then submit it to the correct queue by doing
qsub -q all.q@compute-0-6.local g09_freq.qsub
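The lookup and the submission chain together nicely; a small sketch based on the commands above:

node=$(qacct -j 445 | awk '/hostname/ {print $2}')
qsub -q all.q@${node} g09_freq.qsub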

The output goes to g09_freq.out. You know the restart worked properly if it says
Skip MakeAB in pass 1 during restart.
and
Resume CPHF with iteration 214.

Note that restarting analytical frequency jobs in g09 can be a hit-and-miss affair. Jobs that run out of time are easy to restart, and some jobs that die silently have also been restarted successfully. On the other hand, a job that died because my resource allocations ran out couldn't be restarted, i.e. the restart started the frequency job from scratch. The same happened on a node of mine that seems to have a dodgy PSU. Finally, I also couldn't restart jobs that died silently after allocating all the RAM to g09 without leaving any to the OS (or at least that's the current best theory). It may thus be a good idea to back up the .rwf file every now and again, in spite of its unwieldy size.
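As a sketch of what such a backup might look like -- the .rwf path is the one from above, but the /backup mount point is purely hypothetical -- a crontab entry on the node could read:

# copy the rwf to a second disk every six hours (paths are assumptions)
0 */6 * * * rsync /scratch/Gau-24286.rwf /backup/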

17 October 2013

520. New node: AMD FX 8350/32 Gb RAM/990 FX

Update 5 Nov 2013: Note that the motherboard doesn't support the CPU, and this leads to spontaneous reboots under certain conditions. Make sure to look at the list of supported CPUs for the motherboard you use (in retrospect, obvious -- but as a linux person you get used to ignoring those things since everything's just for OSX or Windows).

See here for the troubleshooting thread:
 http://verahill.blogspot.com.au/2013/10/523-random-reboots-troubleshooting-in.html

Also see this thread: http://www.techpowerup.com/forums/showthread.php?t=184061
I'll need to read up on...stuff...but the bottom line seems to be that one would expect issues with this board/cpu combo:

Still only a 4+1 phase board the FX chips pull a bit more power than that can put out comfortably and stable. [..] Those would be your three best to choose from all are the better 8+2 phase designs...
and
my opinion is to stay away from the asus FX ive seen many people asking why their boards are throttling at full load, vrm protection causes voltages to drop at full load when vrms hit a certain temp.

and it seemed that low (CPU) voltages precipitated crashes.

Original post:
So I built a new node at the beginning of October 2013, using the following parts:
  • AMD FX 8350 CPU
  • 4×8 GB G.Skill RAM
  • ASRock 990FX Extreme3 motherboard
  • 1 TB Seagate Barracuda HDD
  • MSI N210 graphics card
  • ASUS NX1101 Gigabit NIC
  • Corsair GS700 PSU
  • Antec GX700 case
NOTE that I'm having issues with spontaneous reboots during extended periods (days) of heavy load (100% CPU) which do not appear to be associated with faulty RAM, so you might want to think twice before using the exact same combination of parts as listed above. Most likely there's a power issue -- either the PSU isn't supplying enough juice to the mobo, or the mobo isn't supplying enough power to the CPU. Note also that the CPU isn't on the motherboard manufacturer's list of officially supported CPUs.*

See here for my troubleshooting thread: http://verahill.blogspot.com.au/2013/10/523-random-reboots-troubleshooting-in.html

So the main value of this post is the photos, which show how easy it is to build a computer. Just don't...well...build this particular computer -- use a different motherboard.

The other value of the post is purely personal -- I just wanted to write down the steps to take whenever installing a new node in my little cluster.

There will be a post later on troubleshooting (and hopefully fixing) the issue of the spontaneous reboots.

*I've built a number of computers for myself as well as for other people and haven't had any issues (other than bad RAM) before. I got lazy this time and am paying for it.

The first step was assembly:

I like the case -- it's metal and feels robust. Having two fans on top is a definite plus, and it works well with my home-built rack.

Note that the case doesn't come with a printed manual -- to get the manual you need to go online. And even the online manual falls short -- there's no guide to the many, many cables the case comes with. However, it's not rocket science either. It turns out that the case has a molex plug for powering the case fans: plug the molex plug into the PSU, then plug the fans into the four odd plugs/cables that the case comes with. Note that the mobo has no header for the internal USB 3 connector cable that the case comes with.

See here for more details re the case: http://www.hardocp.com/article/2013/09/12/antec_gx700_atx_computer_case_review
Case, closed

The glorious innards of The Case.

The GX700 does not come with a mobo panel -- not that they tend to be useful anyway

Luckily all (most?) mobos come with their own panels -- push it into place before doing anything else. It may need a bit of negotiation to snap in properly.

The case came with riser nuts, four PSU screws and lots of screws for the mobo.

Put the riser nuts in the case -- they are the golden thingies

And here's the motherboard. I don't know if there are universally accepted recommendations, but I prefer to install the CPU and RAM onto the motherboard before installing it in the case -- you have more space to manoeuvre and the risk of breaking the board is smaller.

The heatsink (left) and CPU (right)

The heatsink comes with thermal paste pre-applied. Don't touch it -- you want it to be as smooth and even as possible.

Get the CPU out

Note the yellow triangle in the bottom right corner in the picture

That should match up with the triangle in the bottom left of this picture. Note the raised level on the right side of the CPU socket.

The CPU in place. Note the raised lever. There should be no pushing -- the CPU should fit perfectly without any force whatsoever. If you bend a pin...then good luck.

The lever is in the locked position.

Next, put the heatsink on. Give this a bit of thought, as you don't want to have to reseat it several times (in the worst case you'll have to go buy more thermal paste, clean the heatsink and reapply it). So make sure you line up the fasteners before pushing the heatsink into place.

Everything is locked down.

The motherboard with the processor and heatsink in place. In the picture the CPU fan is attached to the WRONG connector -- look for the connector labelled 'CPU FAN' (in the picture it's attached to POWER FAN).

Open the RAM slot fasteners and push the RAM sticks firmly into place, but without excessive violence. Once the fasteners snap shut by themselves, the sticks are properly seated. Improperly seated RAM sticks tend to prevent you from booting and lead to a lot of noise.

All four RAM sticks in place, and the motherboard attached to the case via seven screws that screw into the riser nuts.

The PSU is in place.

Main power and auxiliary power cables attached.

This particular case has a special tray for the hard drives.

Hard drive in place

SATA data and power cables attached

The other end of the SATA data cable attaches to the motherboard (SATA 1)

After a bit of rewiring. 

PCI NIC and PCI-E graphics cards in place.
And below is a picture of the cluster -- each node is connected to a gigabit WAN router (192.168.2.0/24) and a gigabit LAN switch (192.168.1.0/24). 8/32 means 8 cores, 32 GB RAM. The cluster 'runs' on the LAN. Each of the four nodes in the picture (there are two three-core nodes in addition) is connected to a KVM. Jobs are managed using SGE.

It's questionable whether one can really call it a cluster, though, since I run each job on a single node for performance reasons. It still attracts attention from visitors to my office.



Software:
I then installed debian wheezy on it. During the installation I was notified that I might want to consider enabling non-free in order to get the r8169 and tg3 firmware.

So after enabling non-free in the sources I did:
sudo apt-get install firmware-realtek firmware-linux-nonfree

Didn't seem to change anything though -- everything was working fine before too.

I also installed amd64-microcode which, if I understand things correctly, should obviate the need for some of the full BIOS updates.

Other little housekeeping things:
I first sorted out
INIT: Id "co" respawning too fast: disabled for 5 minutes
as shown here: http://verahill.blogspot.com.au/2012/01/debian-testing-64-wheezy-small-fixes.html

I then installed a few basic things:
sudo apt-get install vim screen sinfo gawk lm-sensors

and made a ~/.vimrc:
set number
set pastetoggle=<f3>
nnoremap <f4> :set nonumber!<CR>

And set vim as the default editor in lieu of nano:
sudo update-alternatives --config editor

I edited /etc/default/sinfo to make it use the correct network:
OPTS="${OPTS} --quiet --bcastaddress=192.168.1.255"
I set up 'static' dhcp on the WAN router.

On the node, I then sorted out /etc/network/interfaces  to use dhcp on eth1 and 192.168.1.180 on eth0, and to route everything properly (i.e. local traffic over eth0, and everything else over eth1):

auto lo
iface lo inet loopback

auto eth1
iface eth1 inet dhcp

auto eth0
iface eth0 inet static
address 192.168.1.180
gateway 192.168.1.1
netmask 255.255.255.0

post-up ip route flush all
post-up route add default eth1
post-up route add -net 192.168.1.0 netmask 255.255.255.0 gw 192.168.1.1 eth0
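To check that the routing came out right, look at the kernel routing table after bringing the interfaces up -- the default route should sit on eth1 and the 192.168.1.0/24 route on eth0:

route -n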

SGE won't work properly unless you edit /etc/hosts -- if the hostname resolves to the 127.0.1.1 loopback entry, the exec daemon registers with that address and the qmaster can't reach the node:
127.0.0.1       localhost
#127.0.1.1      oxygen
192.168.1.180   oxygen

The way my cluster works is that every node has its own shared folder.
mkdir ~/oxygen
mkdir ~/scratch
chmod 777 ~/oxygen

Export it as shown here: http://verahill.blogspot.com.au/2012/02/debian-testing-wheezy-64-sharing-folder.html
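For reference, the relevant /etc/exports line on the node ends up looking something like the below (a sketch using the paths from this post; the linked post has the details), followed by re-exporting:

/home/me/oxygen 192.168.1.0/24(rw,sync,no_subtree_check)
sudo exportfs -ra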
Set up ssh key login in both directions:
ssh-keygen
vim ~/.ssh/authorized_keys
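ssh-copy-id saves you the copying and pasting of keys; a sketch, assuming the front node is beryllium as elsewhere on this blog:

ssh-copy-id me@beryllium

and then the same from beryllium back to the node.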

Then add the new node to the cluster: http://verahill.blogspot.com.au/2013/08/501-briefly-adding-new-node-to-sge.html
Build nwchem as shown here: http://verahill.blogspot.com.au/2013/05/424-nwchem-63-on-debian-wheezy.html
Set up gaussian as shown here: http://verahill.blogspot.com.au/2012/05/settiing-up-gaussian-g09-on-debian.html
Fix shmem: http://verahill.blogspot.com.au/2012/10/shmmax-revisited-and-shmall-shmmni.html

Finally, to address this issue regarding corrupt packets during SSH sessions, I added the following line to /etc/rc.local:

/sbin/ethtool -K eth1 rx off tx off

24 August 2013

501. Briefly: Adding a new node to SGE

I've done this a couple of times by now, and I always forget one step or another. Most of the information is on http://verahill.blogspot.com.au/2012/06/setting-up-sun-grid-engine-with-three.html but here it is in a briefer form:

In the example I've used krypton as the node name, and 192.168.1.180 as the IP.
My front node is called beryllium and has an IP of 192.168.1.1.

0. On the front node
Add the new node name to the front node/queue master

Add execution host
qconf -ae 

which opens a text file in vim

I edited the hostname (krypton) but nothing else. Saving returns
added host krypton to exec host list
Add krypton as a submit host
qconf -as krypton
krypton added to submit host list
Doing this before touching the node makes life a little bit easier.

1. Edit /etc/hosts on the node
Leave
127.0.0.1 localhost
but remove
127.0.1.1 krypton
and make sure that it says
192.168.1.180 krypton
instead.

Throw in
192.168.1.1 beryllium
as well.

2. Install SGE on node
sudo apt-get install gridengine-exec gridengine-client

You'll be asked about
Configure automatically: yes
Cell name: rupert
Master hostname: beryllium
3. Add node to queue and group
I maintain separate queues and groups depending on how many cores each node has. See e.g. http://verahill.blogspot.com.au/2012/06/setting-up-sun-grid-engine-with-three.html for how to create queues and groups.

If they already exist, just do

qconf -aattr hostgroup hostlist krypton @fourcores
qconf -aattr queue slots "[krypton=4]" fourcores.q

to add the new node.
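To double-check that both changes took, the matching show commands are:

qconf -shgrp @fourcores
qconf -sq fourcores.q | grep slots

The first should now list krypton in the hostlist, and the second should include [krypton=4].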

4. Add pe to queue if necessary
Since I have different queues depending on the number of cores of a node, I tend to have to fiddle with this.

See e.g. http://verahill.blogspot.com.au/2012/06/setting-up-sun-grid-engine-with-three.html for how to create pe:s.

If the pe you need is already created, you can do
qconf -mq fourcores.q

and edit pe_list
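so that it ends up looking something like this (a sketch, assuming the orte pe used elsewhere on this blog):

pe_list               make orte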

5. Check
On the front node, do
qhost
HOSTNAME                ARCH         NCPU  LOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
-------------------------------------------------------------------------------
global                  -               -     -       -       -       -       -
beryllium               lx26-amd64      3  0.16    7.8G    5.3G   14.9G  398.2M
boron                   lx26-amd64      6  6.02    7.6G    1.6G   14.9G     0.0
helium                  lx26-amd64      2     -    2.0G       -    1.9G       -
lithium                 lx26-amd64      3     -    3.9G       -     0.0       -
neon                    lx26-amd64      8  8.01   31.4G    1.3G   59.6G     0.0
krypton                 lx26-amd64      4  4.01   15.6G    2.8G   14.9G     0.0

18 March 2013

362. Sun Grid Engine: Limit number of running jobs for a user

I don't like copying a post verbatim, but I also want to make sure that I keep track of what I do on my shared cluster, and that I don't forget how to set user-based job limits.

Andi Broka made an excellent post at https://lists.sdsc.edu/pipermail/npaci-rocks-discussion/2010-November/049905.html in which two methods were shown (see below).

Resource quota management in general is covered here these days: http://docs.oracle.com/cd/E19957-01/820-0698/gehwk/index.html


1. Number of running jobs: limit for ALL users
Modify the scheduler configuration:
qconf -msconf
schedule_interval                 0:0:15
maxujobs                          0
queue_sort_method                 load
Change maxujobs from 0 (unlimited) to a suitable value.
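For example, to cap every user at 20 concurrently running jobs the line would read:

maxujobs                          20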

The m in -msconf stands for modify, conf stands for configuration, and the s stands for scheduler.

2. Number of running jobs: limit for a particular user
Modify the resource quota set(s):
qconf -mrqs
{
   name         verahill
   description  "specific limitations for user verahill"
   enabled      TRUE
   limit        users verahill to slots=40
}

Again, the -m is for modify, and rqs is for resource quota set(s).

Note that slots are the number of processes and not nodes -- 40 slots allow me to use five 8-core nodes.
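To inspect the quota set afterwards, the matching show command is:

qconf -srqs verahill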

18 September 2012

239. Sun GridEngine: resetting queue status on node

I occasionally run into problems with disk space during a run on my cluster, which causes the job to fail and the queue instance to be put into an error state (E):

qstat -f
queuename                      qtype resv/used/tot. load_avg arch          states
---------------------------------------------------------------------------------
eight.q@neon                   BIP   0/0/8          0.45     lx26-amd64    
---------------------------------------------------------------------------------
five.q@boron                   BIP   0/0/5          6.01     lx26-amd64    
---------------------------------------------------------------------------------
six.q@boron                    BIP   0/6/6          6.01     lx26-amd64    
    788 0.75000 submit__la user         r     09/07/2012 18:36:56     6        
---------------------------------------------------------------------------------
two.q@beryllium                BIP   0/0/2          0.24     lx26-amd64    
---------------------------------------------------------------------------------
four.q@tantalum                BIP   0/0/4          0.05     lx26-amd64    E
---------------------------------------------------------------------------------
three.q@beryllium              BIP   0/0/3          0.24     lx26-amd64    
---------------------------------------------------------------------------------
main.q@beryllium               BIP   0/0/1          0.24     lx26-amd64    
---------------------------------------------------------------------------------
main.q@boron                   BIP   0/0/1          6.01     lx26-amd64    
---------------------------------------------------------------------------------
main.q@tantalum                BIP   0/0/1          0.05     lx26-amd64    

############################################################################
 - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS
############################################################################
    789 0.67310 zoli.qsub  user         qw    09/09/2012 10:00:35     6        
    802 0.60527 submit__bi user         qw    09/10/2012 20:45:24     6        
    803 0.60525 submit__bi user         qw    09/10/2012 20:46:00     6        
    927 0.25071 submit__ac user         qw    09/18/2012 08:24:00     4        
    928 0.25000 submit__ac user         qw    09/18/2012 08:45:52     4  

Before you do anything else, free up space and consider moving your scratch dir to a different/separate disk.

Since I keep forgetting how to reset it, here it is -- as an SGE admin do:
 /usr/bin/qmod -c four.q@tantalum
me@beryllium changed state of "four.q@tantalum" (no error)
And now everything is good:

qstat -f
queuename                      qtype resv/used/tot. load_avg arch          states
---------------------------------------------------------------------------------
eight.q@neon                   BIP   0/0/8          0.25     lx26-amd64    
---------------------------------------------------------------------------------
five.q@boron                   BIP   0/0/5          5.91     lx26-amd64    
---------------------------------------------------------------------------------
six.q@boron                    BIP   0/6/6          5.91     lx26-amd64    
    788 0.75000 submit__la user         r     09/07/2012 18:36:56     6        
---------------------------------------------------------------------------------
two.q@beryllium                BIP   0/0/2          0.44     lx26-amd64    
---------------------------------------------------------------------------------
four.q@tantalum                BIP   0/4/4          0.17     lx26-amd64    
    927 0.25071 submit__ac user         r     09/18/2012 11:01:26     4        
---------------------------------------------------------------------------------
three.q@beryllium              BIP   0/0/3          0.44     lx26-amd64    
---------------------------------------------------------------------------------
main.q@beryllium               BIP   0/0/1          0.44     lx26-amd64    
---------------------------------------------------------------------------------
main.q@boron                   BIP   0/0/1          5.91     lx26-amd64    
---------------------------------------------------------------------------------
main.q@tantalum                BIP   0/0/1          0.17     lx26-amd64    

############################################################################
 - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS
############################################################################
    789 0.67310 zoli.qsub  user         qw    09/09/2012 10:00:35     6        
    802 0.60527 submit__bi user         qw    09/10/2012 20:45:24     6        
    803 0.60525 submit__bi user         qw    09/10/2012 20:46:00     6        
    928 0.25000 submit__ac user         qw    09/18/2012 08:45:52     4   

31 August 2012

225. Sun GridEngine: commlib error: got select error (Connection refused)


The issue:
On doing qhost I get
error: commlib error: got select error (Connection refused)
error: unable to send message to qmaster using port 6444 on host "beryllium": got send error
Beryllium is my front node.

Same thing happens with qstat and any other imaginable SGE command.

The solution:
It's an obvious one -- just restart the services. I mean, it took me twenty minutes to re-remember that, but it should have been obvious. Most of the time, services are managed using scripts in /etc/init.d/ and that's the case here too. So, hanging my head in shame, here's the solution:

ls /etc/init.d/grid*
/etc/init.d/gridengine-exec  /etc/init.d/gridengine-master

sudo service gridengine-master restart

qhost should now slowly be populated

Done.

How I got there:


 ps aux|grep sge
sgeadmin  3173  0.0  0.0  56844  3428 ?        Sl   Aug20   6:29 /usr/lib/gridengine/sge_execd

 tree /var/spool/gridengine -L 4 -d
/var/spool/gridengine
|-- execd
|   `-- beryllium
|       |-- active_jobs
|       |-- jobs
|       `-- job_scripts
|-- qmaster
|   `-- job_scripts
`-- spooldb

Looking at /var/spool/gridengine/execd/beryllium/messages

08/20/2012 10:47:17|  main|beryllium|I|starting up GE 6.2u5 (lx26-amd64)
08/30/2012 15:06:57|  main|beryllium|E|commlib error: got read error (closing "beryllium/qmaster/1")
08/30/2012 15:06:58|  main|beryllium|W|can't register at qmaster "beryllium": abort qmaster registration due to communication errors
less /var/lib/gridengine/rupert/common/act_qmaster
beryllium

So looks ok.

Oddly, there's nothing funny in /tmp -- no execd_messages.* files.

 ps aux|grep sge
sgeadmin  3173  0.0  0.0  56844  3428 ?        Sl   Aug20   6:29 /usr/lib/gridengine/sge_execd
sudo kill 3173


start-stop-daemon --exec /usr/sbin/sge_execd --start --user sgeadmin
Which didn't seem to do anything.

start-stop-daemon --exec /usr/sbin/sge_qmaster --start --user sgeadmin
which doesn't seem to do anything either.

/usr/lib/gridengine/gethostname -aname
critical error: Please set the environment variable SGE_ROOT.
export SGE_ROOT=/var/lib/gridengine
/usr/lib/gridengine/gethostname -aname
beryllium

 service gridengine-master restart
Restarting Sun Grid Engine Master Scheduler: sge_qmasterrm: cannot remove `/var/run/gridengine/qmaster.pid': Permission denied
.
cat /var/run/gridengine/qmaster.pid
3198
ps aux|grep 3198
yields nothing
sudo rm  /var/run/gridengine/qmaster.pid

sudo service gridengine-master restart
Restarting Sun Grid Engine Master Scheduler: sge_qmaster.
ps aux|grep sge
sgeadmin 32178  2.5  0.0  69004  6112 ?        Sl   09:40   0:00 /usr/lib/gridengine/sge_qmaster
qstat
job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID 
-----------------------------------------------------------------------------------------------------------------
    715 0.75000 submit__la me         r     08/22/2012 08:10:32 six.q@boron                        6        
    720 0.25194 submit__63 me         r     08/22/2012 11:15:02 four.q@tantalum                    4        
    716 0.74817 submit__la me         qw    08/22/2012 08:11:28                                    6        
    719 0.70429 submit__la me         qw    08/22/2012 08:38:17                                    6        
    721 0.25071 submit__63 me         qw    08/22/2012 11:15:35                                    4        
    722 0.25000 submit__32 me         qw    08/22/2012 11:16:01                                    4    

 qhost
HOSTNAME                ARCH         NCPU  LOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
-------------------------------------------------------------------------------
global                  -               -     -       -       -       -       -
beryllium               lx26-amd64      3     -    7.8G       -   14.9G       -
boron                   lx26-amd64      6  6.10    7.6G    1.4G   14.9G  240.8M
tantalum                lx26-amd64      4  4.01    7.7G    1.6G   14.9G     0.0

sudo service gridengine-exec restart
Restarting Sun Grid Engine Execution Daemon: sge_execd.

qhost
HOSTNAME                ARCH         NCPU  LOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
-------------------------------------------------------------------------------
global                  -               -     -       -       -       -       -
beryllium               lx26-amd64      3  0.22    7.8G    3.0G   14.9G  141.3M
boron                   lx26-amd64      6  6.09    7.6G    1.4G   14.9G  240.8M
tantalum                lx26-amd64      4  4.01    7.7G    1.6G   14.9G     0.0




06 June 2012

176. Weaning people onto SGE, one script at a time

On a five-node cluster (1 front + 4 exec; each node has 8 cores and 8 GB RAM) that I know and hang out with, people have been submitting jobs one by one. As in, doing it manually, without a queue manager.

I got one of the users to start using my Very Simple Python Queue Manager to prevent too much idle time, but not everyone is using it yet.

Another downside when people aren't using queue managers is that they use top and kill to manage jobs, and that has a way of screwing things up for everyone. SGE is a much better solution in every possible sense.

To make it easier for the users to switch to using qsub, i.e. to make the change as undisruptive as possible, I wrote a little bash function and set up some standard qsub files.

The user navigates to the directory where their .in file is (e.g. test.in) and runs
presub test
which reads test.in and creates test.qsub

The user then submits by doing
qsub test.qsub


It's easy enough to customize the function and the output files (e.g. using .com, .g03in etc.). This script obviously only does g09, but I'll post a more general script later.




The .bashrc function:
presub () {
    # stitch the header, the input file and the tail together into one qsub file
    paste -s -d "\n" ~/.qsub/qsub.head $1.in ~/.qsub/qsub.tail > $1.qsub
    return 0
}



The files:
I put the following files in ~/.qsub/

qsub.head:

#$ -S /bin/sh
#$ -cwd
#$ -l h_rt=99:30:00
#$ -l h_vmem=8G
#$ -j y
#$ -pe orte 8



export GAUSS_SCRDIR=/tmp
export GAUSS_EXEDIR=/share/apps/gaussian/g09/bsd:/share/apps/gaussian/g09/local:/share/apps/gaussian/g09/extras:/share/apps/gaussian/g09
/share/apps/gaussian/g09/g09 << END >> g09.log

qsub.tail:





END

The empty lines above are on purpose since gaussian can be annoying in that sense.


05 June 2012

174. Setting up Sun Grid Engine with three nodes on Debian

Firstly, I must acknowledge this guide: http://helms-deep.cable.nu/~rwh/blog/?p=159

I FOLLOW THAT POST ALMOST VERBATIM

This post will be more of a "I followed this guide and it actually works on debian testing/wheezy too and here's how" post, since it doesn't add anything significant to the post above, other than detail.

Since I ran into problems over and over again, I'm posting as much as I can here. Hopefully you can ignore most of the post for this reason.

Some reading before you start:
Having toyed with this for a while I've noticed one important factor in getting this to work:
the hostnames you use when you configure SGE MUST match those returned by hostname. It doesn't matter what you've defined in your /etc/host file. This can obviously cause a little bit of trouble when you've got multiple subnets set up (my computers communicate via a 10/100 net for WAN and a 10/100/1000 net for computations). My front node is called beryllium (i.e. this is what is returned when hostname is executed) but it's known as corella on the gigabit LAN. Same goes for one of my sub nodes: it's called borax on the giganet and boron on the slow LAN. hostname here returns boron. I should obviously go back and redo this for the gigabit subnet later -- I'm just posting  what worked.

While setting it up on the front node takes a little while, the good news is that very little work needs to be done on each node. This would become important when you are working with a large number of nodes -- with the power of xargs and a name list, setting them up on the front node should be a breeze.
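Something along these lines, say, where nodes.txt (one hostname per line) is hypothetical:

# add every node in nodes.txt as a submit host and to the host group
xargs -a nodes.txt -I{} qconf -as {}
xargs -a nodes.txt -I{} qconf -aattr hostgroup hostlist {} @allhosts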

My front node is beryllium, and one of my subnodes is boron. I've got key-based, password-less ssh login set up.

Set up your front node before you touch your subnodes. Add all the node names to your front node before even installing gridengine-exec on the subnodes.

I've spent a day struggling with this. The order of events listed here is the first thing that worked. You make modifications at your own peril (and frustration). I tried openjdk with little luck, hence the sun java.

NFS
Finally, I've got nfs set up to share a folder from the front node (~/jobs) to all my subnodes. See here for instructions on how to set it up: http://verahill.blogspot.com.au/2012/02/debian-testing-wheezy-64-sharing-folder.html

When you use ecce, you can and SHOULD use local scratch folders i.e. use your nfs shared folder as the runtime folder, but set scratch to e.g. /tmp which isn't an nfs exported folder.


Before you start, stop and purge
If you've tried installing and configuring gridengine in the past, there may be processes and files that will interfere. On each computer do
ps aux|grep sge
use sudo kill to kill any sge processes
Then
sudo apt-get purge gridengine-*
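A shortcut for the kill step -- assuming the daemons run as sgeadmin, as they do on a default debian install:

sudo pkill -u sgeadmin sge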


First install sun/oracle java on all nodes.

[UPDATE 24 Aug 2013: openjdk-6-jre or openjdk-7-jre work fine, so you can skip this]

There's no sun/oracle java in the debian testing repos anymore, so we'll follow this: http://verahill.blogspot.com.au/2012/04/installing-sunoracle-java-in-debian.html

sudo apt-get install java-package
Download the jre-6u31-linux-x64.bin from here: http://java.com/en/download/manual.jsp?locale=en
make-jpkg jre-6u31-linux-x64.bin
sudo dpkg -i oracle-j2re1.6_1.6.0+update31_amd64.deb 

Then select your shiny oracle java by doing:
sudo update-alternatives --config java
sudo update-alternatives --config javaws

Do that on every node, front and subnodes. You don't have to do all the steps, though: you just built oracle-j2re1.6_1.6.0+update31_amd64.deb, so copy that to your nodes, do sudo dpkg -i oracle-j2re1.6_1.6.0+update31_amd64.deb, and then do the sudo update-alternatives dance.



Front node:
sudo apt-get install gridengine-client gridengine-qmon gridengine-exec gridengine-master
(at the moment this installs v 6.2u5-7)

I used the following:
Configure automatically: yes
Cell name: rupert
Master hostname: beryllium
 => SGE_ROOT: /var/lib/gridengine
 => SGE_CELL: rupert
 => Spool directory: /var/spool/gridengine/spooldb
 => Initial manager user: sgeadmin

Once it was installed, I added myself to the manager list:
sudo -u sgeadmin qconf -am ${USER}
sgeadmin@beryllium added "verahill" to manager list
and to the user list:
qconf -au ${USER} users
added "verahill" to access list "users"
We add beryllium as a submit host
qconf -as beryllium
beryllium added to submit host list
Create the group allhosts
qconf -ahgrp @allhosts
group_name @allhosts
hostlist NONE
I made no changes

Add beryllium to the hostlist
qconf -aattr hostgroup hostlist beryllium @allhosts
verahill@beryllium modified "@allhosts" in host group list
qconf -aq main.q
This opens another text file. I made no changes.
verahill@beryllium added "main.q" to cluster queue list
Add the host group to the queue:
qconf -aattr queue hostlist @allhosts main.q
verahill@beryllium modified "main.q" in cluster queue list
1 core on beryllium is added to SGE:


qconf -aattr queue slots "[beryllium=1]" main.q
verahill@beryllium modified "main.q" in cluster queue list
Add execution host
qconf -ae 
which opens a text file in vim

I edited the hostname (boron) but nothing else. Saving returns
added host boron to exec host list
Add boron as a submit host
qconf -as boron
boron added to submit host list
Add 3 cores for boron:
qconf -aattr queue slots "[boron=3]" main.q

Add boron to the queue
qconf -aattr hostgroup hostlist boron @allhosts

Here's my history list in case you can't be bothered reading everything in detail above.
 2015  sudo apt-get install gridengine-client gridengine-qmon gridengine-exec gridengine-master
 2016  sudo -u sgeadmin qconf -am ${USER}
 2017  qconf -help
 2018  qconf user_list
 2019  qconf -au ${USER} users
 2020  qconf -as beryllium
 2021  qconf -ahgrp @allhosts
 2022  qconf -aattr hostgroup hostlist beryllium @allhosts
 2023  qconf -aq main.q
 2024  qconf -aattr queue hostlist @allhosts main.q
 2025  qconf -aattr queue slots "[beryllium=1]" main.q
 2026  qconf -as boron
 2027  qconf -ae
 2028  qconf -aattr hostgroup hostlist beryllium @allhosts
 2029  qconf -aattr queue slots "[boron=3]" main.q
 2030  qconf -aattr hostgroup hostlist boron @allhosts

 Next, set up your subnodes:

My example here is a subnode called boron.

On the subnode:
sudo apt-get install gridengine-exec gridengine-client
Configure automatically: yes
Cell name: rupert
Master hostname: beryllium
This node is called boron.

Check whether sge_execd got started after the install
ps aux|grep sge
sgeadmin 25091  0.0  0.0  31712  1968 ?        Sl   13:54   0:00 /usr/lib/gridengine/sge_execd
If not, and only if not, do

/etc/init.d/gridengine-exec start

cat /tmp/execd_messages.*
If there's no message corresponding to the current iteration of sge (i.e. you may have old error messages from earlier attempts) then you're probably in a good place.

Back to the front node:
 qhost 
HOSTNAME                ARCH         NCPU  LOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
-------------------------------------------------------------------------------
global                  -               -     -       -       -       -       -
beryllium               lx26-amd64      6  0.57    7.8G    3.9G   14.9G  597.7M
boron                   lx26-amd64      3  0.62    3.8G  255.6M   14.9G     0.0
If the exec node isn't recognised (i.e. it's listed but with no cpu info or anything else) then you're in a dark place. You'll probably find a message about "request for user soandso does not match credentials" in your /tmp/execd_messages.* files on the exec node. The only way I got that solved was stopping all sge processes everywhere, purging all gridengine-* packages on all nodes and starting from the beginning -- which is why I posted the history output above.

qstat -f

queuename                      qtype resv/used/tot. load_avg arch          states
---------------------------------------------------------------------------------
main.q@beryllium               BIP   0/0/1          0.64     lx26-amd64  
---------------------------------------------------------------------------------
main.q@boron                   BIP   0/0/3          0.72     lx26-amd64  


Time to see how far we've got:
Create a file called test.qsub on your front node:
#$ -S /bin/csh
#$ -cwd
tree -L 1 -d
hostname
qsub test.qsub 
Your job 2 ("test.qsub") has been submitted
qstat -u ${USER}
job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID
      2 0.00000 test.qsub  verahill         qw    06/05/2012 14:03:10                                    1        
ls
test.qsub  test.qsub.e2  test.qsub.o2
cat test.qsub.[oe]*
.
0 directories
beryllium
Tree could have had more exciting output I s'pose, but I didn't have any subfolders in my run directory.

So far, so good. We still need to set up parallel environments (e.g. orte, mpi).


Before that, we'll add another node, which is called tantalum and has a quadcore cpu.
On the front node:

qconf -as tantalum
qconf -ae
replace template with tantalum 
qconf -aattr queue slots "[tantalum=4]" main.q
qconf -aattr hostgroup hostlist tantalum @allhosts

 qhost
HOSTNAME                ARCH         NCPU  LOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
-------------------------------------------------------------------------------
global                  -               -     -       -       -       -       -
beryllium               lx26-amd64      6  0.67    7.8G    3.7G   14.9G  597.7M
boron                   lx26-amd64      3  0.14    3.8G  248.0M   14.9G     0.0
tantalum                -               -     -       -       -       -       -
On tantalum:
Install java by copying the oracle-j2re1.6_1.6.0+update31_amd64.deb which got created when you set it up the first time.
sudo dpkg -i  oracle-j2re1.6_1.6.0+update31_amd64.deb
sudo update-alternatives --config java
sudo update-alternatives --config javaws

Install gridengine:
sudo apt-get install gridengine-exec gridengine-client

On the front node:

 qhost
HOSTNAME                ARCH         NCPU  LOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
-------------------------------------------------------------------------------
global                  -               -     -       -       -       -       -
beryllium               lx26-amd64      6  0.62    7.8G    3.7G   14.9G  601.0M
boron                   lx26-amd64      3  0.15    3.8G  248.6M   14.9G     0.0
tantalum                lx26-amd64      4  4.02    7.7G  977.0M   14.9G   24.1M

 qstat -f
queuename                      qtype resv/used/tot. load_avg arch          states
---------------------------------------------------------------------------------
main.q@beryllium               BIP   0/0/1          0.71     lx26-amd64  
---------------------------------------------------------------------------------
main.q@boron                   BIP   0/0/3          0.72     lx26-amd64  
---------------------------------------------------------------------------------
main.q@tantalum                BIP   0/0/4          4.01     lx26-amd64    

It's a beautiful thing when everything suddenly works. 


Parallel environments:
In order to use all the cores on each node we need to set up parallel environments.

qconf -ap orte
pe_name            orte
slots              9999
user_lists         NONE
xuser_lists        NONE
start_proc_args    /bin/true
stop_proc_args     /bin/true
allocation_rule    $fill_up
control_slaves     FALSE
job_is_first_task  FALSE
urgency_slots      min
accounting_summary FALSE

To use a parallel environment include #$ -pe orte 3 for 3 slots in your test.qsub:

#$ -S /bin/csh
#$ -cwd
#$ -pe orte 3
tree -L 1 -d
hostname

Submit it:
qsub test.qsub 

Your job 14 ("test.qsub") has been submitted
qstat 
job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID
     14 0.00000 test.qsub  verahill         qw    06/05/2012 15:43:25                                    3        


verahill@beryllium:~/mine/qsubtest$ cat test.qsub.*
.
0 directories
boron
It got executed on boron.



We are basically done with a basic setup now. To read more, use google. Some additional info that might be helpful is here: http://wiki.gridengine.info/wiki/index.php/StephansBlog

We're going to set up a few more parallel environments:


qconf -ap mpi1

pe_name            mpi1
slots              9
user_lists         NONE
xuser_lists        NONE
start_proc_args    /bin/true
stop_proc_args     /bin/true
allocation_rule    $fill_up
control_slaves     FALSE
job_is_first_task  TRUE
urgency_slots      min
accounting_summary FALSE
qconf -ap mpi2



pe_name            mpi2
slots              9
user_lists         NONE
xuser_lists        NONE
start_proc_args    /bin/true
stop_proc_args     /bin/true
allocation_rule    2
control_slaves     FALSE
job_is_first_task  TRUE
urgency_slots      min
accounting_summary FALSE
qconf -ap mpi3


pe_name            mpi3
slots              9
user_lists         NONE
xuser_lists        NONE
start_proc_args    /bin/true
stop_proc_args     /bin/true
allocation_rule    3
control_slaves     FALSE
job_is_first_task  TRUE
urgency_slots      min
accounting_summary FALSE
qconf -ap mpi4


pe_name            mpi4
slots              9
user_lists         NONE
xuser_lists        NONE
start_proc_args    /bin/true
stop_proc_args     /bin/true
allocation_rule    4
control_slaves     FALSE
job_is_first_task  TRUE
urgency_slots      min
accounting_summary FALSE
And we'll call these using the #$ -pe mpi$totalprocs $totalprocs directive below
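For instance, a four-core job confined to a single node would carry:

#$ -pe mpi4 4

(allocation_rule 4 means SGE places exactly four slots per host, so the job can't be spread out across nodes.)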

We need to add them (update: you need to add them to a queue. Which one is irrelevant, as long as the environment and queue parameters are consistent) to our main.q file though:
qconf -mq main.q
pe_list               make orte mpi1 mpi2 mpi3 mpi4

This obviously isn't the end of my travails -- now I need to get nwchem and gaussian happy.
I've got this in my CONFIG.Dynamic (inside joke) file

NWChem: /opt/nwchem/nwchem-6.1/bin/LINUX64/nwchem
Gaussian-03: /opt/gaussian/g09/g09
perlPath: /usr/bin/perl
qmgrPath: /usr/bin/

SGE {
#$ -S /bin/csh
#$ -cwd
#$ -l h_rt=$wallTime
#$ -l h_vmem=4G
#$ -j y
#$ -pe mpi$totalprocs $totalprocs
}

NWChemCommand {
setenv LD_LIBRARY_PATH "/usr/lib/openmpi/lib:/opt/openblas/lib"
setenv PATH "/bin:/usr/bin:/sbin:/usr/sbin"
mpirun -n $totalprocs /opt/nwchem/nwchem-6.1/bin/LINUX64/nwchem $infile > $outfile
}

Gaussian-03Command{
setenv GAUSS_SCRDIR /scratch
setenv GAUSS_EXEDIR /opt/gaussian/g09/bsd:/opt/gaussian/g09/local:/opt/gaussian/g09/extras:/opt/gaussian/g09
/opt/gaussian/g09/g09 $infile $outfile >g09.log
}

And now everything works!

See below for a few of the annoying errors I encountered during my adventures:


Error -- missing gridengine-client
The gaussian set-up worked fine. The nwchem setup worked on one node but not at all on another -- my problem sounded identical to that described here (two nodes, same binaries, still one works and one doesn't):
http://www.open-mpi.org/community/lists/users/2010/07/13503.php
And it's the same as this one too http://www.digipedia.pl/usenet/thread/11269/867/

[boron:18333] [[INVALID],INVALID] ORTE_ERROR_LOG: Not found in file ../../../../../../orte/mca/ess/hnp/ess_hnp_module.c at line 161
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  orte_plm_base_select failed
  --> Returned value Not found (-13) instead of ORTE_SUCCESS
--------------------------------------------------------------------------
[boron:18333] [[INVALID],INVALID] ORTE_ERROR_LOG: Not found in file ../../../orte/runtime/orte_init.c at line 132
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  orte_ess_set_name failed
  --> Returned value Not found (-13) instead of ORTE_SUCCESS
--------------------------------------------------------------------------
[boron:18333] [[INVALID],INVALID] ORTE_ERROR_LOG: Not found in file ../../../../../orte/tools/orterun/orterun.c at line 543

It took a while to troubleshoot this one. As always, when you're troubleshooting you discover the odd thing or two. On my front node:
/usr/bin/rsh -> /etc/alternatives/rsh
which is normal, but
/etc/alternatives/rsh -> /usr/bin/krb5-rsh
There are some krb packages on tantalum, but nothing on boron
boron:
locate rsh|grep "usr/bin"
/usr/bin/rsh

tantalum:
locate rsh|grep "usr/bin"
/usr/bin/glib-genmarshal
/usr/bin/qrsh
/usr/bin/rsh

sudo apt-get autoremove krb5-clients

Of course, that did not get it working...
The annoying thing is that nwchem/mpirun on boron work perfectly together, also when submitting jobs directly via ECCE. It's just with qsub that I'm having trouble. The search continues:
On the troublesome node:
aptitude search mpi|grep ^i
i   libblacs-mpi-dev                - Basic Linear Algebra Comm. Subprograms - D
i A libblacs-mpi1                   - Basic Linear Algebra Comm. Subprograms - S
i A libexempi3                      - library to parse XMP metadata (Library)
i   libopenmpi-dev                  - high performance message passing library -
i A libopenmpi1.3                   - high performance message passing library -
i   libscalapack-mpi-dev            - Scalable Linear Algebra Package - Dev. fil
i A libscalapack-mpi1               - Scalable Linear Algebra Package - Shared l
i A mpi-default-bin                 - Standard MPI runtime programs (metapackage
i A mpi-default-dev                 - Standard MPI development files (metapackag
i   openmpi-bin                     - high performance message passing library -
i A openmpi-checkpoint              - high performance message passing library -
i A openmpi-common                  - high performance message passing library -

Library conflict?
sudo apt-get autoremove mpi-default-*

And then recompile nwchem.  Still no change.

Finally I found the real problem:
gridengine-client was missing on the troublesome node. Once I had installed that, everything worked!



Errors:
If your parallel job won't start (sits with qw forever), and qstat -j jobid gives you
scheduling info: cannot run in PE "orte" because it only offers 0 slots
make sure that qstat -f lists all your nodes.

This is good:

 qstat -f
queuename                      qtype resv/used/tot. load_avg arch          states
---------------------------------------------------------------------------------
main.q@beryllium               BIP   0/0/1          0.71     lx26-amd64
---------------------------------------------------------------------------------
main.q@boron                   BIP   0/0/3          0.72     lx26-amd64
---------------------------------------------------------------------------------
main.q@tantalum                BIP   0/0/4          4.01     lx26-amd64    
This is bad:
qstat -f
queuename                      qtype resv/used/tot. load_avg arch          states
---------------------------------------------------------------------------------
main.q@beryllium               BIP   0/0/1          0.64     lx26-amd64    



To fix it, do 
qconf -aattr hostgroup hostlist tantalum @allhosts

on the front node for all your node names (change tantalum to the correct name)

An unhelpful error message:
qstat -u verahill
job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID
 3 0.50000 test.qsub  verahill         Eqw   06/05/2012 11:45:18                                    1        
cat test.qsub.[eo]*

/builder/src-buildserver/Platform-7.0/src/linux/lwmsg/src/connection-wire.c:325: Should not be here

This came from a faulty qsub directive: I used
#$ -S csh
instead of
#$ -S /bin/csh
i.e. you should use the latter.

I think it's a common enough mistake to be worth posting here. See http://helms-deep.cable.nu/~rwh/blog/?p=159 for more errors.

Links to this post:
http://gridengine.org/pipermail/users/2012-November/005207.html
http://web.archiveorange.com/archive/v/JfPLjOHE5fXSiyFH0yzc

03 June 2012

172. ECCE and a ROCKS cluster: step by step

This is quite similar to a recent post, but here's a step-by-step, detailed account of how to set up ECCE for remote job submission to a ROCKS 5.4.3 cluster (one front node, 4 subnodes)

Coming soon (give it a week): Setting up a virtualbox machine with ecce for (stubborn) windows and ROCKS/CentOS users.

What isn't shown are all the failed attempts and dead ends I went through getting to the point where I had a working system. I compiled ECCE. I compiled tcsh. I tried compiling bsd csh, which required me to compile bmake etc. This stuff looks simple, and it is simple -- but it's not obvious.

NOTE: From the outside we connect to rocks.university.edu. From inside the cluster the submit node is called rocks.local, and the subnodes are called node0, node1, etc. Refer to this naming if you get confused later.

Step 1. Create the site in ecce
From the terminal, do
ecce -admin
and add a new machine

Don't forget to hit Add/Change queue to make the changes to the queue part take effect. Then hit Add/Change. Oh, and pay attention to the Allocation Account tick box -- if it's ticked you can't submit anything unless you add an account. Important: the machine name you add here is the local name or local IP of the submit node -- it's not the 'public' name or URL. We'll add that somewhere else later. Don't forget to select the queue manager (I forgot in the screenshot).

Close.

2. Editing your CONFIG file
Since you're already in the terminal, go to ecce-v6.3/apps/siteconfig

Take a quick peek at your Machines file (no editing). The rocks line reads:

rocks rocks.local Dell beo Intel 40:5 ssh :NWChem:Gaussian-03 MN:RD:SD:UN:PW:Q:TL

Take another look at rocks.Q -- there's probably nothing to edit here either:

# Queue details for rocks
Queues:    nwchem
nwchem|minProcessors:       1
nwchem|maxProcessors:       40
nwchem|runLimit:       100000
nwchem|memLimit:       0
nwchem|scratchLimit:       0
Finally, do some editing of your CONFIG.rocks file.

CONFIG.rocks

NWChem: /share/apps/nwchem/nwchem-6.1/bin/LINUX64/nwchem
Gaussian-03: /share/apps/gaussian/g09/g09
perlPath: /usr/bin/
qmgrPath: /opt/gridengine/bin/lx26-amd64
sourcefile: /home/rocksuser/.cshrc
frontendMachine: rocks.university.edu

SGE {
#$ -S /bin/csh
#$ -cwd
#$ -l h_rt=$walltime
#$ -l h_vmem=$memoryG
#$ -j y
#$ -pe orte $totalprocs  
}

NWChemEnvironment{
            LD_LIBRARY_PATH /usr/lib/openmpi/1.3.2-gcc/lib/
}

NWChemCommand {
        /opt/openmpi/bin/mpirun -n $totalprocs $nwchem $infile > $outfile
}
Gaussian-03Command {
    setenv GAUSS_SCRDIR /tmp
    setenv GAUSS_EXEDIR /share/apps/gaussian/g09/bsd:/share/apps/gaussian/g09/local:/share/apps/gaussian/g09/extras:/share/apps/gaussian/g09
        time /share/apps/gaussian/g09/g09 $infile  $outfile
}

Obviously, your variables will be different. NOTE that memory is in gigabytes here. You could also do $memoryM for megabytes. Just adjust your launcher requirements accordingly.

Step 3. Making csh modifications on the ROCKS cluster
On the main node just use the root password (or become sudo) and move /etc/csh.cshrc and /etc/csh.login out of the way (backing them up is a good idea). It doesn't seem like you need to make any changes csh-wise to the subnodes.

Step 4. Finalising our set up
Start ecce the normal way (e.g. run ecce from the terminal)
In the Gateway, start the Machine Browser, highlight 'rocks' and click on Setup Remote Access.
Do what you're told.

Step 5. Submit to your heart's content!

NOTE: the option to set the amount of memory is not shown in the launcher window above: my mistake. You can edit your apps/siteconfig/Machines file and add :MM at the end of the line, e.g.
Dynamic beryllium       Unspecified     Unspecified     Unspecified     18:3    ssh     :NWChem:Gaussian-03     MN:RD:SD:UN:PW:Q:TL:MM