Lindqvist -- a blog about Linux and Science. Mostly.: 618. Modifying ECCE to work with slurm

UPDATE: ECCE used to stop monitoring the job after 10-20 seconds, although the job itself continued fine. This was due to $q needing to be lowercase (i.e. 'slurm', not 'Slurm') in eccejobmonitor. The code in this post includes that fix.

Sun Grid Engine (SGE) has been removed from Debian jessie (it's in wheezy and sid). This has given me a good excuse to explore setting up SLURM on my Debian cluster. So I did: http://verahill.blogspot.com.au/2015/07/617-slurm-on-debian-jessie-and.html

My setup is very simple: each node has its own working directory, which it exports via NFS to the main node. Also, I never run jobs across several nodes. Because of that, each node has its own queue. That's not how beowulf clusters were supposed to work, but it's the best solution for me (e.g. ROCKS does the opposite -- it exports the user dir from the main node, but that makes reading and writing slow where it counts, i.e. on the work nodes).

I've currently got this slurm.conf:


ControlMachine=beryllium
ControlAddr=192.168.1.1
MpiDefault=none
ProctrackType=proctrack/pgid
ReturnToService=2
SlurmctldPidFile=/var/run/slurm-llnl/slurmctld.pid
SlurmdPidFile=/var/run/slurm-llnl/slurmd.pid
SlurmdSpoolDir=/var/lib/slurm/slurmd
SlurmUser=slurm
StateSaveLocation=/var/lib/slurm/slurmctld
SwitchType=switch/none
TaskPlugin=task/none
FastSchedule=1
SchedulerType=sched/backfill
SelectType=select/linear
AccountingStorageType=accounting_storage/filetxt
AccountingStorageLoc=/var/log/slurm/accounting
ClusterName=rupert
JobAcctGatherType=jobacct_gather/none
SlurmctldLogFile=/var/log/slurm/slurmctld.log
SlurmdLogFile=/var/log/slurm/slurmd.log
NodeName=beryllium NodeAddr=192.168.1.1
NodeName=neon NodeAddr=192.168.1.120 state=unknown
NodeName=tantalum NodeAddr=192.168.1.150 state=unknown
NodeName=magnesium NodeAddr=192.168.1.200 state=unknown
NodeName=carbon NodeAddr=192.168.1.190 state=unknown
NodeName=oxygen NodeAddr=192.168.1.180 state=unknown
PartitionName=All Nodes=neon,beryllium,tantalum,oxygen,magnesium,carbon default=yes maxtime=infinite state=up
PartitionName=mpi4 Nodes=tantalum maxtime=infinite state=up
PartitionName=mpi12 Nodes=carbon maxtime=infinite state=up
PartitionName=mpi8 Nodes=neon maxtime=infinite state=up
PartitionName=mpix8 Nodes=oxygen maxtime=infinite state=up
PartitionName=mpix12 Nodes=magnesium maxtime=infinite state=up
PartitionName=mpi1 Nodes=beryllium maxtime=infinite state=up

and sinfo returns


PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
All*         up   infinite      6   idle beryllium,carbon,magnesium,neon,oxygen,tantalum
mpi4         up   infinite      1   idle tantalum
mpi12        up   infinite      1   idle carbon
mpi8         up   infinite      1   idle neon
mpix8        up   infinite      1   idle oxygen
mpix12       up   infinite      1   idle magnesium
mpi1         up   infinite      1   idle beryllium
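
As a quick sanity check of the one-queue-per-node layout, the PartitionName lines can be parsed into a partition-to-nodes map. This is only a minimal sketch in Python (not part of ECCE or SLURM); the function names are made up for illustration, and it assumes plain comma-separated node lists, so it does not handle SLURM's bracketed range syntax.

```python
# Partition lines copied from the slurm.conf shown above.
SLURM_CONF = """\
PartitionName=All Nodes=neon,beryllium,tantalum,oxygen,magnesium,carbon default=yes maxtime=infinite state=up
PartitionName=mpi4 Nodes=tantalum maxtime=infinite state=up
PartitionName=mpi12 Nodes=carbon maxtime=infinite state=up
PartitionName=mpi8 Nodes=neon maxtime=infinite state=up
PartitionName=mpix8 Nodes=oxygen maxtime=infinite state=up
PartitionName=mpix12 Nodes=magnesium maxtime=infinite state=up
PartitionName=mpi1 Nodes=beryllium maxtime=infinite state=up
"""


def parse_partitions(conf_text):
    """Return {partition name: [node names]} from PartitionName lines."""
    partitions = {}
    for line in conf_text.splitlines():
        line = line.strip()
        if not line.startswith("PartitionName="):
            continue
        # Every token on the line is key=value; lowercase the keys so that
        # 'Nodes' and 'nodes' are treated the same.
        fields = {}
        for token in line.split():
            if "=" in token:
                key, value = token.split("=", 1)
                fields[key.lower()] = value
        partitions[fields["partitionname"]] = fields["nodes"].split(",")
    return partitions


def single_node_partitions(partitions):
    """Return {partition name: node} for partitions with exactly one node."""
    return {name: nodes[0] for name, nodes in partitions.items()
            if len(nodes) == 1}


if __name__ == "__main__":
    for name, node in sorted(single_node_partitions(parse_partitions(SLURM_CONF)).items()):
        print(name, "->", node)
```

Every partition except All should come back with exactly one node, which is what lets each ECCE machine entry point at its own queue.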

The first step was to figure out what files to edit:

grep -rs "qsub"

apps/siteconfig/QueueManagers:SGE|submitCommand:           qsub ##script##

grep -rs "SGE"

apps/scripts/eccejobmonitor:                     &MsgSendUp("SGE job id '$id' in state '$state'");
[..]
apps/siteconfig/Queues:magnesium|queueMgrName:    SGE

Here are the files I edited:

apps/siteconfig/QueueManagers:


QueueManagers:      LoadLeveler \
            Maui \
            EASY \
            PBS \
            LSF \
            Moab \
            SGE \
            Shell\
            Slurm

[..]

Shell|jobIdParseExpression:   \ [0-9]+

###############################################################################
# SLURM
# Simple Linux Utility for Resource Management
#
#
Slurm|submitCommand:           sbatch ##script##
Slurm|cancelCommand:           scancel ##id##
Slurm|queryJobCommand:         squeue
Slurm|queryMachineCommand:     sinfo -p ##queue##
Slurm|queryQueueCommand:       squeue -a
Slurm|queryDiskUsageCommand:   df -k
Slurm|jobIdParseExpression:    .*
Slurm|jobIdParseLeadingText:   job
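
On success sbatch prints a line of the form "Submitted batch job 123", which is why the leading text is set to "job". How ECCE applies the two jobIdParse settings internally is an assumption here; the Python sketch below simply looks for the leading text and then captures whatever the expression matches after it. The function name parse_job_id and the sample lines are illustrative only.

```python
import re


def parse_job_id(submit_output, leading_text="job", expression=".*"):
    """Return what `expression` matches right after `leading_text`, or None.

    Assumed to mimic jobIdParseLeadingText / jobIdParseExpression; this is
    not ECCE's actual implementation.
    """
    pattern = re.escape(leading_text) + "(" + expression + ")"
    match = re.search(pattern, submit_output)
    if match is None:
        return None
    return match.group(1).strip()


if __name__ == "__main__":
    ok = "Submitted batch job 123\n"
    failed = "sbatch: error: Batch job submission failed: Invalid partition name specified"

    print(parse_job_id(ok))                                 # the numeric id
    print(parse_job_id(failed))                             # '.*' happily matches the error text
    print(parse_job_id(failed, expression=r"\s*[0-9]+"))    # a digits-only expression rejects it
```

The catch-all .* was enough here, but a digits-only expression (similar to the Shell entry above) would be a stricter choice, since it cannot mistake an error message for a job id.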

apps/scripts/eccejobmonitor:


        LogMsg "Globus status from eccejobstore: $state";
    }
    elsif ($q eq 'slurm')   # must be lowercase: 'slurm', not 'Slurm'
    {
        $cmd = "squeue 2>&1";
        if (open(QUERY, "$cmd |"))
        {
            $gotState = 0;
            # Read the squeue output one line at a time
            while (<QUERY>)
            {
                LogMsg "JobCheck: Slurm squeue line: $_";
                if (/^\s*$id/)
                {
                    # Default squeue columns are
                    # JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
                    # so the state code (ST) is field 4, counting from 0
                    my $state = (split)[4];

                    &MsgSendUp("Slurm job id '$id' in state '$state'");

                    if (grep {$state eq $_} qw{R t})
                    {
                        $status = $JOB_STATE_RUNNING;
                    }
                    elsif (grep {$state eq $_} qw{PD})
                    {
                        $status = $JOB_STATE_PENDING;
                    }
                    $gotState = 1;
                    last;
                }
            }
            # Job no longer listed by squeue: treat it as finished
            if ($gotState == 0)
            {
                if ($gJobCheckState != $JOB_STATE_NONE)
                {
                    $status = $JOB_STATE_DONE;
                }
            }
            close QUERY;
[..]
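
To make the monitoring logic easier to follow, here is the same idea as a small Python sketch that runs on canned squeue output instead of calling squeue. It is not ECCE code: the function name, the status strings and the seen_before flag (standing in for the $gJobCheckState test) are all made up for illustration. It relies on the default squeue column order, where the state code ST is the fifth whitespace-separated field (index 4).

```python
# Canned squeue output in the default column layout.
SAMPLE_SQUEUE = """\
  JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
     30      mpi8 In_monom       me PD       0:00      1 (Resources)
     29      mpi8 b_monome       me  R      16:12      1 neon
     31    mpix12 tl_dimer       me  R      34:28      1 magnesium
"""

RUNNING_CODES = {"R"}    # squeue state code for a running job
PENDING_CODES = {"PD"}   # squeue state code for a pending job


def job_status(squeue_text, job_id, seen_before=True):
    """Map a job's squeue state code to a coarse status string."""
    for line in squeue_text.splitlines():
        fields = line.split()
        # Compare the whole JOBID field so that job 3 never matches job 30.
        if not fields or fields[0] != str(job_id):
            continue
        state = fields[4]  # ST column
        if state in RUNNING_CODES:
            return "running"
        if state in PENDING_CODES:
            return "pending"
        return "unknown"
    # The job is not listed: if it was seen earlier, assume it has finished.
    return "done" if seen_before else "none"


if __name__ == "__main__":
    for jid in (29, 30, 31, 28):
        print(jid, job_status(SAMPLE_SQUEUE, jid))
```

Note that a job which is missing from the squeue listing is only reported as done if it had been seen before, which mirrors the $JOB_STATE_NONE check in the Perl excerpt.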

Next, set up a new machine (or queue) using ecce -admin. When setting up the queue you won't be able to select Slurm, so select e.g. PBS instead. Then edit the apps/siteconfig/CONFIG.machinename file so that it looks something like this:


NWChem: /opt/nwchem/Nwchem/bin/LINUX64/nwchem
Gaussian-03: /opt/gaussian/g09d/g09/g09
perlPath: /usr/bin/
qmgrPath: /usr/bin/
xappsPath: /usr/bin/

Slurm {
#!/bin/csh
#SBATCH -p mpi8
#SBATCH --time=$walltime
#SBATCH --output=slurm.out
#SBATCH --job-name=$submitFile
}

NWChemEnvironment {
            PYTHONPATH /opt/nwchem/Nwchem/contrib/python
}

NWChemFilesToRemove{ core *.aoints.* *.gridpts.* }

NWChemCommand {
setenv PATH "/bin:/usr/bin:/sbin:/usr/sbin"
setenv LD_LIBRARY_PATH "/usr/lib/openmpi/lib:/opt/openblas/lib:/opt/acml/acml5.3.1/gfortran64_fma4_int64/lib:/opt/acml/acml5.3.1/gfortran64_int64/lib:/opt/intel/mkl/lib/intel64"
hostname
mpirun -n $totalprocs /opt/nwchem/Nwchem/bin/LINUX64/nwchem $infile > $outfile
}

Gaussian-03FilesToRemove{ core *.rwf }

Gaussian-03Command{
set path = ( /opt/nbo6/bin $path )
setenv GAUSS_SCRDIR /home/me/scratch
setenv GAUSS_EXEDIR /opt/gaussian/g09d/g09/bsd:/opt/gaussian/g09d/g09/local:/opt/gaussian/g09d/g09/extras:/opt/gaussian/g09d/g09
/opt/gaussian/g09d/g09/g09< $infile > $outfile 
echo 0
}

Wrapup{
    dmesg|tail
    find ~/scratch/* -name "*" -user me|xargs -I {} rm {} -rf
}
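
The Slurm { } block is a template: placeholders such as $walltime and $submitFile get replaced with real values when the submit script is generated. Exactly how ECCE performs that substitution is not shown in this post, so the following Python sketch is only an illustration of the idea using string.Template. The function name and the example values are hypothetical.

```python
from string import Template

# The header lines from the Slurm { } block above.
SLURM_HEADER = Template("""\
#!/bin/csh
#SBATCH -p mpi8
#SBATCH --time=$walltime
#SBATCH --output=slurm.out
#SBATCH --job-name=$submitFile
""")


def render_header(walltime, submit_file):
    """Fill in the two placeholders used by the Slurm header template."""
    return SLURM_HEADER.substitute(walltime=walltime, submitFile=submit_file)


if __name__ == "__main__":
    # Hypothetical values: a two-hour limit and a made-up script name.
    print(render_header("2:00:00", "submit_job"))
```

Since the partition (-p mpi8) is hard-coded in the template, each node needs its own CONFIG file, which fits the one-queue-per-node setup described earlier.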

Next, edit apps/siteconfig/Queues -- in my case the machine I created is called neon-slurm:


neon-slurm|queueMgrName:    Slurm
neon-slurm|queueMgrVersion: 2.0~
neon-slurm|prefFile:        neon-slurm.Q

And that's all. You should now be able to submit jobs via slurm. There's obviously a lot more that can be done and configured with SLURM, but this was enough to get me up and running, so I'm now 'future-proofed' in case SGE never comes back into Debian stable.

And here's what it looks like when my ecce-submitted jobs are running:

squeue

  JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
     30      mpi8 In_monom       me PD       0:00      1 (Resources)
     29      mpi8 b_monome       me  R      16:12      1 neon
     31    mpix12 tl_dimer       me  R      34:28      1 magnesium


28 July 2015
