Note that you should under no circumstances do this unless you've been specifically allowed to do so by your cluster manager.
If you get clearance, submit the script; it will run in the background and keep resubmitting itself until the job is done.
To get the daisychain script, do
mkdir ~/tmp
cd ~/tmp
git clone https://github.com/johnfonner/daisychain.git
This will pull the latest version of daisychain.slurm. Rename it to e.g. edited.slurm.
General editing of the slurm script:
1.
Replace all instances of
~/.daisychain
with
~/daisychain_$baseSlurmJobName
to avoid conflicts when several jobs are running concurrently
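If you'd rather not edit by hand, the substitution can be scripted; a minimal sketch using GNU sed, assuming you named your copy edited.slurm as above:

```shell
# Replace every literal ~/.daisychain with ~/daisychain_$baseSlurmJobName.
# Single quotes stop the shell from expanding $baseSlurmJobName here --
# the variable should appear literally in the script and only be
# expanded later, when slurm actually runs it.
sed -i 's|~/\.daisychain|~/daisychain_$baseSlurmJobName|g' edited.slurm
```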
2.
To run the script on a system of your own that you've set up as shown in this post, change
loginNode="login1"
to
loginNode="localhost"
If you're using stampede.TACC, stick to login1.
3. For Gaussian jobs on stampede.TACC
A.
put
module load gaussian
before
if [ "$thisJobNumber" -eq "1" ]; then
B.
Set up your restart job scripts. For example, if the job section of your slurm script looks like this

mkdir $SCRATCH/gaussian_tmp
export GAUSS_SCRDIR=$SCRATCH/gaussian_tmp
if [ "$thisJobNumber" -eq "1" ]; then
        #first job
        echo "Starting First Job:"
        g09 < freq.g09in > output_$thisJobNumber
else
        #continuation
        echo "Starting Continuation Job:"
        g09 < freq_restart.g09in > output_$thisJobNumber
fi

with freq.g09in being something along the lines of

%nprocshared=16
%rwf=/scratch/0XXXX/XXXX/gaussian_tmp/ajob.rwf
%Mem=2000000000
%Chk=/home1/0XXX/XXXX/myjob/ajob.chk
#P rpbe1pbe/GEN 5D Freq() SCF=(MaxCycle=256) Punch=(MO) Pop=()

(Note that the above example is a bit special since it 1) saves the .rwf file (which is huge) and 2) restarts a frequency job. For a simple geometry optimisation it's enough to restart from the .chk file.) freq_restart.g09in:

%nprocshared=16
%Mem=2000000000
%rwf=/scratch/0XXX/XXXX/gaussian_tmp/ajob.rwf
%Chk=/home1/0XXXX/XXXX/myjob/ajob.chk
#P restart
Testing at home
I set up a home system with slurm as shown here: http://verahill.blogspot.com.au/2014/03/565-setting-up-slurm-on-debian-wheezy.html
First edit the daisychain.slurm script as shown above. Note that your job script must end in .slurm for daisychain to recognise it as a slurm script; you can get around this by editing daisychain and specifying the job script name explicitly.
Specifically, change the run time to

#SBATCH -t 00:00:10          # Run time (hh:mm:ss)

comment out the partition name

##SBATCH -p normal

and change the job section to

#-------------------Job Goes Here--------------------------
if [ "$thisJobNumber" -eq "1" ]; then
        echo "Starting First Job:"
        sh sleeptest.sh
else
        echo "Starting Continuation Job:"
        sh sleeptest_2.sh
fi
#----------------------------------------------------------
Next, set up key-based login for localhost (if you haven't got a keypair, generate one with ssh-keygen first):

cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
ssh localhost
exit
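It's worth confirming that the login really is non-interactive before submitting anything; a quick sanity check:

```shell
# BatchMode=yes makes ssh fail rather than prompt for a password,
# so the OK branch is only reached if key-based login works.
if ssh -o BatchMode=yes -o ConnectTimeout=5 localhost true; then
    echo "key-based login OK"
else
    echo "key-based login not set up yet"
fi
```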
Create two job files. sleeptest.sh:

echo "first job"
date
sleep 65
date

and sleeptest_2.sh:

echo "second job"
date
sleep 9
echo "Do nothing"
Submit using
sbatch test.slurm
Make sure to change

#SBATCH -J testx           # Job name

for each job so that you can have several running concurrently.
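Rather than editing the -J line in a copy of the script for every job, you can also set the job name at submission time: sbatch's -J/--job-name option on the command line takes precedence over the #SBATCH directive in the script. A sketch, assuming your edited script is called edited.slurm (job names test1/test2 are placeholders):

```shell
# Submit the same script twice under different job names;
# the command-line -J overrides the #SBATCH -J line in the file.
sbatch -J test1 edited.slurm
sbatch -J test2 edited.slurm
```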