Showing posts with label cloud. Show all posts

18 January 2016

626. Briefly: Gaussian and cloud computing -- Gaussian G09D with Slurm on aws/ec2

Note: you may want to install awscli and euca2ools. I didn't, so I don't actually know whether they are useful.

My instructions are quite rudimentary since I don't have much time to write these blog posts anymore. Hopefully there's enough information to get you through.


AWS
Either way, sign up for AWS. If you already have an amazon ID I think you can use that. Go to https://aws.amazon.com/

Select Launch an Instance, pick the Ubuntu AMI and do Launch and Review. I launched it as a t2.micro instance type, as it is free and sufficient for the setup, but not for running jobs.

Hit launch, and create a new key pair. I called mine myfirstkeypair and saved the pem file in my ~/Downloads folder

In my Downloads folder:
ssh -i "myfirstkeypair.pem" ubuntu@ec2-11-222-33-444.us-west-2.compute.amazonaws.com
I then set a password in the ubuntu AWS image:
sudo passwd ubuntu

I added my id_rsa.pub to ~/.ssh/authorized_keys on the ubuntu AWS image to make logging in via ssh easier -- that way I won't need the pem file.
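
Here's a minimal sketch of that step, meant to be run on the instance after copying id_rsa.pub over (e.g. with scp and the pem file). The function name add_key is made up for illustration and isn't part of any tool; it only appends the key if that exact line isn't already present, and it sets the permissions sshd normally expects.

```shell
#!/bin/sh
# add_key PUBKEY_FILE AUTH_FILE
# Hypothetical helper (not from the original post): append a public key
# to an authorized_keys file unless that exact line is already there.
add_key() {
    pubkey_file=$1
    auth_file=$2
    auth_dir=$(dirname "$auth_file")
    mkdir -p "$auth_dir"
    chmod 700 "$auth_dir"      # with StrictModes, sshd rejects loosely-permissioned dirs
    touch "$auth_file"
    chmod 600 "$auth_file"
    # -x: whole-line match, -F: fixed string, -q: quiet
    if ! grep -qxF -e "$(cat "$pubkey_file")" "$auth_file"; then
        cat "$pubkey_file" >> "$auth_file"
    fi
}

# Typical use on the instance:
#   add_key ~/id_rsa.pub ~/.ssh/authorized_keys
```

Running it twice with the same key leaves a single entry, so it's safe to re-run.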


Set up Gaussian
I then connected with SCP and uploaded my gaussian files -- I went straight for EM64T G09D. It went quite fast at +5 MB/s

scp E6L-103X.tgz ubuntu@ec2-00-111-22-333.us-west-2.compute.amazonaws.com:/home/ubuntu/E6L-103X.tgz

Once that was done, on the ubuntu AWS instance I did:
sudo apt-get install csh 
sudo mkdir /opt/gaussian
cd /opt 
sudo chown ubuntu gaussian -R
cd /opt/gaussian
cp ~/E6L-103X.tgz .
tar xvf E6L-103X.tgz
cd g09
csh bsd/install

echo 'export GAUSS_EXEDIR=/opt/gaussian/g09/bsd:/opt/gaussian/g09/local:/opt/gaussian/g09/extras:/opt/gaussian/g09' >> ~/.bashrc
echo 'export GAUSS_SCRDIR=/home/ubuntu/scratch' >> ~/.bashrc
echo 'export PATH=$PATH:/opt/gaussian/g09' >> ~/.bashrc
source ~/.bashrc 
mkdir ~/scratch ~/jobs

NOTE that you can't run any gaussian jobs on a t2.micro instance. You will have to stop it and relaunch it as at least a t2.small instance, or the jobs will be 'Killed' (that's what is echoed in the terminal when you try to run one).
Note that if you terminate an instance it will be deleted.

Stop the instance and then create a snapshot or an image from it to keep everything you've installed.

Set up Slurm
You'll want a queue manager so that you can launch several jobs in serial. Also, you can set up your script so that it shuts down the image when your job is done to save money.

sudo apt-get update
sudo apt-get install slurm-llnl

Put the following in the slurm configuration file (with Ubuntu's slurm-llnl package that's /etc/slurm-llnl/slurm.conf):
ControlMachine=localhost
ControlAddr=127.0.0.1
MpiDefault=none
ProctrackType=proctrack/pgid
ReturnToService=2
SlurmctldPidFile=/var/run/slurm-llnl/slurmctld.pid
SlurmdPidFile=/var/run/slurm-llnl/slurmd.pid
SlurmdSpoolDir=/var/lib/slurm-llnl/slurmd
SlurmUser=slurm
StateSaveLocation=/var/lib/slurm-llnl/slurmctld
SwitchType=switch/none
TaskPlugin=task/none
FastSchedule=1
SchedulerType=sched/backfill
SelectType=select/linear
AccountingStorageType=accounting_storage/none
ClusterName=rupert
JobAcctGatherType=jobacct_gather/none
SlurmctldLogFile=/var/log/slurm-llnl/slurmctld.log
SlurmdLogFile=/var/log/slurm-llnl/slurmd.log
NodeName=localhost NodeAddr=127.0.0.1
PartitionName=All Nodes=localhost

sudo /usr/sbin/create-munge-key
Edit /etc/default/munge:
OPTIONS=--force
Then run
sudo service slurm-llnl restart
sudo service munge restart 
Test using slurm.batch
#!/bin/bash
#
#SBATCH -p All
#SBATCH --job-name=test
#SBATCH --output=res.txt
#
#SBATCH --ntasks=1
#SBATCH --time=10:00
srun hostname
srun sleep 60
and submit with
sbatch slurm.batch
 squeue
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
                 2       All     test   ubuntu  R       0:08      1 localhost

Benchmark:
#!/bin/csh
#SBATCH -p All
#SBATCH --time=9999999
#SBATCH --output=slurm.out
#SBATCH --job-name=benchmark
setenv GAUSS_SCRDIR /home/ubuntu/scratch
setenv GAUSS_EXEDIR /opt/gaussian/g09/bsd:/opt/gaussian/g09/local:/opt/gaussian/g09/extras:/opt/gaussian/g09
/opt/gaussian/g09/g09 < benchmark.in > benchmark.out
Using the same opt/freq benchmark as in post 621.

c4.2xlarge   2 h 11 min   [1 h 20 min]    8 vCPU / 16 GB
c4.4xlarge   1 h 15 min   [    44 min]   16 vCPU / 32 GB
c4.8xlarge       41 min   [    25 min]   36 vCPU / 60 GB

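It scales surprisingly well, although not perfectly linearly: going from 8 to 16 vcpus cuts the run time by a factor of about 1.75 rather than 2. Since the bigger instances get less work done per vcpu, a smaller instance is cheaper per job, so unless time is critical or you need the larger memory, c4.8xlarge is not the first choice.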
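
To put rough numbers on the scaling, here's a back-of-the-envelope sketch that uses only the first (unbracketed) time for each instance in the list above: 131, 75 and 41 minutes. It assumes that the hourly price is roughly proportional to the number of vcpus within the c4 family, so that vcpu-minutes can stand in for cost. That's an assumption for illustration, not something taken from the price list, so check current prices before drawing conclusions. The function name scaling_table is made up.

```shell
#!/bin/sh
# Speed-up relative to c4.2xlarge and vcpu-minutes used per run.
# Input columns: instance type, vcpus, run time in minutes.
# Assumption: price per hour scales with vcpu count, so vcpu-min ~ cost.
scaling_table() {
    printf '%s\n' \
        'c4.2xlarge 8 131' \
        'c4.4xlarge 16 75' \
        'c4.8xlarge 36 41' |
    awk 'NR == 1 { base = $3 }
         { printf "%-11s speedup %.2f  vcpu-min %d\n", $1, base / $3, $2 * $3 }'
}

scaling_table
```

Under that assumption the c4.8xlarge run finishes about 3.2 times faster than the c4.2xlarge one, but uses about 40% more vcpu-minutes (1476 vs 1048).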

Dropbox:
You might want to use dropbox to transfer files back and forth, especially finished job files (useful if you shut down the machine using a slurm script as shown below)

cd ~ && wget -O - "https://www.dropbox.com/download?plat=lnx.x86_64" | tar xzf -
~/.dropbox-dist/dropboxd
This computer isn't linked to any Dropbox account... Please visit https://www.dropbox.com/cli_link_nonce?nonce=0011223344556677889900aabbccddeef to link this device. This computer isn't linked to any Dropbox account...

Open that link in a browser, then go back to the terminal.
 
wget -O - "https://www.dropbox.com/download?dl=packages/dropbox.py" > dropbox.py
sudo mv dropbox.py /usr/local/bin
sudo chmod +x /usr/local/bin/dropbox.py
dropbox.py autostart y

Now, since you don't want to use up space unnecessarily (you're paying for it after all), exclude as many directories as possible. To exclude all existing dropbox dirs, do
 
cd ~/Dropbox
dropbox.py exclude add `ls -d */`
dropbox.py exclude add `ls *.*`
dropbox.py exclude list

Note that it can't handle directories with spaces in the name, so you'll need to polish the list by hand. Next, create a directory where you want to run and store your jobs, e.g.
mkdir ~/Dropbox/aws_jobs

When you run a gaussian job, make sure to specify where the .chk files should end up, e.g.
%chk=/home/ubuntu/scratch/benchmark.chk
so that you don't use up space/bandwidth for your chk files (unless of course you want to).
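
If you have a pile of input files, something along these lines can set the %chk line for you. It's only a sketch: set_chk is a made-up helper, it names the checkpoint after the input file, and it assumes each input has at most one %chk line (multi-step inputs with several would all get the same path).

```shell
#!/bin/sh
# set_chk INPUT_FILE [SCRATCH_DIR]
# Hypothetical helper: point the %chk line of a Gaussian input file at the
# scratch directory. If the file has no %chk line, one is added at the top.
set_chk() {
    input=$1
    scratch=${2:-/home/ubuntu/scratch}
    name=$(basename "$input")
    name=${name%.*}                      # benchmark.in -> benchmark
    chk_line="%chk=$scratch/$name.chk"
    tmp=$(mktemp)
    if grep -qi '^%chk=' "$input"; then
        # Replace the existing line, whatever its capitalisation
        sed "s|^%[Cc][Hh][Kk]=.*|$chk_line|" "$input" > "$tmp"
    else
        { printf '%s\n' "$chk_line"; cat "$input"; } > "$tmp"
    fi
    cat "$tmp" > "$input"                # overwrite in place, keeping permissions
    rm -f "$tmp"
}

# Example:
#   set_chk ~/Dropbox/aws_jobs/benchmark.in
```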

Stop after execution:
Use a batch script along these lines:
#!/bin/csh
#SBATCH -p All
#SBATCH --time=9999999
#SBATCH --output=slurm.out
#SBATCH --job-name=benchmark
setenv GAUSS_SCRDIR /home/ubuntu/scratch
setenv GAUSS_EXEDIR /opt/gaussian/g09/bsd:/opt/gaussian/g09/local:/opt/gaussian/g09/extras:/opt/gaussian/g09
/opt/gaussian/g09/g09 < benchmark.in > benchmark.out
rm /home/ubuntu/scratch/*.*
sudo shutdown -h now
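
One thing to watch if the output files live in a Dropbox folder: shutting down straight after the job may cut off the upload. A possible workaround is to wait for the sync to finish first. The sketch below is in sh rather than csh, so it would have to live in its own small script called from the batch script. wait_for_sync is a made-up name, and the "Up to date" string is an assumption about what dropbox.py status prints when idle, so check the wording on your own install.

```shell
#!/bin/sh
# wait_for_sync [STATUS_CMD] [MAX_TRIES] [DELAY_SECONDS]
# Hypothetical helper: poll a status command until it prints "Up to date",
# so the machine isn't shut down while results are still uploading.
# Assumption: `dropbox.py status` prints "Up to date" once syncing is done.
wait_for_sync() {
    status_cmd=${1:-"dropbox.py status"}
    max_tries=${2:-60}
    delay=${3:-10}
    i=0
    while [ "$i" -lt "$max_tries" ]; do
        # $status_cmd is left unquoted on purpose so "dropbox.py status"
        # splits into a command and its argument
        if $status_cmd 2>/dev/null | grep -q '^Up to date'; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 1    # gave up: still syncing, or the daemon isn't running
}

# Possible use before the shutdown line:
#   wait_for_sync "dropbox.py status" 60 10
```

With the defaults it gives up after about ten minutes. Whether to shut down anyway at that point is a trade-off between losing files and paying for an idle machine.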

22 June 2012

199. NeCTAR -- Virtualisation of Australian compute resources -- first steps

So they are seeing whether they can make more efficient use of the compute resources at different institutions in Australia by creating a cloud to pool their resources. One of the potential solutions is called NeCTAR.


Getting started
Go to http://dashboard.rc.nectar.org.au/auth/login/?next=/dash/
Log in using your institution's username and password

You now have two options to deal with the key issue:

Method 1 -- generate online
Once you're in, create a keypair under Manage Compute/Access & Security and give it an easy-to-remember name

This is your private key, so protect it: don't lose it and don't expose it. You can't download it again. You delete it, it's gone.

On your computer
mv ~/Downloads/nectar.pem ~/.ssh
chmod og-rwx ~/.ssh/nectar.pem
cd ~/.ssh
cp nectar.pem nectar
ssh-keygen -e -f nectar > nectar.pub

 ls nectar* -lah
-rw------- 1 me me 887 Jun 22 11:31 nectar
-rw------- 1 me me 887 Jun 22 11:28 nectar.pem
-rw-r--r-- 1 me me 335 Jun 22 11:31 nectar.pub
To use the key do
ssh -i nectar user@server

Method 2 -- BYOK
You're using linux -- you probably have your own key already.
Go to the Manage Compute/Access & Security, Import Keypair

Paste your ~/.ssh/id_rsa.pub (or id_dsa.pub) key.


And that's the extent of the setup.

Test run
Go to Manage Compute/Images & Snapshot and select a Real Linux image (i.e. Debian)
Hit Launch.
Set up the image -- the defaults are ok, but make sure to check icmp (to be able to ping) and ssh (to be able to log in).
Loading
Generating and loading the image takes about 10-20 seconds -- about the duration of a Victorian earthquake.
Running
Now your image is up and running. To check that all is well:

ping -c 3 115.146.92.154
PING 115.146.92.154 (115.146.92.154) 56(84) bytes of data.
64 bytes from 115.146.92.154: icmp_req=1 ttl=54 time=1.70 ms
64 bytes from 115.146.92.154: icmp_req=2 ttl=54 time=1.66 ms
64 bytes from 115.146.92.154: icmp_req=3 ttl=54 time=1.73 ms

--- 115.146.92.154 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2002ms
rtt min/avg/max/mdev = 1.660/1.698/1.733/0.029 ms

To be able to log in via ssh you need to know what username to use -- it's (probably) image specific.

To find out, click on the image name (here: testing4) and hit the 'Log' tab.
Look in the log for the username which is created in addition to root -- here it's debian.


ssh -v -i ~/.ssh/tmp/nectar debian@115.146.92.154
The authenticity of host '115.146.92.154 (115.146.92.154)' can't be established.
RSA key fingerprint is 81:a8:a7:0f:a9:68:a0:08:f1:60:45:e3:57:2e:4c:4c.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '115.146.92.154' (RSA) to the list of known hosts.


Once you're in you'll be greeted by:

Linux unnamed-virtual-machine 2.6.32-5-amd64 #1 SMP Thu Mar 22 17:26:33 UTC 2012 x86_64

The programs included with the Debian GNU/Linux system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.

Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
permitted by applicable law.
debian@i-00001637:~$ 
debian@i-00001637:~$ df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/vda1             9.9G  768M  8.6G   9% /
tmpfs                 2.0G     0  2.0G   0% /lib/init/rw
udev                  2.0G  112K  2.0G   1% /dev
tmpfs                 2.0G     0  2.0G   0% /dev/shm
debian@i-00001637:~$ uname -a
Linux i-00001637 2.6.32-5-amd64 #1 SMP Thu Mar 22 17:26:33 UTC 2012 x86_64 GNU/Linux
debian@i-00001637:~$ groups
debian cdrom floppy audio dip video plugdev
debian@i-00001637:~$ cat /etc/group|grep debian
cdrom:x:24:debian
floppy:x:25:debian
audio:x:29:debian
dip:x:30:debian
video:x:44:debian
plugdev:x:46:debian
debian:x:1000:

When you're done, don't forget to log out and terminate your image. If you leave it running it will count towards your resource allocations.
Terminating


Notes:

1. You'll run into trouble with the key fingerprints eventually, as the IP addresses and key fingerprints won't match. Either you'll be doing a lot of editing of your ~/.ssh/known_hosts file or you'll have to relax your security settings.

2. Yes, you can log in as root as well. The default user does not have sudo powers. 

3. It takes about 60 seconds after the launch of the image before the openssh server is up and accepting connections. Think more desktop speeds than laptop+SSD speeds.

4. For actual production stuff you can crank up the image requirements: 16 cores and 65 GB? Why, thank you!
5. Also, I think the real value of using a virtual machine is that you can load a vanilla setup and customize it, then save it by making a snapshot. A first couple of actions might be to add a new user and edit /etc/sudoers.
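
For the fingerprint problem in note 1, editing known_hosts by hand isn't strictly necessary: ssh-keygen -R removes all recorded keys for a host. A small wrapper, with the made-up name forget_host, could look like this:

```shell
#!/bin/sh
# forget_host HOST [KNOWN_HOSTS_FILE]
# Remove every key recorded for HOST from a known_hosts file.
# ssh-keygen -R also copes with hashed entries and leaves a .old backup.
forget_host() {
    host=$1
    file=${2:-$HOME/.ssh/known_hosts}
    ssh-keygen -R "$host" -f "$file"
}

# Example, using the address from this post:
#   forget_host 115.146.92.154
```

The next ssh to that address will ask you to confirm the new fingerprint, which is a lot safer than switching host key checking off altogether.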

Troubleshooting ssh:
If you're having problems logging in using your key, use the ssh -v switch as shown above and parse the output



Unsuccessful:
debug1: Authentications that can continue: publickey,password
debug1: Next authentication method: publickey
debug1: Offering RSA public key: /home/me/.ssh/id_rsa
debug1: Authentications that can continue: publickey,password
debug1: Trying private key: /home/me/.ssh/id_dsa
debug1: Trying private key: /home/me/.ssh/id_ecdsa
debug1: Next authentication method: password

A successful authentication should contain
debug1: Offering RSA public key: /home/me/.ssh/id_rsa
debug1: Server accepts key: pkalg ssh-rsa blen 279
debug1: read PEM private key done: type RSA
debug1: Authentication succeeded (publickey).


If you are sure that you're using the right key (e.g. using -i), then make sure that you're using the right username -- to find out how to find it, look above.
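
The -v output is long, so it can help to filter it down to the lines that matter. The sketch below reads the debug output on stdin, prints the lines about keys being offered, tried or accepted, and succeeds only if public-key authentication worked. The function name auth_summary is made up for this example, and the patterns are based on the debug lines shown above, so other OpenSSH versions may word them differently.

```shell
#!/bin/sh
# auth_summary: read `ssh -v` output on stdin and print the key-related
# lines. Exit status is 0 only if public-key authentication succeeded.
auth_summary() {
    log=$(cat)
    printf '%s\n' "$log" |
        grep -E 'Offering .*public key|Trying private key|Server accepts key|Authentication succeeded'
    printf '%s\n' "$log" | grep -q 'Authentication succeeded (publickey)'
}

# Typical use (ssh -v writes its debug output to stderr):
#   ssh -v -i nectar user@server true 2>&1 | auth_summary
```

If you see keys being offered but never a 'Server accepts key' line, the server doesn't know that key for that username.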