02 January 2011

1. Setting up a cheap-ish Ubuntu cluster - basic set-up

Note: I have two nodes - one master (beryllium) and one slave (lithium). These names will be used in the examples.

Once all the hardware was assembled and Linux was up and running, I installed a host of programs that I'd be using:

Cluster tools:
sudo apt-get install mpich2 vnstat sinfo ganglia-monitor openmpi-*

For compiling:
sudo apt-get install build-essential gfortran gpp cpp 

Generally useful software:
sudo apt-get install gnuplot mpqc-openmpi gromacs-mpich octave octave-parallel qtoctave gnome-activity-journal maxima


I downloaded the nwchem source from here. We'll deal with the compilation and installation in the next post.

I followed this post (scroll down) to get NFS up and running. Here's my take on it:

On the master node:
sudo apt-get install nfs-kernel-server nfs-common portmap

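    We don't want portmap bound only to the loopback address (127.0.0.1), or the other nodes can't reach it, so reconfigure it and answer No when asked about binding to loopback: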
sudo dpkg-reconfigure portmap

sudo /etc/init.d/portmap restart
sudo mkdir /work
sudo chmod a+wrx /work

sudo nano /etc/exports
  Add the following line to the end of the file:
 /work *(rw,sync)

sudo /etc/init.d/nfs-kernel-server restart
sudo exportfs -a
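
The * in the exports line lets any host that can reach the master mount /work. A tighter variant is sketched below. The 192.168.1.0/24 subnet is an assumption (adjust it to your own network), and no_subtree_check is added only to silence the warning that newer versions of exportfs print when neither subtree option is given.

```
# /etc/exports on the master node
# Assumption: all cluster nodes live on 192.168.1.0/24
/work 192.168.1.0/24(rw,sync,no_subtree_check)
```

After changing the file, re-run sudo exportfs -a as above.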

On the slave node:
sudo apt-get install portmap nfs-common
sudo mkdir /work

sudo nano /etc/fstab
   Add the following line:
beryllium:/work /work nfs   rw   0   0
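
If you'd rather script this step than edit the file by hand, here is a minimal sketch. add_fstab_entry is a hypothetical helper (not part of any package) that appends the line only if that mount point isn't already listed. It is shown against a scratch file so it's safe to try; pointing it at /etc/fstab as root is left to you.

```shell
# Hypothetical helper: append an fstab entry unless its mount point
# (second field) already appears on a non-comment line of the file.
add_fstab_entry() {
    fstab_file=$1
    entry=$2
    mount_point=$(printf '%s\n' "$entry" | awk '{ print $2 }')
    if awk -v mp="$mount_point" \
        '$1 !~ /^#/ && $2 == mp { found = 1 } END { exit !found }' \
        "$fstab_file"; then
        echo "$mount_point is already listed in $fstab_file"
    else
        printf '%s\n' "$entry" >> "$fstab_file"
    fi
}

# Try it on a scratch file; the second call should change nothing.
scratch=$(mktemp)
add_fstab_entry "$scratch" "beryllium:/work /work nfs   rw   0   0"
add_fstab_entry "$scratch" "beryllium:/work /work nfs   rw   0   0"
entries=$(grep -c '/work' "$scratch")
echo "Lines mentioning /work: $entries"
```

Running it twice leaves a single entry, which is the point: re-running a set-up script shouldn't stack up duplicate mounts.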


To get it up and running immediately:
sudo mount /work

Next, create a file called mpd.hosts - you can put it in the /work directory or in your home directory. List the hosts/nodes, one per line, each followed (without spaces) by a colon and the number of processors on that node:

lithium:3
beryllium:3
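
As a minimal sketch, the file above can be written from the shell and sanity-checked by adding up the processor counts. The variable names (hosts_file, total_cpus) and the HOSTS_FILE override are just illustrative choices.

```shell
# Where to write the host list; defaults to the current directory.
hosts_file="${HOSTS_FILE:-./mpd.hosts}"

# One host per line, in host:ncpus form with no spaces.
cat > "$hosts_file" <<'EOF'
lithium:3
beryllium:3
EOF

# Sum the per-node processor counts (the field after the colon).
total_cpus=$(awk -F: '{ sum += $2 } END { print sum }' "$hosts_file")
echo "Total processors listed: $total_cpus"   # prints: Total processors listed: 6
```

The total is handy later as an upper bound on how many processes to ask for when you launch a job.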


On beryllium:
mpd --ncpus=3 &
mpdtrace -l

which returns something akin to beryllium_12345 (192.168.1.3). Here 12345 is the port number, which will change each time you start mpd.

On lithium:
mpd --ncpus=3 -h beryllium -p 12345 &
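
Since the port changes on every start, copying it by hand gets old quickly. Here is a minimal sketch that pulls the host and port out of a line in the format shown above. It works on a sample string; on a live ring you would presumably feed it the first line of mpdtrace -l instead (that substitution is an assumption I haven't wired in here).

```shell
# Sample line in the format returned by mpdtrace -l on the master.
line="beryllium_12345 (192.168.1.3)"

entry=${line%% *}    # drop everything from the first space: beryllium_12345
host=${entry%_*}     # text before the last underscore: beryllium
port=${entry##*_}    # text after the last underscore: 12345

# Print the command to run on the slave node.
echo "mpd --ncpus=3 -h $host -p $port &"
```

Splitting on the last underscore rather than the first keeps this working for host names that themselves contain underscores.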

NOTE: Before I started specifying --ncpus I had problems - mpich2 gave an error about consecutive numbers/IDs. The downside of specifying it is that when you submit a job, it fills up the master node first and only then starts processes on the slave node(s).