Note: I have two nodes - one master (beryllium) and one slave (lithium). These names are used in the examples below.
Once all the hardware was assembled and Linux was up and running, I installed a host of programs that I'd be using:
Cluster tools:
sudo apt-get install mpich2 vnstat sinfo ganglia-monitor openmpi-*
For compiling:
sudo apt-get install build-essential gfortran gpp cpp
Generally useful software:
sudo apt-get install gnuplot mpqc-openmpi gromacs-mpich octave octave-parallel qtoctave gnome-activity-journal maxima
I downloaded the NWChem source from here. We'll deal with the compilation and installation in the next post.
I followed this post to get up and running (scroll down) with NFS. Here's my take on it:
On the master node:
sudo apt-get install nfs-kernel-server nfs-common portmap
We don't want portmap to bind only to 127.0.0.1, so reconfigure and restart it:
sudo dpkg-reconfigure portmap
sudo /etc/init.d/portmap restart
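If you want to check that portmap is now reachable over the network and not just on the loopback interface, querying it by the machine's hostname should work (rpcinfo is part of the standard RPC tools):
rpcinfo -p beryllium
If it lists the registered programs (just the portmapper for now, plus nfs and mountd once the server is running), you're fine.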
sudo mkdir /work
sudo chmod a+wrx /work
sudo nano /etc/exports
Add the following line to the end of the file:
/work *(rw,sync)
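That exports /work read/write to any host. If you'd rather not export to the whole world, you can restrict it to your LAN - the subnet below is just an example, substitute your own:
/work 192.168.1.0/24(rw,sync,no_subtree_check)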
sudo /etc/init.d/nfs-kernel-server restart
sudo exportfs -a
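To double-check that the export took, you can list the active exports on the master (showmount comes with nfs-common):
sudo exportfs -v
showmount -e beryllium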
On the slave node:
sudo apt-get install portmap nfs-common
sudo mkdir /work
sudo nano /etc/fstab
Add the following line:
beryllium:/work /work nfs rw 0 0
To mount it immediately:
sudo mount /work
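A quick sanity check from lithium is to confirm the mount and write a test file - the filename here is arbitrary:
df -h /work
touch /work/test_from_lithium
Listing /work on beryllium should then show test_from_lithium.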
Next, create a file called mpd.hosts - you can put it in the /work directory or the home directory. List the hosts/nodes and append, without spaces, the number of processors on each node:
lithium:3
beryllium:3
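As an aside, if you have passwordless ssh set up between the nodes, mpich2 can start the whole ring from this file in one step with mpdboot instead of starting mpd by hand on each node:
mpdboot -n 2 -f mpd.hosts
Here -n 2 is the number of hosts listed in mpd.hosts. I started the daemons manually as described below, which works just as well.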
On beryllium:
mpd --ncpus=3 &
mpdtrace -l
which returns something akin to beryllium_12345 (192.168.1.3). Here 12345 is the port number, which changes each time you start mpd.
On lithium:
mpd --ncpus=3 -h beryllium -p 12345 &
NOTE: Before I specified --ncpus I had problems - mpich2 gave an error about consecutive numbers/IDs. The downside is that when you submit a job it fills up the master node first, then starts jobs on the slave node(s).
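With the ring up - mpdtrace on either node should now list both beryllium and lithium - a simple sanity check is to run a non-MPI command across all six processors:
mpiexec -n 6 hostname
You should see a mix of beryllium and lithium in the output. When you're done, mpdallexit takes the whole ring down again.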