16 December 2011

30. "Bench-marking" nwchem with mpich2 on debian wheezy

For various reasons my beowulf has been dismantled and in boxes for most of the year, with only the six-core node seeing use a normal work computer.

Anyhow, here's a very unscientific test of the performance of my six-core (phenom II, 2.8 GHz, 8Gb RAM) running the nwchem code compiled in the previous post.

The speed-tests were performed by starting up mpd
mpd --ncpus=6&
and then executing with
time mpdrun -n x ./nwchem input.nw
where x is an integer signifying the number of cores

The nwchem.nw files I used was

nwchem.nw
start benzene 

geometry units angstroms
C  0.100  1.396  0.000
C  1.209  0.698  0.000
C  1.209 -0.698  0.000
C  0.000 -1.396  0.000
C -1.209 -0.698  0.000
C -1.209  0.698  0.000
H  0.000  2.479  0.000
H  2.147  1.240  0.000
H  2.147 -1.240  0.000
H  0.000 -2.479  0.000
H -2.147 -1.240  0.000
H -2.147  1.240  0.000
end
basis
 H library sto-3g
 c library sto-3g
end
dft
    xc b3lyp
end
task dft optimize

Here are the results:


(x is number of cores; times in seconds)
x   Run 1   Run 2    Run 3   Run 4   Run 5
1* 40.8     37.9     40.7     40.3      39.9
1   22.2     40.7      40.6    44.8      38.2
2   22.8     22.4     16.3     23.5       21.5
3   14.1     12.3     15.7     15.5       15.1
4   14.5     11.5     12.0     14.9       14.7
5   11.4     11.5     8.9       11.9       12.5
6   16.0     12.2     13.4     9.9        9.6

* No mpd running; executed using time nwchem nwchem.nw



So here's the unscientific part -- the computer is running a full desktop environment with evolution, chrome etc open in the background so that each run sees a slightly different system. I've tried to vary the order in which the runs were made though.

 A guess would be that a longer run would yield more reproducible results. As it is now, the length of the runs vary significantly. The only lesson that can be obtained is that it doesn't help much throwing more cores at a problem as the optimisation times only drop off slowly past a certain point.

Edit: I've run the same file using an almost identical set-up on two more boxes
Don't compare the benchmarks when running at maximum numbers of cpu, since this will be heavily affected by other processes.

Optiplex 990 (Intel i5 2400, 4 cores @ 3.1 GHz, 8 Gb RAM)

x   Run 1   Run 2    Run 3   Run 4   Run 5
1   45.80   46.97  46.56   46.95   39.01
2   22.77  25.81   26.93  26.61   25.81
3   17.18  16.48   18.89  19.26  19.18
4   11.62  16.62   15.82  15.86   16.03

Homebuilt (3 core AMD Athlon 2 X3 @ 3.1 GHz, 4 Gb RAM)

x   Run 1   Run 2    Run 3   Run 4   Run 5   Run 6   Run 7
1   43.74   57.02   40.22   47.89   53.87
2   31.41   22.31   25.83   32.31   33.00
3   36.19   31.01   43.55   24.75   37.82   33.95   27.06


15 December 2011

29. Compiling/Building nwchem with mpich on debian testing 64 bit (Wheezy -- 15/12/2011)

So, as seen in the previous post, mpich2 ver 1.4 and nwchem 6.0 don't play nicely together.

Continuing with the virtual machine in the previous post, I added a line referencing the stable version to /etc/apt/sources.list:
deb ftp://ftp.au.debian.org/debian/ stable main contrib non-free

Important: I ADDED that line -- all lines referencing the testing version (or wheezy) are left untouched.
Do
sudo apt-get update
Since the versions in stable are older adding that line won't affect your system.

Now,
apt-cache showpkg mpich2

Under provides it should say:
1.4.1-1+b1
1.2.1.1-5

Now,
sudo apt-get install mpich2=1.2.1.1-5

You'll get a warning that you're about to downgrade, which is what we're trying to do.

sudo apt-get autoremove (will downgrade dependent packages. Just go with it)
aptitude search mpich2
check what's installed and what version
aptitude show libmpich2-dev
If it's 1.4.1 or not installed, make sure to set it to 1.2.1.1-5 just like for mpich2

Run sh myconfig.sh (see here for the script). Seems to build ok. All the mpd tools are where they should be.
NOTE: according to this post mpd isn't needed in newer (>=1.3) versions.

In summary, this seems to be the way to build nwchem on wheezy -- by downgrading the mpich2 and mpich2-dev packages. Since downgrading those packages may affect other packages as well, it may cause problems, but so far so good.

Testing the built binary:
mpd --ncpus=4 &
mptrace -l
mpdrun -n 4 ./nwchem ../../examples/dirdyvtst/h3/h3tr1.nw

All is good

EDIT (16/12/2011):
So, you've installed an older version of a package. apt-get will want to upgrade it, so you should put the packages on 'hold'. Every time you upgrade your system apt-get will warn you that there are packages on hold, so don't worry about forgetting about it

sudo su
echo "mpich2 hold"|dpkg --set-selections
echo "libmpich2-dev hold"|dpkg --set-selections

(taken from http://forums.debian.net/viewtopic.php?f=20&t=71448&start=15)

As an aside, you may want to downgrade gromacs-mpich to use mpich2-1.2 as well:
sudo apt-get install gromacs-mpich=4.0.7-3

28. Compiling/Building nwchem with mpich on debian 64 bit (15/12/2011) -- observations of squeeze, wheezy, sid and experimental

** NOTE: the nwchem build in the debian repos (6.0-1) does not support mpich. It will not run in parallel. It seems like 6.0-2 will, but it's not in the repos yet, and looking at the package status page I get a bit worried about the likelihood of seeing a finished build for amd64 **

** The solution to building on debian testing and above (and presumably ubuntu >10.10) is here: http://verahill.blogspot.com/2011/12/building-nwchem-on-debian-testing-64.html **

** NOTE: according to this post mpd isn't needed in newer (>=1.3) versions. It is needed -- and provided -- by mpich2 1.2**

So, I've had problems building nwchem on debian testing for about a year now. Actually, building nwchem is pretty straightforward, but building it with mpich2 support doesn't seem to work.

I also noted that mpd doesn't refer to an mpich daemon anymore, leading me to suspect that maybe the version of mpich2 (now at 1.4; was at 1.2 when I built on ubuntu early in the year) could be part of the problem.

Long story short: it builds with mpich2 support on debian stable (squeeze), but not debian testing (wheezy).


----------------------------------
SQUEEZE
----------------------------------
Here's what I did: I installed debian stable 64 bit in virtualbox from the business-card iso (standard tools + ssh. No desktop environment). I made sure the system (squeeze) was up to date, downloaded nwchem-6.0 and extracted it in /home/myhome/nwchem/nwchem-6.0  and create a build file, myconfig.sh, with the following content:
export LARGE_FILES=TRUE
export TCGRSH=/usr/local/bin/ssh
export NWCHEM_TOP=/home/myhome/nwchem/nwchem-6.0
export NWCHEM_TARGET=LINUX64
export NWCHEM_MODULES=all
export USE_MPI=y
export USE_MPIF=y
export MPI_LOC=/usr
export MPI_INCLUDE=$MPI_LOC/include/mpich2
export LIBMPI="-lfmpich -lmpich"
cd $NWCHEM_TOP/src
make nwchem_config
make FC=gfortran

Only thing remaining before building is:
sudo apt-get install mpich2 gfortran

mpich2: version 1.2

sh myconfig.sh then starts the build process which takes about 10-20 minutes in a virtual machine. Everything good. Works as should -- I only have one core in the virtual machine, so can't do much testing. It builds without a hitch though. In addition mpd, mpdtrace etc are all present.

 Time to take a snapshot. 1.7 Gb.
----------------------------------
WHEEZY
----------------------------------
I next edited /etc/apt/sources.list and replaced all instances of squeeze with wheezy (including commenting out the last two lines (wheezey-updates / squeeze-updates), did sudo apt-get update and sudo apt-get dist-upgrade

mpich2: version 1.4.1.
mpd and mpdtrace are now gone. Only mpi* around.

Tried sh myconfig.sh again (added a make clean before make nwchem_config). It builds for a long time - 10-20 minutes -- ultimately the build fails with
..
..
/usr/lib/libmpich.so: undefined reference to `MPL_trid'
..
..
/usr/lib/libmpich.so: undefined reference to `MPL_trinit'
collect2: ld returned 1 exit status
make: *** [all] Error 1

The undefined references are: MPL_trid, MPL_trvalid, MPL_env2int, MPL_trrealloc, MPL_trspace, MPL_trDebugLevel, MPL_TrSetMaxMem, MPL_trlevel, MPL_trmalloc, MPL_putenv, MPL_env2bool, MPL_env2range, MPL_trcalloc, MPL_trfree, MPL_en2str, MPL_trstrdup, MPL_trdump and MPL_trinit

So no luck with Wheezy, which is the system I run on all my computers. I've actually tried quite a few approaches under Wheezy and have managed to complete the odd build, but without getting proper functionality i.e. using mpiexec -n 6 six instances are created instead of one instance spread across six cores so that the system is solved six times instead of just one.


----------------------------------
SID.
----------------------------------
Time to take another snapshot followed by an upgrade.
The wheezy snapshot is 2.2 Gb -- wonder where the 0.5 Gb come from?

Replacing all instances of wheezy with sid using vim (:%s/wheezy/sid/g). Fails to fetch http://security.debian.org/dists/sid/updates/main/binary-amd64/Packages during sudo apt-get update. Oh well. Remove that line from sources.list. Now update works.
Then sudo apt-get dist-upgrade. 140 Mb of archives to get.


As a general thought -- maybe it'd be worth keeping a copy of sid in a virtual machine just to see what's around the corner?

Upgrade done, sudo shutdown -r now
mpich2: version 1.4.1, same as wheezy. No mpd or mpdtrace.

Try building and...starts at 14:17, fails at 14:45. Same errors as for Wheezy.

Last chance -- snapshot (1.9 Gb...), add a ref to experimental and...no updates. They are the same. Not worth trying thus.
------------------------------------


So there you go -- nwchem builds ok on Squeeze using mpich2 ver 1.2, but not on any of the more up-to-date distros. I also wonder about the lack of mpd/mpdtrace -- in Squeeze mpd is the mpich daemon, while in Wheezy and above it's the Music Player Daemon. Something is odd here.

Coming next: can you build nwchem on Wheezy if you pull mpich2 ver 1.2 from the archives? Of course you can. The answer and how-to is here: http://verahill.blogspot.com/2011/12/building-nwchem-on-debian-testing-64.html