09 January 2013

312. Tau + OpenMPI profiling on Debian Testing/Wheezy

Still searching for a way to easily look at the execution of parallel jobs, I came across TAU: http://www.cs.uoregon.edu/Research/tau/home.php

You can download it without registering, but please do register, as the number of registered users tends to be important for funding and evaluation of software development in academia: http://www.cs.uoregon.edu/Research/tau/downloads.php

I'm not really sure how to use PDT, and I've used Tau without it before without any problems.

The compilation order below is also important: pdt won't build without libpdb.a, which is generated by tau, but you can't configure tau with -pdt before pdt exists.


Compiling
sudo mkdir /opt/tau
sudo chown $USER /opt/tau
cd /opt/tau

wget http://tu-dresden.de/die_tu_dresden/zentrale_einrichtungen/zih/forschung/software_werkzeuge_zur_unterstuetzung_von_programmierung_und_optimierung/otf/dateien/OTF-1.12.2salmon.tar.gz
tar xvf OTF-1.12.2salmon.tar.gz
cd OTF-1.12.2salmon/
./configure --prefix=/opt/tau/OTF
make
make install
cd ../

wget http://tau.uoregon.edu/tau.tgz
tar xvf tau.tgz
cd tau-2.22-p1/
./configure -mpilib=/usr/lib/openmpi/lib -prefix=/opt/tau -openmp -TRACE -iowrapper -otf=/opt/tau/OTF -pthread
make install
cd ../

wget http://tau.uoregon.edu/pdt.tar.gz
tar xvf pdt.tar.gz
cd pdtoolkit-3.18.1/
./configure -prefix=/opt/tau/pdt
make
make install


cd ../tau-2.22-p1/
./configure -mpilib=/usr/lib/openmpi/lib -prefix=/opt/tau -openmp -TRACE -iowrapper -pthread -otf=/opt/tau/OTF -pdt=/opt/tau/pdt

make install
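
A quick sanity check that the build produced the expected tools doesn't hurt at this point (the x86_64 directory name depends on your architecture, so adjust if yours differs):

ls /opt/tau/x86_64/bin/ | grep -E 'tau_exec|pprof|paraprof'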


Testing
Time to try it out on something parallel.

First, set the path:

PATH=$PATH:/opt/tau/x86_64/bin
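
To make that permanent you could append it to ~/.bashrc, e.g.

echo 'export PATH=$PATH:/opt/tau/x86_64/bin' >> ~/.bashrc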

I used nwchem with this input file, co2.nw:
title "co nmr" geometry c 0 0 0 o 0 0 1.13 end basis * library "6-311+G*" end property shielding end dft direct grid fine mult 1 xc HFexch 0.05 slater 0.95 becke88 nonlocal 0.72 vwn_5 1 perdew91 0.81 end task dft property

and ran it using
mpirun -n 3 tau_exec nwchem co2.nw

which ends with
Total times cpu: 4.8s wall: 7.6s
It's obviously a bit too short, but it will do for illustration purposes.

That generates a set of files, profile.*.0.0 -- one for each thread, i.e. profile.1.0.0, profile.2.0.0 and profile.3.0.0 in this particular case. There are a lot of options for tracing, using hardware counters etc. -- see http://www.cs.uoregon.edu/Research/tau/docs/newguide/
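
As far as I can tell from that guide, most of those options are toggled through environment variables at run time rather than extra configure flags. For example, something along these lines should switch from profiling to tracing (untested here, so check the variable names in the guide):

export TAU_TRACE=1
mpirun -n 3 tau_exec nwchem co2.nw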
The profiles can then be summarised with pprof:
pprof -s
Reading Profile files in profile.*

FUNCTION SUMMARY (total):
---------------------------------------------------------------------------------------
%Time    Exclusive    Inclusive       #Call      #Subrs  Inclusive Name
              msec   total msec                          usec/call
---------------------------------------------------------------------------------------
100.0       15,813       25,931           3       14276    8643959 .TAU application
 18.8        4,870        4,870       10272           0        474 MPI_Barrier()
 12.1        3,138        3,138           3           0    1046279 MPI_Init()
  8.1        2,090        2,090         818           0       2556 MPI_Recv()
  0.0            9            9           3           0       3173 MPI_Finalize()
  0.0            3            3          24           0        128 MPI_Bcast()
  0.0            2            2           6           0        463 MPI_Comm_dup()
  0.0            1            1         790           0          2 MPI_Comm_size()
  0.0        0.872        0.872         818           0          1 MPI_Send()
  0.0        0.294        0.294         841           0          0 MPI_Comm_rank()
  0.0         0.17         0.17         674           0          0 MPI_Get_count()
  0.0        0.111        0.111           3           0         37 MPI_Comm_free()
  0.0        0.026        0.026           3           0          9 MPI_Errhandler_set()
  0.0        0.024        0.024           6           0          4 MPI_Group_rank()
  0.0         0.02         0.02           6           0          3 MPI_Comm_compare()
  0.0        0.015        0.015           4           0          4 MPI_Comm_group()
  0.0        0.008        0.008           4           0          2 MPI_Group_size()
  0.0        0.004        0.004           1           0          4 MPI_Group_translate_ranks()

FUNCTION SUMMARY (mean):
---------------------------------------------------------------------------------------
%Time    Exclusive    Inclusive       #Call      #Subrs  Inclusive Name
              msec   total msec                          usec/call
---------------------------------------------------------------------------------------
100.0        5,271        8,643           1     4758.67    8643959 .TAU application
 18.8        1,623        1,623        3424           0        474 MPI_Barrier()
 12.1        1,046        1,046           1           0    1046279 MPI_Init()
  8.1          696          696     272.667           0       2556 MPI_Recv()
  0.0            3            3           1           0       3173 MPI_Finalize()
  0.0            1            1           8           0        128 MPI_Bcast()
  0.0        0.926        0.926           2           0        463 MPI_Comm_dup()
  0.0        0.436        0.436     263.333           0          2 MPI_Comm_size()
  0.0        0.291        0.291     272.667           0          1 MPI_Send()
  0.0        0.098        0.098     280.333           0          0 MPI_Comm_rank()
  0.0       0.0567       0.0567     224.667           0          0 MPI_Get_count()
  0.0        0.037        0.037           1           0         37 MPI_Comm_free()
  0.0      0.00867      0.00867           1           0          9 MPI_Errhandler_set()
  0.0        0.008        0.008           2           0          4 MPI_Group_rank()
  0.0      0.00667      0.00667           2           0          3 MPI_Comm_compare()
  0.0        0.005        0.005     1.33333           0          4 MPI_Comm_group()
  0.0      0.00267      0.00267     1.33333           0          2 MPI_Group_size()
  0.0      0.00133      0.00133    0.333333           0          4 MPI_Group_translate_ranks()

...which I can't pretend to fully understand. Presumably the first line corresponds to the cpu time and the wall time (4.8 and 7.6 s vs 5,271 and 8,643 ms in the mean summary).

A visual representation can be had by launching paraprof:
paraprof
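
According to the manual, paraprof can also pack all the profile.* files into a single file, which looks handy for pulling results off a compute node (co2.ppk is just an arbitrary name):

paraprof --pack co2.ppk
paraprof co2.ppk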


Now it's time to explore...

The one thing that doesn't seem to work is visualisation of the communication matrix...
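
If I read the documentation right, the communication matrix data is only collected when TAU_COMM_MATRIX is set in the environment at run time, so something like the following might be needed before anything shows up in paraprof. I haven't verified this:

export TAU_COMM_MATRIX=1
mpirun -n 3 tau_exec nwchem co2.nw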



Failed attempt to build with vampirtrace
sudo mkdir /opt/tau
sudo chown $USER /opt/tau
cd /opt/tau


wget http://tu-dresden.de/die_tu_dresden/zentrale_einrichtungen/zih/forschung/software_werkzeuge_zur_unterstuetzung_von_programmierung_und_optimierung/otf/dateien/OTF-1.12.2salmon.tar.gz
tar xvf OTF-1.12.2salmon.tar.gz
cd OTF-1.12.2salmon/
./configure --prefix=/opt/tau/OTF
make
make install
cd ../


wget http://tu-dresden.de/die_tu_dresden/zentrale_einrichtungen/zih/forschung/software_werkzeuge_zur_unterstuetzung_von_programmierung_und_optimierung/vampirtrace/dateien/VampirTrace-5.14.1.tar.gz
tar xvf VampirTrace-5.14.1.tar.gz
cd VampirTrace-5.14.1/
./configure --prefix=/opt/tau/vampirtrace --with-mpi-dir=/usr/lib/openmpi/lib --with-extern-otf-dir=/opt/tau/OTF
make
make install


wget http://tau.uoregon.edu/tau.tgz
tar xvf tau.tgz
cd tau-2.22-p1/
./configure -mpilib=/usr/lib/openmpi/lib -prefix=/opt/tau -openmp -TRACE -iowrapper -otf=/opt/tau/OTF -vampirtrace=/opt/tau/vampirtrace
make install

It builds fine, but during execution of mpirun -n 2 tau_exec... I get
Error: No matching binding for 'mpi' in directory /opt/tau/x86_64/lib
Available bindings (/opt/tau/x86_64/lib):
Error: No matching binding for 'mpi' in directory /opt/tau/x86_64/lib
Available bindings (/opt/tau/x86_64/lib):
  /opt/tau/x86_64/lib/shared-disable
  /opt/tau/x86_64/lib/shared-disable

311. Compiling MPE for MPI profiling

I've been wanting to get an overview of how my parallel programs (nwchem, gamess, dalton, etc.) are running, and whether there are any obvious bottlenecks, other than the network and slow hard drives, that I can sort out.

The Australian high performance computing facility in Canberra uses http://ipm-hpc.sourceforge.net/, but I'm not having much luck compiling it, and the lack of recent updates makes me somewhat less willing to invest too much effort into it.

So I stumbled across MPE instead: http://www.mcs.anl.gov/research/projects/perfvis/download/index.htm#MPE

The problem is that almost all of the links on that page are broken, including those pointing to the documentation, so I don't actually know how to use it properly. The presence of mpecc in /opt/mpe/bin suggests that it's used as a stand-in for mpicc when compiling, which I'll test some day (a guess at what that might look like is sketched after the install steps below).


Installing/compiling
cd ~/tmp
wget ftp://ftp.mcs.anl.gov/pub/mpi/mpe/mpe2.tar.gz
tar xvf mpe2.tar.gz
cd mpe2-1.3.0/
./configure MPI_CC=mpicc MPI_F77=mpif77 --prefix=/opt/mpe
make
sudo make install
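
Assuming mpecc really is a stand-in for mpicc, usage would presumably look something like this (untested; mpe_test.c is just a placeholder name for whatever MPI source you are building):

/opt/mpe/bin/mpecc -mpilog -o mpe_test mpe_test.c
mpirun -n 4 ./mpe_test
/opt/mpe/bin/jumpshot mpe_test.clog2

i.e. compile with logging enabled, run as usual, and view the resulting .clog2 log with jumpshot (also part of MPE, assuming it got built).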

I found what I was looking for in Tau instead:
http://verahill.blogspot.com.au/2013/01/312-tau-mpi-profiling-on-debian.html

310. Remote mounting using sshfs

I've run out of USB ports on my work desktop, so I occasionally cheat and attach USB drives to one of my compute nodes and transfer the files across the network to my desktop. Since I've got a gigabit switch set up, the speeds are quite acceptable.

NFS isn't really a solution here. Instead, sshfs is the tool to use.

The local and remote computers will be referred to as Desktop and Node, respectively. The specific example I'm using here is a USB drive manually mounted on the Node, which contains pictures that I want to transfer to my Desktop.

On the Node
The plugged-in USB device shows up as /dev/sdb and holds only one partition, /dev/sdb1.

sudo mkdir /media/usbdrive
sudo mount /dev/sdb1 /media/usbdrive


On the Desktop

sudo apt-get install sshfs
sudo mkdir /media/remote
sudo sshfs $USER@Node:/media/usbdrive /media/remote -o allow_other

That's about it. To unmount, do
sudo umount /media/remote

on the Desktop, and
sudo umount /media/usbdrive

on the Node.
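
If this becomes a regular arrangement, the mount can probably be put in /etc/fstab on the Desktop instead, so that a plain mount /media/remote does the job. Something along these lines should work (untested; replace username with your account on the Node, and note that it relies on key-based ssh login):

username@Node:/media/usbdrive /media/remote fuse.sshfs noauto,user,_netdev 0 0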