25 September 2012

246. Cluster network performance testing (very basic) on Debian Testing using a gigabit switch

Playing with hpcc got me thinking about my network connection.

My cluster looks like this:
I've got four nodes which are connected via two networks, 192.168.2.0/24 and 192.168.1.0/24. The 192.168.1.0/24 network is connected using a gigabit switch. Be (see below) acts as the gateway. The 192.168.2.0/24 network is connected via a crappy old netgear 10/100 router (dhcp) and provides access to the outside world (hello mac spoofing :) ). Each box shares a folder via nfs using a unique name.
_Nodes_
Be: AMD II X3, 8 GB ram (192.168.1.1): Realtek Semiconductor Co., Ltd. RTL-8169 Gigabit Ethernet (rev 10)
Ta: Intel i5-2400, 8 GB ram (192.168.1.150): Intel Corporation 82579LM Gigabit Network Connection (rev 04)
B: AMD Phenom II X6, 8 GB ram (192.168.1.101): Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet controller (rev 03)
Ne: AMD FX 8150 X8, 16 GB ram (192.168.1.120): Realtek Semiconductor Co., Ltd. RTL-8169 Gigabit Ethernet (rev 10)

So, time to test the network performance:
sudo apt-get install iperf

On all your boxes (e.g. using clusterssh) start iperf in server mode:
iperf -s

Then on each of your nodes run:
iperf -c 192.168.1.1 && iperf -c 192.168.1.101 && iperf -c 192.168.1.150 && iperf -c 192.168.1.120

------------------------------------------------------------
Client connecting to 192.168.1.1, TCP port 5001
TCP window size: 45.7 KByte (default)
------------------------------------------------------------
[  3] local 192.168.1.101 port 37893 connected with 192.168.1.1 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec   564 MBytes   473 Mbits/sec
------------------------------------------------------------
Client connecting to 192.168.1.101, TCP port 5001
TCP window size:  169 KByte (default)
------------------------------------------------------------
[  3] local 192.168.1.101 port 35926 connected with 192.168.1.101 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec  15.5 GBytes  13.3 Gbits/sec
------------------------------------------------------------
Client connecting to 192.168.1.150, TCP port 5001
TCP window size: 22.9 KByte (default)
------------------------------------------------------------
[  3] local 192.168.1.101 port 48257 connected with 192.168.1.150 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec   564 MBytes   473 Mbits/sec
------------------------------------------------------------
Client connecting to 192.168.1.120, TCP port 5001
TCP window size: 22.9 KByte (default)
------------------------------------------------------------
[  3] local 192.168.1.101 port 43236 connected with 192.168.1.120 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec   617 MBytes   517 Mbits/sec
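
The chained command above can also be written as a loop, which is easier to extend when nodes get added. This is only a minimal sketch: run_iperf_all is a name made up for illustration, and the IPERF variable exists purely so that the loop can be dry-run with echo. The flags (-c, -t, -f m) are standard iperf client options.

```shell
#!/bin/sh
# Run an iperf client test against each server address given as an argument.
# IPERF can be overridden (e.g. IPERF=echo) to print the commands instead.
run_iperf_all() {
    for host in "$@"; do
        # -c: client mode, -t 10: run for 10 seconds, -f m: report in Mbits/sec
        ${IPERF:-iperf} -c "$host" -t 10 -f m || return 1
    done
}

# Dry run: only prints the commands that would be executed
IPERF=echo run_iperf_all 192.168.1.1 192.168.1.101 192.168.1.150 192.168.1.120
```

Like the && chain, the loop stops at the first server that can't be reached. Drop the IPERF=echo prefix to run the real tests.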


Overall, this is what I got (rows are the clients, columns the servers; the diagonal is each box talking to itself):
Client/Server (Mbit/s)
     Be     B      Ta     Ne
Be   13.7G  310    308    316
B    473    13.3G  473    517
Ta   726    660    19.7G  936
Ne   882    484    917    19.4G
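
Copying numbers out of the iperf output by hand is error-prone (the Transfer and Bandwidth columns are easy to mix up), so here is a rough sketch of pulling out just the bandwidth per server. iperf_summary is a hypothetical helper name, and the parsing assumes the iperf output layout shown earlier in this post.

```shell
#!/bin/sh
# Read iperf client output on stdin and print "server bandwidth unit" per test.
iperf_summary() {
    awk '
        # Remember which server the following result belongs to
        /^Client connecting to/ { server = $4; sub(/,$/, "", server) }
        # Result lines end in e.g. "473 Mbits/sec"; the last two fields are
        # the bandwidth value and its unit
        /bits\/sec$/ { print server, $(NF-1), $NF }
    '
}

# Example using the first test from the output earlier in this post
iperf_summary <<'EOF'
------------------------------------------------------------
Client connecting to 192.168.1.1, TCP port 5001
TCP window size: 45.7 KByte (default)
------------------------------------------------------------
[  3] local 192.168.1.101 port 37893 connected with 192.168.1.1 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec   564 MBytes   473 Mbits/sec
EOF
```

In real use you would pipe iperf straight into it, e.g. iperf -c 192.168.1.1 | iperf_summary.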

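Network speeds are metric, so a gigabit link is 1000 Mbit/s rather than 1024, and after ethernet, IP and TCP overhead you can't expect much more than about 940 Mbit/s of actual payload at an MTU of 1500. Looking at our results, our best is 936 Mbit/s and our worst 308 Mbit/s. The best result is thus about as good as it gets, and since all the boxes have gigabit network cards, all of them should ideally reach something close to 936 Mbit/s.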
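
For what it's worth, the units can be checked against the iperf output itself, and a rough ceiling for TCP over gigabit ethernet can be worked out with a bit of arithmetic. The sketch below uses made-up function names. The overhead figures are assumptions: 38 bytes of ethernet framing per packet, and 52 bytes of IP and TCP headers including TCP timestamps.

```shell
#!/bin/sh
# Convert an iperf transfer (binary MBytes) over a time span to metric Mbit/s.
mbytes_to_mbits() {
    awk -v mb="$1" -v secs="$2" \
        'BEGIN { printf "%.1f\n", mb * 1048576 * 8 / secs / 1000000 }'
}

# Upper bound on TCP payload rate (Mbit/s) over gigabit ethernet for an MTU.
# Assumed overhead: 38 bytes of ethernet framing per packet (preamble 8,
# header 14, FCS 4, inter-frame gap 12) and 52 bytes of IP + TCP headers
# (20 + 20, plus 12 for TCP timestamps).
tcp_goodput() {
    awk -v mtu="$1" 'BEGIN { printf "%.1f\n", 1000 * (mtu - 52) / (mtu + 38) }'
}

mbytes_to_mbits 564 10   # the 564 MBytes / 10 s transfer reported by iperf
tcp_goodput 1500         # standard MTU
tcp_goodput 9000         # jumbo frames
```

The first call prints 473.1, which matches the 473 Mbits/sec that iperf reported for 564 MBytes in 10 seconds. So iperf counts MBytes in binary units and Mbits in metric ones. The second prints 941.5, so 936 Mbit/s is already close to the ceiling, and jumbo frames (990.0) would only buy a few percent on a link that is otherwise healthy.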

And now, try to improve it:
I went through the whole shebang with
sudo ifconfig eth1 mtu 9000
sudo ifconfig eth1 mtu 8000
etc.
Anyway, that way I got the following MTUs (the largest value each card would accept):
Be  7100
B   7100
Ne  9000
Ta  9000

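I then set the MTUs to 7100 on all the nodes and tried pinging from node to node with fragmentation forbidden (-M do) and a payload of 7100 minus 28 bytes of IP and ICMP headers, e.g.:
ping -s 7072 -M do 192.168.1.101

Well, that maxed out at a payload of 1472, i.e. exactly MTU 1500 (1472 + 28), which was the original value. So I'm a bit confused. One thing that would give exactly this result is a switch that doesn't pass jumbo frames.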
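
The trial and error with ping can be sketched as a small loop. The function names and the list of candidate MTUs are made up for illustration, and the PING variable is only there so the loop can be tested without a network. The ping options used (-c, -W, -M do, -s) are those of the Linux iputils ping.

```shell
#!/bin/sh
# ICMP payload size that exactly fills a given MTU:
# MTU minus 20 bytes of IPv4 header and 8 bytes of ICMP header.
ping_payload() {
    echo $(( $1 - 28 ))
}

# Print the largest candidate MTU for which an unfragmented ping gets a reply.
# Runs in a subshell, so its variables do not leak into the caller.
probe_mtu() (
    host=$1
    for mtu in 9000 8000 7100 4000 1500; do
        # -M do: forbid fragmentation, -c 1: one packet, -W 1: wait 1 second
        if ${PING:-ping} -c 1 -W 1 -M do -s "$(ping_payload "$mtu")" "$host" \
            >/dev/null 2>&1; then
            echo "$mtu"
            exit 0
        fi
    done
    exit 1
)

ping_payload 7100   # the payload used in the ping command in this post
ping_payload 1500   # the payload at which the pings maxed out
```

The two calls print 7072 and 1472. Running probe_mtu 192.168.1.101 from each node tells you what actually makes it across the wire, which is not necessarily what the interfaces have been set to.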


Settings:
Be:
eth1      Link encap:Ethernet  HWaddr 00:f0:4d:83:0a:48  
          inet addr:192.168.1.1  Bcast:192.168.1.255  Mask:255.255.255.0
          inet6 addr: fe80::2f0:4dff:fe83:a48/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:24124966 errors:0 dropped:27064 overruns:0 frame:0
          TX packets:19569426 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:25859945667 (24.0 GiB)  TX bytes:14200267703 (13.2 GiB)
B:
eth1      Link encap:Ethernet  HWaddr 02:00:8c:50:2f:6b  
          inet addr:192.168.1.101  Bcast:192.168.1.255  Mask:255.255.255.0
          inet6 addr: fe80::8cff:fe50:2f6b/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:14540970 errors:0 dropped:36651 overruns:0 frame:0
          TX packets:16801915 errors:0 dropped:2 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:12347398135 (11.4 GiB)  TX bytes:18008416370 (16.7 GiB)
Ta:
eth1      Link encap:Ethernet  HWaddr 78:2b:cb:b3:a4:b7  
          inet addr:192.168.1.150  Bcast:192.168.1.255  Mask:255.255.255.0
          inet6 addr: fe80::7a2b:cbff:feb3:a4b7/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:14717233 errors:0 dropped:68232 overruns:0 frame:0
          TX packets:17769966 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:13860096243 (12.9 GiB)  TX bytes:20207270880 (18.8 GiB)
          Interrupt:20 Memory:e1a00000-e1a20000 
Ne:
eth1      Link encap:Ethernet  HWaddr 90:2b:34:93:75:e6  
          inet addr:192.168.1.120  Bcast:192.168.1.255  Mask:255.255.255.0
          inet6 addr: fe80::922b:34ff:fe93:75e6/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:13567520 errors:0 dropped:0 overruns:0 frame:0
          TX packets:10710054 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:13086635236 (12.1 GiB)  TX bytes:12381041605 (11.5 GiB)

245. Recompile debian's hpcc with other libs

I installed hpcc using apt-get, but -- and this is a first -- when I tried to run it, it complained about missing libs.



Why compile?

hpcc
hpcc: error while loading shared libraries: libatlas.so.3gf: cannot open shared object file: No such file or directory
Doing
aptitude show hpcc
shows
Depends: libatlas3gf-base, libc6 (>= 2.7), libopenmpi1.3, mpi-default-bin
and
apt-cache search libatlas.so.3gf
returns
libatlas3-base - Automatically Tuned Linear Algebra Software, generic shared
libatlas3gf-base - Transitional package to libatlas3-base
while
aptitude search atlas|grep ^i
shows that the transitional package is installed:
i   libatlas-dev                    - Automatically Tuned Linear Algebra Softwar
i A libatlas3gf-base                - Transitional package to libatlas3-base
but
locate libatlas.so.3gf
comes up empty.
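
A quicker way to see every library a binary can't find is to ask ldd and keep only the unresolved entries. A minimal sketch follows. missing_libs is a hypothetical helper name, and the sample input is illustrative rather than captured from a real run.

```shell
#!/bin/sh
# Filter ldd output on stdin down to the names of the libraries that the
# dynamic linker could not resolve.
missing_libs() {
    awk '/=> not found/ { print $1 }'
}

# Sample ldd-style input (illustrative only)
printf '\tlibatlas.so.3gf => not found\n\tlibc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f0000000000)\n' | missing_libs
```

On the real system this would be ldd "$(command -v hpcc)" | missing_libs. For a library that does exist somewhere on disk, dpkg -S followed by the file name shows which installed package it belongs to.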

So build your own:
sudo mkdir /opt/hpcc
sudo chown $USER /opt/hpcc
cd /opt/hpcc
wget http://ftp.de.debian.org/debian/pool/main/h/hpcc/hpcc_1.4.1.orig.tar.gz
tar xvf hpcc_1.4.1.orig.tar.gz
cd hpcc-1.4.1/
wget http://ftp.de.debian.org/debian/pool/main/h/hpcc/hpcc_1.4.1-2.debian.tar.gz
tar xvf hpcc_1.4.1-2.debian.tar.gz
patch -i debian/patches/add-Make.Debian.patch

Edit Make.Debian. LAdir isn't picked up on its own (as the comment in the file says, it is only there for defining LAinc and LAlib), hence the -L option in LAlib:
# ----------------------------------------------------------------------
# - MPI directories - library ------------------------------------------
# ----------------------------------------------------------------------
# MPinc tells the  C  compiler where to find the Message Passing library
# header files,  MPlib  is defined  to be the name of  the library to be
# used. The variable MPdir is only used for defining MPinc and MPlib.
#
MPdir        =/usr/lib/openmpi/lib/
MPinc        =
MPlib        =-lmpi
#
# ----------------------------------------------------------------------
# - Linear Algebra library (BLAS or VSIPL) -----------------------------
# ----------------------------------------------------------------------
# LAinc tells the  C  compiler where to find the Linear Algebra  library
# header files,  LAlib  is defined  to be the name of  the library to be
# used. The variable LAdir is only used for defining LAinc and LAlib.
#
LAdir        = /opt/ATLAS/lib
LAinc        =
LAlib        = -L/opt/ATLAS/lib -ltatlas
#

The above assumes that you've compiled your own ATLAS and put it in /opt/ATLAS, as shown elsewhere on this blog. You can use whatever math libs you want; there are a couple described on this blog (acml, netlib blas/lapack, openblas, ATLAS). I've had success with the netlib blas/lapack and atlas (built with netlib lapack).

mv Make.Debian hpl/
make arch=Debian

Hopefully everything went well. Now you need an input file.
cp _hpccinf.txt hpccinf.txt

Edit hpccinf.txt:
HPLinpack benchmark input file
Innovative Computing Laboratory, University of Tennessee
HPL.out      output file name (if any)
8            device out (6=stdout,7=stderr,file)
1            # of problems sizes (N)
1000         Ns
1            # of NBs
80           NBs
0            PMAP process mapping (0=Row-,1=Column-major)
1            # of process grids (P x Q)
3            Ps
1            Qs
16.0         threshold
1            # of panel fact
2            PFACTs (0=left, 1=Crout, 2=Right)
1            # of recursive stopping criterium
4            NBMINs (>= 1)
1            # of panels in recursion
2            NDIVs
1            # of recursive panel fact.
1            RFACTs (0=left, 1=Crout, 2=Right)
1            # of broadcast
1            BCASTs (0=1rg,1=1rM,2=2rg,3=2rM,4=Lng,5=LnM)
1            # of lookahead depth
1            DEPTHs (>=0)
2            SWAP (0=bin-exch,1=long,2=mix)
64           swapping threshold
0            L1 in (0=transposed,1=no-transposed) form
0            U  in (0=transposed,1=no-transposed) form
1            Equilibration (0=no,1=yes)
8            memory alignment in double (> 0)
##### This line (no. 32) is ignored (it serves as a separator). ######
0                               Number of additional problem sizes for PTRANS
1200 10000 30000                values of N
0                               number of additional blocking sizes for PTRANS
40 9 8 13 13 20 16 32 64        values of NB

Launch by doing
mpirun -n X ./hpcc
where X=Ps times Qs (e.g. 3 in the example above).
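
To keep X in step with the input file, the product can be read straight out of hpccinf.txt. This is a sketch with a made-up function name, and it assumes a single process grid, i.e. exactly one value on the Ps line and one on the Qs line, as in the file above.

```shell
#!/bin/sh
# Print Ps * Qs from an HPL-style input file, i.e. the process count to hand
# to mpirun. Assumes one value each on the lines labelled Ps and Qs.
grid_procs() {
    awk '$2 == "Ps" { p = $1 } $2 == "Qs" { q = $1 } END { print p * q }' "$1"
}

# Intended use (not run here):
#   mpirun -n "$(grid_procs hpccinf.txt)" ./hpcc
```

For the input file above this gives 3.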

I put the hpccinf.txt in a shared (nfs) folder (~/jobs) and created a file called myhost containing

tantalum slots=2 max_slots=4
boron slots=2 max_slots=6
neon slots=2 max_slots=8

and then launched using
mpirun -n 4 -hostfile myhost /opt/hpcc/hpcc-1.4.1/hpcc
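
As a sanity check before launching, the number of slots that the hostfile offers can be added up and compared with the -n value. A minimal sketch with a hypothetical helper name; it only looks at the slots= entries and ignores max_slots=.

```shell
#!/bin/sh
# Sum the slots=N entries of an Open MPI hostfile (max_slots=N is ignored).
total_slots() {
    awk '{
            for (i = 2; i <= NF; i++)
                if ($i ~ /^slots=/) { split($i, kv, "="); n += kv[2] }
         }
         END { print n + 0 }' "$1"
}

# Intended use (not run here):
#   total_slots myhost
```

For the myhost file above this prints 6, so the four processes requested with -n 4 fit within the regular slots.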

21 September 2012

244. Molden on debian testing

Update: avogadro can write gamess input files, but seems to offer little in the way of showing detailed output from gamess output files. Also, some of the input files contain keywords which don't exist.

Original post:
Nothing beats a good GUI, so after butting heads with gabedit again (and losing, again, although in this case I think I tried to make it do something it wasn't designed to do) I've decided to try Molden.

To download, go to http://www.cmbi.ru.nl/molden/howtoget.html, make sure to be a good citizen and register yourself as a user (it will help motivate funding for development), then download:

cd ~/tmp
wget ftp://ftp.cmbi.ru.nl/pub/molgraph/molden/molden5.0.tar.gz
tar xvf molden5.0.tar.gz
cd molden5.0/

Edit makefile and remove -lXmu from line 20, so that lines 16-20 read:

16 CC = cc
17 FC = gfortran
18 LIBS =  -lX11 -lm
19 LDR = ${FC} 
20 LIBSG = -L/usr/X11R6/lib -lGLU -lGL -lX11 -lm
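
If you'd rather not open an editor, the same change can be sketched with sed. strip_xmu is a made-up name, and the sketch assumes that -lXmu appears as a separate word preceded by a space. It removes the flag wherever it occurs in the file, not just on line 20.

```shell
#!/bin/sh
# Remove every " -lXmu" from the given makefile. Goes via a temporary file so
# that it does not depend on sed's in-place option.
strip_xmu() {
    sed 's/ -lXmu//g' "$1" > "$1.tmp" && mv "$1.tmp" "$1"
}

# Intended use (not run here), from inside molden5.0/:
#   strip_xmu makefile
```

Keep a copy of the original makefile around in case you want to compare.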

cd surf/

Edit Makefile and change it from

depend: $(DEPEND)
    @ echo making dependencies...
    @ echo ' ' > makedep
    @ makedepend $(INCLUDE) -f makedep $(DEPEND)

to

depend: $(DEPEND)
    @ echo making dependencies...
    @ echo ' ' > makedep
    @ $(CC) $(INCLUDE) -M $(DEPEND) > makedep
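
The same edit as a sed one-liner, wrapped in a function with a made-up name. It is only a sketch and assumes the makedepend line reads exactly as shown above. If the text differs, nothing is changed.

```shell
#!/bin/sh
# Swap the makedepend call in the given Makefile for the compiler's -M option,
# which writes the dependency rules to stdout.
use_cc_for_deps() {
    sed 's|makedepend \$(INCLUDE) -f makedep \$(DEPEND)|$(CC) $(INCLUDE) -M $(DEPEND) > makedep|' \
        "$1" > "$1.tmp" && mv "$1.tmp" "$1"
}

# Intended use (not run here), from inside molden5.0/surf/:
#   use_cc_for_deps Makefile
```

The point of the change is that -M makes the compiler generate the dependency rules itself, so makedepend is no longer needed.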

Save and go back up one level, and run make:
cd ../
make

You're pretty much done.

I like putting things in /opt, so
sudo mkdir /opt/molden
sudo chown $USER /opt/molden
cp -R ~/tmp/molden5.0/* /opt/molden

stick
export PATH=$PATH:/opt/molden
in your ~/.bashrc
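
If you script your setup, it's worth making sure the line doesn't get added twice. A small sketch, with append_once as a made-up helper name:

```shell
#!/bin/sh
# Append a line to a file only if that exact line is not already in it.
append_once() {
    grep -qxF -- "$1" "$2" 2>/dev/null || printf '%s\n' "$1" >> "$2"
}

# Intended use (not run here):
#   append_once 'export PATH=$PATH:/opt/molden' ~/.bashrc
```

The single quotes matter: they keep $PATH from being expanded at the time the line is written.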

Type
molden
to run

Molden can read output files from gamess. I'm still exploring the exact capabilities, but e.g. the convergence information can be accessed, and you can get nifty contour plots of the electron density of orbitals etc.


Error:

If you don't edit the surf/Makefile as shown above, and makedepend isn't installed (on Debian it lives in the xutils-dev package, as far as I know), you'll get

make[1]: Leaving directory `/home/me/tmp/molden5.0/ambfor'
make -C surf depend
make[1]: Entering directory `/home/me/tmp/molden5.0/surf'
making dependencies...
make[1]: makedepend: Command not found
make[1]: *** [depend] Error 127
make[1]: Leaving directory `/home/me/tmp/molden5.0/surf'