27 September 2012

247. Setting up Openchrom (and using it to open Agilent .D ESI-MS files on Linux)

I've been using Wsearch (http://www.wsearch.com.au/wsearch32/wsearch32.htm) to process agilent chemstation ESI-MS spectra for the past few years. It and Matt Monroe's Molecular Weight calculator (why, oh why is there no comparable molecular weight calculator for linux?) have been the only two reasons why I've bother with Wine under Linux. Openchrom is written in java and so will run on both Good (Linux) and Evil (OS X and Win) operating systems.

Having finally discovered openchrom (v 0.6 so still early days) I can now finally retire wsearch from my own computers (it's still a good piece of software, but it's crippled to encourage the purchasing of a 'full' version, and I've had no luck purchasing a license from the author in spite of having tried several times during the past couple of years). OpenChrom can export an entire agilent experiment as a '3D' csv file which makes processing a lot more fun.

As an aside: I hate proprietary file formats since they prevent me from using my own tools (cat, sed, gawk, gnuplot, octave) when processing -- or at a minimum make it more difficult. Most universities and grant agencies now add a provision regarding data management in their grant acceptance agreements/work conduct policies. In general these provisions state that the data shall be made available publicly and /or managed by a university repository. What is REALLY missing is a clause about using open formats -- and that should be taken into account when acquiring new instrumentation. All else being equal, an instrument which is 'open' will be a lot cheaper to manage in the long run since you won't have to feel locked in in terms of software. That's incidentally a reason why I like Metrohm since they provide details of their RS-232 interface allowing you to write your own software.

Anyway, here's how to get set up:


1. Install Java v1.7 (need > 1.6)
You can either use openjdk 7 or (Oracle) Java. See here for a general guide to installing Oracle/Sun Java.

As for openjdk, you can easily install it:
sudo apt-get install openjdk-7-jdk

(the openjdk-7-jre package is enough if you don't want the full developer's kit)

Anyway.

Make sure that you've selected the right version:
 sudo update-alternatives --config java
There are 7 choices for the alternative java (providing /usr/bin/java).

  Selection    Path                                            Priority   Status
------------------------------------------------------------
  0            /usr/lib/jvm/java-6-openjdk-amd64/jre/bin/java   1061      auto mode
  1            /usr/bin/gij-4.4                                 1044      manual mode
  2            /usr/bin/gij-4.6                                 1046      manual mode
  3            /usr/bin/gij-4.7                                 1047      manual mode
  4            /usr/lib/jvm/j2re1.6-oracle/bin/java             314       manual mode
  5            /usr/lib/jvm/j2sdk1.6-oracle/jre/bin/java        315       manual mode
  6            /usr/lib/jvm/java-6-openjdk-amd64/jre/bin/java   1061      manual mode
 *7            /usr/lib/jvm/java-7-openjdk-amd64/jre/bin/java   1051      manual mode



2. Get openchrom
cd ~/tmp
wget http://sourceforge.net/projects/openchrom/files/REL-0.6.0/openchrom_linux.gtk.x86_64_0.6.0.zip
unzip openchrom_linux.gtk.x86_64_0.6.0.zip
cd linux.gtk.x86_64/OpenChrom/
sudo mkdir /opt/openchrom
sudo chown $USER /opt/openchrom 
cp * -R /opt/openchrom
chmod +x /opt/openchrom/openchrom

Stick

alias openchrom='/opt/openchrom/openchrom'

in your ~/.bashrc and source it.




3. Get plugins
On first boot you're asked whether you want to get additional plugins using the 'Openchrom marketplace'. Since I'm mainly processing data from an Agilent ESI-MS, I wanted the plugin for Agilent files. The website says that you need a license key for plugins BUT that it's free to register for one.

This is a 30-days trial version. Afterwards, you need a valid serial key.
You can get a free serial key after registration on http://www.openchrom.net.
You can use the converter for commercial or non-commercial purposes free of charge, but you are not allowed to redistribute this software without my permission.

Note, that clicking on links on the website didn't lead me to a link to download the plugin. Instead, in OpenChrom click on the Plug-ins menu:






As always, make sure you trust your suppliers.


And then you're done installing.

There's nothing odd about registering other than this: you will receive an email with a confirmation of your registration in clear text WITH YOUR PASSWORD. So...be aware of that.


4. You can now browse in the tree to the left and select your .D folder:

There's a bit of clever thinking when it comes to the functionality of the program. The upside of this I think will eventually be that it's easy to get a consistent experience for a set of users (not unimportant for a research group). The downside is that it's a bit clunky getting started. Play with it for an hour and you'll get the hang of it, so it's not really that much of a hurdle. Also, too many options seem to be context sensitive -- I am having real trouble finding various options under the 'Accurate' perspective which I can find under the 'default' perspective.


5. Some comments:

It's still early days for OpenChrom (v 0.6) , and there are a few minor issues which may or may not affect you:

* Registration keys. They are easy enough to get (register online, log in, click on the plug in online that you want and you'll see the key), but if you have installed a new plug in and open your first spectrum right after that you'll be asked for registration keys. It won't tell you for which plug in the dialogue you're seeing is though, so if you've just installed three different plug ins you'll have to do some trial-and-error. This is fixed in the upcoming version.

* Raw/gaussian plot of mass spectrum. This took a while to figure out, but you have to use perspectives. The default (heavily zoomed in) view looks like this:
This might be good enough for those organic types...us 108 element inorganic types want more detail
If you go to the top right corner, click on 'other' (perspectives)
and select accurate you get
Bingo!
And then there's the obligatory wish list:

* A good quality isotopic pattern calculator would be nice. Anyone who has compared the output from different pieces of software will have discovered that different calculators may yield very different patterns. I think some of it boils down to truncation rather than incorrect isotopic ratios, but that just highlights how difficult it can be to implement a seemingly simple concept. The only calculator which I trust AND find useful is Matt Monroe's calculator -- the predicted patterns look good, and you get proper Gaussian broadening which means that it looks 'right'. This would be perfect as a plug-in. If only I knew how to properly implement it...

* A good quality ion generator -- some pieces of software (Hi Matt) allow you to select a handful of elements or fragments, pick a range of charges, input an m/z value and based on that spits out a list over possible identities for your signal. It's a good thing to have by your side the first time you look at a complex mixture trying to figure out what products may be present. This would be perfect as a plug-in. I've written this type of programmes before, but in python using for-loops...a vectorized version should be faster and maybe even easier to write.

25 September 2012

246. Cluster network performance testing (very basic) on Debian Testing using a gigabit switch

Playing with hpcc got me thinking about my network connection.

My cluster looks like this:
I've got four nodes which are connected via two networks, 192.168.2.0/24 and 192.168.1.0/24. The 192.168.1.0/24 network is connected using a gigabit switch. Be (see below) acts as the gateway. The 192.168.2.0/24 network is connected via a crappy old netgear 10/100 router (dhcp) and provides access to the outside world (hello mac spoofing :) ). Each box shares a folder via nfs using a unique name.
_Nodes_
Be: AMD II X3, 8 GB ram (192.168.1.1): Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8169 Gigabit Ethernet (rev 10)
Ta: Intel i5-2400, 8 GB ram (192.168.1.150):  Intel Corporation 82579LM Gigabit Network Connection (rev 04)
B: AMD Phenom II X6, 8 GB ram (192.168.1.101): Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet controller (rev 03)
Ne: AMD FX 8150 X8, 16 GB ram (192.168.1.120): Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8169 Gigabit Ethernet (rev 10)

So, time to test the network performance:
sudo apt-get install iperf

On all your boxes (e.g. using clusterssh) start the iperf daemon
iperf -s

Then on each of your nodes run:
iperf -c 192.168.1.1 && iperf -c 192.168.1.101 && iperf -c 192.168.1.150 && iperf -c 192.168.1.120

------------------------------------------------------------
Client connecting to 192.168.1.1, TCP port 5001
TCP window size: 45.7 KByte (default)
------------------------------------------------------------
[  3] local 192.168.1.101 port 37893 connected with 192.168.1.1 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec   564 MBytes   473 Mbits/sec
------------------------------------------------------------
Client connecting to 192.168.1.101, TCP port 5001
TCP window size:  169 KByte (default)
------------------------------------------------------------
[  3] local 192.168.1.101 port 35926 connected with 192.168.1.101 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec  15.5 GBytes  13.3 Gbits/sec
------------------------------------------------------------
Client connecting to 192.168.1.150, TCP port 5001
TCP window size: 22.9 KByte (default)
------------------------------------------------------------
[  3] local 192.168.1.101 port 48257 connected with 192.168.1.150 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec   564 MBytes   473 Mbits/sec
------------------------------------------------------------
Client connecting to 192.168.1.120, TCP port 5001
TCP window size: 22.9 KByte (default)
------------------------------------------------------------
[  3] local 192.168.1.101 port 43236 connected with 192.168.1.120 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec   617 MBytes   517 Mbits/sec


Overall, this is what I got
Client/Server (MBit/s)
     Be     B     Ta    Ne
Be   13.7G  310   308   316
B    564    15.5G 564   617
Ta   726    660   19.7G 936
Ne   882    484   917   19.4G

I'm not sure whether to expect a metric gigabit (1000 metric MBit) or a binary one (1024 binary MBit), but looking at our results our best is 936 Mbit/s and worst 308 Mbit/s. All of them should thus ideally reach at least 936 MBit/s. They all have gigabit network card.

And now, try to improve it:
I went through the whole shebang with
sudo ifconfig eth1 mtu 9000
sudo ifconfig eth1 mtu 8000
etc.
Anyway, I got the following MTUs that way:
Be  7100
B    7100
Ne  9000
Ta   9000

I then set the MTUs to 7100 on all the nodes and tried pinging from node to node, e.g.:
ping -s 7072 -M do 192.168.1.101

Well, that maxed out at 1472 i.e. about MTU 1500 which was the original value. So I'm a bit confused.


Settings:
Be:
eth1      Link encap:Ethernet  HWaddr 00:f0:4d:83:0a:48  
          inet addr:192.168.1.1  Bcast:192.168.1.255  Mask:255.255.255.0
          inet6 addr: fe80::2f0:4dff:fe83:a48/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:24124966 errors:0 dropped:27064 overruns:0 frame:0
          TX packets:19569426 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:25859945667 (24.0 GiB)  TX bytes:14200267703 (13.2 GiB)
B:
eth1      Link encap:Ethernet  HWaddr 02:00:8c:50:2f:6b  
          inet addr:192.168.1.101  Bcast:192.168.1.255  Mask:255.255.255.0
          inet6 addr: fe80::8cff:fe50:2f6b/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:14540970 errors:0 dropped:36651 overruns:0 frame:0
          TX packets:16801915 errors:0 dropped:2 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:12347398135 (11.4 GiB)  TX bytes:18008416370 (16.7 GiB)
Ta:
eth1      Link encap:Ethernet  HWaddr 78:2b:cb:b3:a4:b7  
          inet addr:192.168.1.150  Bcast:192.168.1.255  Mask:255.255.255.0
          inet6 addr: fe80::7a2b:cbff:feb3:a4b7/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:14717233 errors:0 dropped:68232 overruns:0 frame:0
          TX packets:17769966 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:13860096243 (12.9 GiB)  TX bytes:20207270880 (18.8 GiB)
          Interrupt:20 Memory:e1a00000-e1a20000 
Ne:
eth1      Link encap:Ethernet  HWaddr 90:2b:34:93:75:e6  
          inet addr:192.168.1.120  Bcast:192.168.1.255  Mask:255.255.255.0
          inet6 addr: fe80::922b:34ff:fe93:75e6/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:13567520 errors:0 dropped:0 overruns:0 frame:0
          TX packets:10710054 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:13086635236 (12.1 GiB)  TX bytes:12381041605 (11.5 GiB)

245. Recompile debian's hpcc with other libs

I installed hpcc using apt-get, but -- and this is a first -- when trying to run it complained over missing libs.



Why compile?

hpcc
hpcc: error while loading shared libraries: libatlas.so.3gf: cannot open shared object file: No such file or directory
Doing
aptitude show hpcc 
Depends: libatlas3gf-base, libc6 (>= 2.7), libopenmpi1.3, mpi-default-bin
apt-cache search libatlas.so.3gf

libatlas3-base - Automatically Tuned Linear Algebra Software, generic shared
libatlas3gf-base - Transitional package to libatlas3-base
and doing
 aptitude search atlas|grep ^i


i   libatlas-dev                    - Automatically Tuned Linear Algebra Softwar
i A libatlas3gf-base                - Transitional package to libatlas3-base
but
locate libatlas.so.3gf
comes up empty.

So build your own:
sudo mkdir /opt/hpcc
sudo chown $USER /opt/hpcc
cd /opt/hpcc
wget http://ftp.de.debian.org/debian/pool/main/h/hpcc/hpcc_1.4.1.orig.tar.gz
tar xvf hpcc_1.4.1.orig.tar.gz
cd hpcc-1.4.1/
wget http://ftp.de.debian.org/debian/pool/main/h/hpcc/hpcc_1.4.1-2.debian.tar.gz
tar xvf hpcc_1.4.1-2.debian.tar.gz
patch -i debian/patches/add-Make.Debian.patch

Edit Make.Debian. For some reason LAdir is ignored, hence the -L option in LAlib
 78 # ----------------------------------------------------------------------
 79 # - MPI directories - library ------------------------------------------
 80 # ----------------------------------------------------------------------
 81 # MPinc tells the  C  compiler where to find the Message Passing library
 82 # header files,  MPlib  is defined  to be the name of  the library to be
 83 # used. The variable MPdir is only used for defining MPinc and MPlib.
 84 #
 85 MPdir        =/usr/lib/openmpi/lib/
 86 MPinc        =
 87 MPlib        =-lmpi
 88 #
 89 # ----------------------------------------------------------------------
 90 # - Linear Algebra library (BLAS or VSIPL) -----------------------------
 91 # ----------------------------------------------------------------------
 92 # LAinc tells the  C  compiler where to find the Linear Algebra  library
 93 # header files,  LAlib  is defined  to be the name of  the library to be
 94 # used. The variable LAdir is only used for defining LAinc and LAlib.
 95 #
 96 LAdir        = /opt/ATLAS/lib
 97 LAinc        =
 98 LAlib        = -L/opt/ATLAS/lib -ltatlas
 99 #

The above assumes that you've compiled your own openblas as shown elsewhere on this blog. You can use whatever math libs you want. Again, there are a couple described on this blog (acml, netlib blas/lapack, openblas, ATLAS). I've had success with the netlib blas/lapack and atlas (built with netlib lapack).

mv Make.Debian hpl/
make arch=Debian

Hopefully everything went well. Now you need an input file.
cp _hpccinf.txt hpccinf.txt

Edit hpccinf.txt:
HPLinpack benchmark input file
Innovative Computing Laboratory, University of Tennessee
HPL.out      output file name (if any)
8            device out (6=stdout,7=stderr,file)
1            # of problems sizes (N)
1000         Ns
1            # of NBs
80           NBs
0            PMAP process mapping (0=Row-,1=Column-major)
1            # of process grids (P x Q)
3            Ps
1            Qs
16.0         threshold
1            # of panel fact
2            PFACTs (0=left, 1=Crout, 2=Right)
1            # of recursive stopping criterium
4            NBMINs (>= 1)
1            # of panels in recursion
2            NDIVs
1            # of recursive panel fact.
1            RFACTs (0=left, 1=Crout, 2=Right)
1            # of broadcast
1            BCASTs (0=1rg,1=1rM,2=2rg,3=2rM,4=Lng,5=LnM)
1            # of lookahead depth
1            DEPTHs (>=0)
2            SWAP (0=bin-exch,1=long,2=mix)
64           swapping threshold
0            L1 in (0=transposed,1=no-transposed) form
0            U  in (0=transposed,1=no-transposed) form
1            Equilibration (0=no,1=yes)
8            memory alignment in double (> 0)
##### This line (no. 32) is ignored (it serves as a separator). ######
0                               Number of additional problem sizes for PTRANS
1200 10000 30000                values of N
0                               number of additional blocking sizes for PTRANS
40 9 8 13 13 20 16 32 64        values of NB

Launch by doing
mpirun -n X ./hpcc
where X=Ps times Qs (e.g. 3 in the example above).

I put the hpccinf.txt in a shared (nfs) folder (~/jobs), created a file called myhost

tantalum slots=2 max_slots=4
boron slots=2 max_slots=6
neon slots=2 max_slots=8
 and then launched using
mpirun -n 4 -hostfile myhost /opt/hpcc/hpcc-1.4.1/./hpcc