23 July 2013

483. MS data, part III: generating a matrix by combining several spectra, and plotting it in gnuplot

This post is primarily intended for two particular students. However, the problem it addresses is something that a lot of spectrometrists/scopists who want to take their data presentation to the next level have encountered.

My presumption:
You're running linux.

You've already exported your data as csv files as shown in this post: http://verahill.blogspot.com.au/2013/07/474-exporting-data-from-wsearch32-and.html

In addition, for the specifics in the commands below I will presume that this data is based on a cone voltage sweep from 0 to 300 in 10 volt steps. I thus have a series of files named: 0.csv, 10.csv, 20.csv..190,300.csv.

You should be able to easily customize the approach to e.g. time or concentration dependent data.

Let's get started:
0. Pre-reqs
Make sure you have gawk, sed, xargs, gnuplot, paste, python installed. On debian do
sudo apt-get install gawk sed xargs gnuplot paste python

1. Convert the csv files to dat files
Create the following script and call it csv2dat.sh
#!/bin/bash for e in 0 10 20 30 40 50 60 70 80 90 100 110 120 130 140 150 160 170 180 190 200 210 220 230 240 250 260 270 280 290 300 do tail -n +8 $e.csv | sed 's/\,/\t/g'| gawk '{print $2,$4}' > $e.dat done
and run it
sh csv2dat.sh

If all went well, you'll have a series of tab-separated .dat files which contain the m/z and the relative abundance (not absolute).

2. Extract ALL m/z values from all files
Create a file called homogenize.sh and put the following in it:
#!/bin/bash for e in {0..200..10} {210..300..10} do cat $e.dat | gawk '{print $1}' echo "" done
We'll run the homogenize script (it does nothing of the sort though), and then use the unix tools unique and sort to get rid of all non-unique m/z values, and to sort them in reverse numerical order:
sh homogenize.sh > allmz.dat
uniq allmz.dat temp.dat
sort -gr temp.dat > mz.dat

3. Pad the data with zeroes
Create a file called makelist.py, and put the following in it. Watch out for tab lengths etc. It's written for python 2.x, and probably won't work under python 3. It was also hacked together from an earlier script which didn't quite work the way I hoped it would.

#!/usr/bin/env python
import sys
from numpy import linspace
infile=sys.argv[1]

f=open(infile,'r')
arr=[]

print "Read %s" %infile
for line in f:
 line=line.rstrip('\n')
 try:
  arr+=[round(float(line),3)]
 except:
  pass
  #print line
f.close

mylist=arr
mylist.sort(reverse=True)

print "Calculating spacing"
spacing=1.0
old=max(mylist)
for i in range(0,len(mylist)):
 if round(abs(old-mylist[i]),3)<spacing and not (abs(old-mylist[i])==0):
  spacing=round(abs(old-mylist[i]),3)  
 old=mylist[i]

values=1+(max(mylist)-min(mylist))/spacing
print "Max, min, resolution: ",max(mylist),min(mylist),spacing
completelist=linspace(max(mylist),min(mylist),values).tolist()
mylist=completelist

voltages=[0,10,20,30,40,50,60,70,80,90,100,110,120,130,140,150,160,170,180,190,200,210,220,230,240,250,260,270,280,290,300]
myys=[0]*len(mylist)

for n in voltages:
 print "voltage: ",n,'\n'
 f=open(str(n)+'.dat','r')
 g=open(str(n)+'pad.dat','w')
 arrx=[]
 arry=[]

 for line in f:
  line=line.rstrip('\n')
  line=line.split(' ')
  try:
   line[0]=round(float(line[0]),3)
   line[1]=float(line[1])
   arrx+=[line[0]]
   arry+=[line[1]]
  except:
   pass

 for i in range(0,len(arrx)-1):
  try:
   myys[mylist.index(arrx[i])]=arry[i]
  except:
   a=0   

 for i in range(0,len(myys)-1):
  g.write(str(myys[i])+'\n')
f.close
g.close

h=open('mz.x','w')

for i in range(0,len(mylist)-1):
 h.write(str(mylist[i])+'\n')
h.close

Run
python makelist.py allmz.dat

Getting 'fail' messages is ok -- most likely it's due to an empty line. You can check that everything worked out by doing e.g.
wc 0pad.dat 220pad.dat

The numbers in the first column should be the same if the files have the same number of lines.

4. Make a matrix
Paste all the ms data side-by-side.
paste 0pad.dat 10pad.dat 20pad.dat 30pad.dat 40pad.dat 50pad.dat 60pad.dat 70pad.dat 80pad.dat 90pad.dat 100pad.dat 110pad.dat 120pad.dat 130pad.dat 140pad.dat 150pad.dat 160pad.dat 170pad.dat 180pad.dat 190pad.dat 200pad.dat 210pad.dat 220pad.dat 230pad.dat 240pad.dat 250pad.dat 260pad.dat 270pad.dat 280pad.dat 290pad.dat 300pad.dat > allpad.dat

5. Rotate the matrix
Create a script called rotate.sh:
gawk ' { for (i=1; i<=NF; i++) { a[NR,i] = $i } } NF>p { p = NF } END { for(j=1; j<=p; j++) { str=a[1,j] for(i=2; i<=NR; i++){ str=str" "a[i,j]; } print str } }' $1
and run
sh rotate.sh allpad.dat > matrix.rot.dat

6. Plot using gnuplot
See the following script for an example. Note that plotting in gnuplot using 'matrix' you don't get the benefit of proper axes labels. Instead we do a bit of on-the-fly maths to get the axes right. Specifically:
using (2999.3-(($1-1)/10)):(($2-1)*10):($3)

means that for the m/z axes ($1) we take the highest value (in our case 2999.3) and remove 0.1 m/z (our resolution) for each data point. This data is in each row. For the CV axes ($2), which goes down the columns in our matrix.rot.dat, we have thirty values. Each one corresponds to an increase in 10V starting at 0V, hence we multiply by 10. $3 is the intensity, which we don't need to fiddle with.

Save the following as cntr.gplt
set term png size 1000,1000 set output 'map.png' set zrange [-10:110] set yrange [0:300] unset surface set contour base set cntrparam levels 15 set view 0,0 unset ztics unset key splot 'matrix.rot.dat' matrix using (2999.3-($1/10)):($2*10):($3) with lines palette

Running
gnuplot cntr.gplt

VERY SLOWLY (in my case I had 0.9 M data points) gives us

If you're confused as to why the data doesn't go beyond 250 Volt (y axis) it's because I made a mistake at one point.

Changing the ranges a bit we get
And even more zoomed in:

Soon to come as a separate post:
 the same data, but as a stacked plot. Here's what it looks like though:

482. kernel 3.10.2 with CK patch

NOTE: the 304.88 nvidia kernel modules DO NOT BUILD on this kernel. I've also tried 3.10.5 and it also does not work.

NOTE II: I'm getting random slowdowns on my SL410 laptop with intel graphics. Not sure if it's the same issue as this: http://verahill.blogspot.com.au/2013/03/368-slow-mouse-and-keyboard-triggered.html
Once kworker shows up in top everything grinds to a slow crawl.

Nothing odd here. For a list of what questions to expect when going from 3.9 to 3.10, see e.g. http://verahill.blogspot.com.au/2013/07/468-kernel-310-on-debian.html

The CK patch set supposedly improves desktop performance of the kernel. As it seems like Con doesn't update that page anymore, go directly to the patches: http://ck.kolivas.org/patches/3.0/

sudo apt-get install xz-utils kernel-package fakeroot ncurses-dev
mkdir ~/tmp
cd ~/tmp
wget https://www.kernel.org/pub/linux/kernel/v3.x/linux-3.10.2.tar.xz
tar xvf linux-3.10.2.tar.xz
cd linux-3.10.2/
wget http://ck.kolivas.org/patches/3.0/3.10/3.10-ck1/patch-3.10-ck1.bz2
bunzip2 patch-3.10-ck1.bz2
patch -p1 < patch-3.10-ck1
patching file arch/powerpc/platforms/cell/spufs/sched.c patching file Documentation/scheduler/sched-BFS.txt patching file Documentation/sysctl/kernel.txt patching file fs/proc/base.c patching file include/linux/init_task.h patching file include/linux/ioprio.h patching file include/linux/sched.h patching file init/Kconfig patching file init/main.c patching file kernel/delayacct.c patching file kernel/exit.c patching file kernel/posix-cpu-timers.c patching file kernel/sysctl.c patching file lib/Kconfig.debug patching file include/linux/jiffies.h patching file drivers/cpufreq/cpufreq.c patching file drivers/cpufreq/cpufreq_ondemand.c patching file kernel/sched/bfs.c patching file include/uapi/linux/sched.h patching file include/linux/sched/rt.h patching file kernel/stop_machine.c patching file drivers/cpufreq/cpufreq_conservative.c patching file kernel/sched/Makefile patching file kernel/time/Kconfig patching file kernel/Kconfig.preempt patching file kernel/Kconfig.hz patching file arch/x86/Kconfig patching file Makefile
make-kpkg clean cat /boot/config-`uname -r`>.config make oldconfig time fakeroot make-kpkg -j3 --initrd kernel_image kernel_headers sudo dpkg -i ../*3.10.2-ck*.deb sudo rm /lib/modules/3.10.2-ck1/build sudo ln -s /usr/src/linux-headers-3.10.2-ck1/ /lib/modules/3.10.2-ck1/build sudo dkms autoinstall -k 3.10.2-ck1

22 July 2013

481. A little bit of samba on the command line

I have a bit of a problem with samba currently.

My problem is that my computers are sitting behind a router (on a 192.168.2.0/24 subnet) and the computers that I want to access sit on the university network, to which the router is connected. The address range is, say, 131.172.x.x.

In other words, I (think I) want to use samba across two subnets.

I've opened up ports 13-139,445 to tcp and udp on both the router and in iptables on my desktop.

My problem:
1. I can't see the network shares of the other computers using
   a) nautilus (Network/Windows Network)
   b) nmblookup
   c) sambascanner

2. I can't connect to network shares using their netbios names. For example, I'd like to connect to e.g. smb://avance400/data, but I have to use the IP address instead. For some curious reason not even that works using nautilus.

Workaround:
So here's not a solution, but a workaround.

I can connect to other computers from the command line as long as I know the IP address, and here's how
smbclient //131.172.123.30/data -U myuni/me

If you actually want to mount the share, which is password protected, and you do, then do
sudo mount -t cifs -o user=me //131.172.123.30/data /media/smbmounts/

where /media/smbmounts belong to you (e.g. sudo mkdir /media/smbmounts && sudo chown $USER /media/smbmounts).

And that's more or less it.

Some additional information:
If you don't get prompted for the password, and get
mount: block device //131.172.123.30/data is write-protected, mounting read-only
mount: cannot mount block device //131.172.123.30/data read-only

but supplying the password as part of the command line works, then you are missing cifs-utils, so install them.

Note that mount.cifs can handle credentials from a special file, e.g. like this , which you chmod to 600. My chief issue with that is that ~/.bash_history has exactly the same permissions (u+rw, go-rwx) and so I don't see how it's that's any safer than exposing everything by supplying your password as part of the mount command. Both should be avoided if possible.

On the other hand you could argue that since the password is transmitted over the network in cleartext you're inviting trouble either way...