19 June 2013

458. Briefly: Converting GRAMS ASP ascii data to two-column ascii data

We have a couple of CARY 630 FT-IR/ATR instruments.

I hate them. Apart from being the Mac equivalent of spectrometers (point and click works well most of the time, but if you try to do anything remotely creative you'll have a bad day), they can't output data in any reasonable format.

At least not the way I'd define 'reasonable', i.e. a simple x-y ascii data file and/or JCAMP-DX and/or even .csv. The default output is a binary .a2r file.

The only ascii-type format is a proprietary GRAMS ASP ascii file, for which I haven't been able to find the formal specs. Using Google, it seems as if the German arm of Agilent did publish them, but when I click on the links I'm told the file no longer exists, and Google's cache isn't playing ball.

Anyway. Luckily the format seems pretty simple.

Here are the first ten lines of an .asp file:
1798
4000.41016197344
650.579285428114
1
128
4
98.4862110783457
98.4183476284596
98.4587565715995
98.5660576694946
* The first line is the number of acquired data points.
* The second line is the highest reciprocal wavelength (wavenumber) in cm-1.
* The third line is the lowest reciprocal wavelength (wavenumber) in cm-1.
* I don't know what the fourth and fifth lines signify. It could be dynamic resolution in the Y axis.
* The sixth line appears to be the nominal resolution, i.e. 4 cm-1. However, the data seem to be zero-filled: the actual point spacing is ca 1.86 cm-1/pt, i.e. (4000.41-650.58)/(1798-1).
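The header layout guessed above can be checked with a few lines of Python. This is a minimal sketch that assumes the six-line header described in the list. The function name parse_asp_header is hypothetical, and the fourth and fifth values are ignored since their meaning is unknown. It also shows where the ca 1.86 cm-1/pt figure comes from: 1798 points span 1797 equal intervals.

```python
def parse_asp_header(lines):
    """Hypothetical helper: parse the six-line GRAMS ASP header as guessed above.

    Returns the number of points, the highest and lowest wavenumbers (cm-1)
    and the x spacing between consecutive points.
    """
    header = [float(line) for line in lines[:6]]
    npts = int(header[0])
    xmax, xmin = header[1], header[2]
    # npts points span npts - 1 equal intervals
    spacing = (xmax - xmin) / (npts - 1)
    return npts, xmax, xmin, spacing


# header values copied from the example file above
example = ["1798", "4000.41016197344", "650.579285428114", "1", "128", "4"]
npts, xmax, xmin, spacing = parse_asp_header(example)
print("%d points, %.2f cm-1 per point" % (npts, spacing))
```

With the example header this reports 1798 points at 1.86 cm-1 per point, well below the 4 cm-1 on line six, which is what points to zero-filling.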
Knowing the above, we can write a simple python script, asp2asc, that generates two-column files suitable for gnuplot.
Example usage:
./asp2asc -i data.asp -o data.dat


asp2asc:
#!/usr/bin/env python
# converts GRAMS ascii (asp) output from a CARY 630 FT-ATR-IR to a two-column ascii dat file
# runs under python 2.7 and python 3
import sys

VERSION = 0.1


def usage():
    # print the help text
    print('')
    print('\t\tThis is asp2asc, a tool for converting')
    print('\t\tGRAMS ASP ascii files to two-column ascii files')
    print('\t\tThis is version %s' % VERSION)
    print('\tUsage:')
    print("\t-h\t--help   \tYou're looking at it.")
    print('\t-i\t--input \tInput file, e.g. data.asp')
    print('\t-o\t--output \tOutput file, e.g. data.dat')
    print('')


def getswitch(arguments, short, longname):
    # return the argument following -x or --xxx, or None if the switch
    # is missing or has no value after it
    for flag in (short, longname):
        if flag in arguments:
            pos = arguments.index(flag) + 1
            if pos < len(arguments):
                return arguments[pos]
    return None


def getvars(arguments):
    if '-h' in arguments or '--help' in arguments:
        usage()
        sys.exit(0)

    theinput = getswitch(arguments, '-i', '--input')
    theoutput = getswitch(arguments, '-o', '--output')
    if theinput is None:
        print('\nError -- no input file defined.\n')
    if theoutput is None:
        print('\nError -- no output file defined.\n')
    if theinput is None or theoutput is None:
        usage()
        sys.exit(1)

    print('Input: %s.' % theinput)
    print('Output: %s.' % theoutput)
    print('')
    return {'i': theinput, 'o': theoutput}


def getparams(datafile):
    # read the six header lines: number of points, highest and lowest
    # wavenumber (cm-1), two unknown values, and the resolution
    params = []
    for _ in range(6):
        line = datafile.readline().strip()
        try:
            params.append(int(line))
        except ValueError:
            params.append(float(line))
    return params


def getydata(datafile):
    # every remaining non-empty line holds one y value
    ydata = []
    for line in datafile:
        line = line.strip()
        if line:
            ydata.append(float(line))
    return ydata


def makexdata(xpts, xmax, increment):
    # x values run from the highest wavenumber downwards in equal steps
    return [xmax - n * increment for n in range(xpts)]


def writexydata(outfile, xdata, ydata):
    # one tab-separated x/y pair per line
    for x, y in zip(xdata, ydata):
        outfile.write('%s\t%s\n' % (x, y))


if __name__ == "__main__":
    switches = getvars(sys.argv[1:])

    with open(switches['i'], 'r') as infile:
        params = getparams(infile)
        # getparams has left the file at the 7th line, where the y data start
        ydata = getydata(infile)

    npts, xmax, xmin = params[0], params[1], params[2]
    # npts points span npts-1 intervals; float() avoids integer division in python 2
    xdata = makexdata(npts, xmax, float(xmax - xmin) / (npts - 1))

    if len(xdata) == len(ydata):
        with open(switches['o'], 'w') as outfile:
            writexydata(outfile, xdata, ydata)
    else:
        print('Something bad happened:')
        print('Number of X data points not equal to number of Y data points')
        print('x pts: %i, y pts: %i' % (len(xdata), len(ydata)))
        sys.exit(1)
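If a .csv file is wanted instead (one of the formats mentioned above as reasonable), the writexydata step could be swapped for Python's standard csv module. This is a minimal Python 3 sketch that assumes the xdata and ydata lists built by the script. The helper write_csv and the column names are hypothetical choices, not part of asp2asc.

```python
import csv
import io


def write_csv(outfile, xdata, ydata):
    """Hypothetical helper: write paired x/y values as comma-separated rows."""
    writer = csv.writer(outfile)
    writer.writerow(["wavenumber_cm-1", "y"])  # header row for spreadsheets
    for x, y in zip(xdata, ydata):
        writer.writerow([x, y])


# small in-memory demonstration; for a real file use open('data.csv', 'w', newline='')
buf = io.StringIO()
write_csv(buf, [4000.41, 3998.55], [98.486, 98.418])
print(buf.getvalue())
```

Opening the real output file with newline='' lets the csv module control the line endings itself.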

Of course you could do this easily in a spreadsheet too, but I've honestly been avoiding spreadsheet programmes like the plague ever since I learned how to use sed, gawk and python.
Also, WHY do they make it so unnecessarily difficult to export your own data?

457. Very Briefly: Microsoft has a Tor exit node?

Whenever I play around with Tor I use ipchicken.com or whatsmyip.org to make sure that I'm indeed using a proxy. I also normally do a whois on the IP address to see who's running the exit node.

Today I ended up with the IP address 168.61.8.22.

whois 168.61.8.22
NetRange: 168.61.0.0 - 168.63.255.255
CIDR: 168.62.0.0/15, 168.61.0.0/16
OriginAS:
NetName: MSFT-EP
NetHandle: NET-168-61-0-0-1
Parent: NET-168-0-0-0-0
NetType: Direct Assignment
RegDate: 2011-06-22
Updated: 2012-10-16
Ref: http://whois.arin.net/rest/net/NET-168-61-0-0-1

OrgName: Microsoft Corp
OrgId: MSFT-Z
Address: One Microsoft Way
City: Redmond
StateProv: WA
PostalCode: 98052
Country: US
RegDate: 2011-06-22
Updated: 2013-04-12
Ref: http://whois.arin.net/rest/org/MSFT-Z

OrgTechHandle: MSFTP-ARIN
OrgTechName: MSFT-POC
OrgTechPhone: +1-425-882-8080
OrgTechEmail: iprrms@microsoft.com
OrgTechRef: http://whois.arin.net/rest/poc/MSFTP-ARIN

OrgAbuseHandle: HOTMA-ARIN
OrgAbuseName: Hotmail Abuse
OrgAbusePhone: +1-425-882-8080
OrgAbuseEmail: abuse@hotmail.com
OrgAbuseRef: http://whois.arin.net/rest/poc/HOTMA-ARIN

OrgAbuseHandle: MSNAB-ARIN
OrgAbuseName: MSN ABUSE
OrgAbusePhone: +1-425-882-8080
OrgAbuseEmail: abuse@msn.com
OrgAbuseRef: http://whois.arin.net/rest/poc/MSNAB-ARIN

OrgNOCHandle: ZM23-ARIN
OrgNOCName: Microsoft Corporation
OrgNOCPhone: +1-425-882-8080
OrgNOCEmail: noc@microsoft.com
OrgNOCRef: http://whois.arin.net/rest/poc/ZM23-ARIN

OrgAbuseHandle: ABUSE231-ARIN
OrgAbuseName: Abuse
OrgAbusePhone: +1-425-882-8080
OrgAbuseEmail: abuse@microsoft.com
OrgAbuseRef: http://whois.arin.net/rest/poc/ABUSE231-ARIN
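The NetRange and CIDR fields can be cross-checked with the ipaddress module in Python 3's standard library. The two CIDR blocks should exactly cover 168.61.0.0 - 168.63.255.255, and the exit node address should fall inside one of them. This is a small sketch; only the addresses are taken from the whois output above.

```python
import ipaddress

# CIDR blocks and NetRange copied from the whois output above
blocks = [ipaddress.ip_network("168.61.0.0/16"), ipaddress.ip_network("168.62.0.0/15")]
range_start = ipaddress.ip_address("168.61.0.0")
range_end = ipaddress.ip_address("168.63.255.255")
exit_node = ipaddress.ip_address("168.61.8.22")

# summarize_address_range returns the minimal list of networks covering the range
covering = list(ipaddress.summarize_address_range(range_start, range_end))
print(sorted(covering) == sorted(blocks))
print([str(net) for net in blocks if exit_node in net])
```

The range needs two blocks because 168.61 is odd: a /15 cannot start there, so the range splits into 168.61.0.0/16 plus 168.62.0.0/15.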
That Microsoft is listed as the organisation doesn't necessarily mean that they are running the node (it could be a hosting company), but it still seems possible that MS is actually running this one. Maybe it's just for research purposes, but it still seemed a bit surprising.

Microsoft as a company isn't exactly known for doing things out of the goodness of their hearts. Oh well.

17 June 2013

456. Adding NWChem basis sets to ECCE. Part 2. A solution: nwchem2ecce.py

UPDATED!

I've moved the finished scripts here:
https://sourceforge.net/projects/nwbas2ecce/

They work! I've also added a number of converted basis sets to the sourceforge repo under 'examples'. You'll also find example ecp and ECPOrbital files.

Phew...

Here's the README:
The programmes are not 'intelligent' -- they won't check that you are doing something reasonable. Bad input = bad output.

__Installation__
Download eccepag and nwbas2ecce. They are both python (2.7) programmes, so you will need to install python to run them. On linux, this is normally very easy. E.g. on debian, run 'sudo apt-get install python2.7' and you are done.

If you want, you can put the files in /usr/local/bin and do
sudo chmod +x /usr/local/bin/eccepag
sudo chmod +x /usr/local/bin/nwbas2ecce
and you will be able to call the scripts from any directory.

__Usage__
nwbas2ecce can turn a full basis set, or an ECP basis set, into an ECCE-compatible set of basis set files.

Typically, an nwchem basis set consists of a single file, e.g. 3-21g. It can also be divided into several files, e.g. def2-svp and def2-ecp, where the effective core potentials (ECPs) are in def2-ecp. Other basis set files, like lanl2dz_ecp, contain both the orbital and the contraction parts.

Typically, an ECCE basis set suite consists of:
basis.BAS
basis.BAS.meta
basis.POT (for ECP)
basis.POT.meta (for ECP)

Sometimes polarization and diffuse functions are separated from the main .BAS file. E.g. 3-21++G* consists of
3-21G.BAS
3-21GS.BAS
POPLDIFF.BAS
in addition to the meta files. The meta files are just markup-language type files with e.g. references. Note that you don't HAVE to break up the basis set components like this.

Since the basis set data can be broken up into smaller files, the overall basis set is defined as an entry in a category file. For example, 3-21G is defined in the category file 'pople', and points to 3-21G.BAS. 3-21G* is also defined in pople, but points to both 3-21G.BAS and 3-21GS.BAS. ECP works in a similar way, by combining a .BAS and a .POT file. Note that the .POT files look different from the .BAS files.

nwbas2ecce generates .BAS and .POT files based on whether there are basis/end or ecp/end sections in the nwchem basis set file. If there are both, both POT and BAS files are generated. All these files are contained in server/data/Ecce/system/GaussianBasisSetLibrary

Finally, you need to generate .pag and .dir files that go into the server/data/Ecce/system/GaussianBasisSetLibrary/.DAV directory. The .dir file is always empty, while the .pag file is unfortunately a binary file. eccepag can, however, generate it with the right input. See e.g. http://verahill.blogspot.com.au/2013/06/455-adding-nwchem-basis-sets-to-ecce.html for more detailed information.

__Example__
We'll use def2-svp as an example. The nwchem basis set file def2-svp contains the basis set, while def2-ecp contains the core potentials.

Use def2-svp to generate DEF2_SVP.BAS and DEF2_SVP.BAS.meta. Use def2-ecp to generate DEF2_ECP.POT and DEF2_ECP.POT.meta. As part of the generation, .descriptor files are also generated. These contain information that should go into the category file(s). Then generate the .pag files for both the POT and the BAS files, and touch the .dir files into existence.

Do like this:
nwbas2ecce -i def2-svp -o DEF2_SVP.BAS -n 'def2-svp'
nwbas2ecce -i def2-ecp -p DEF2_ECP.POT -n 'def2-ecp'
eccepag -n def2-svp -t ECPOrbital -c ORBITAL -y Segmented -s Y -o DEF2_SVP.BAS.pag
eccepag -n def2-ecp -t ecp -c AUXILIARY -o DEF2_ECP.POT.pag

NOTE: I don't actually know if def2-svp is segmented and spherical. I don't think it matters for the .pag file generation. Also note that most inputs are case sensitive. Look at a similar .pag file for hints.

You now have the following files:
DEF2_ECP.POT
DEF2_ECP.POT.descriptor
DEF2_ECP.POT.meta
DEF2_ECP.POT.pag
DEF2_SVP.BAS
DEF2_SVP.BAS.descriptor
DEF2_SVP.BAS.meta
DEF2_SVP.BAS.pag

Copy the files. Note that you need to select the correct target directory, and that will vary with where you installed ECCE. I'll assume it's in /opt/ecce:
cp DEF2* /opt/ecce/server/data/Ecce/system/GaussianBasisSetLibrary
cd /opt/ecce/server/data/Ecce/system/GaussianBasisSetLibrary
mv *.pag .DAV/
touch .DAV/DEF2_SVP.BAS.dir .DAV/DEF2_ECP.POT.dir
cat DEF2_SVP.BAS.descriptor >> ECPOrbital
cat DEF2_ECP.POT.descriptor >> ECPOrbital
cat DEF2_ECP.POT.descriptor >> ecp

Edit ECPOrbital so that it reads:
name= def2-svp
files= DEF2_SVP.BAS DEF2_ECP.POT
atoms= H He Li Be B C N O F Ne Na Mg Al Si P S Cl Ar K Ca Sc Ti V Cr Mn Fe Co Ni Cu Zn Ga Ge As Se Br Kr Rb Sr Y Zr Nb Mo Tc Ru Rh Pd Ag Cd In Sn Sb Te I Xe Cs Ba La Hf Ta W Re Os Ir Pt Au Hg Tl Pb Bi Po At Rn
atoms= Rb Sr Y Zr Nb Mo Tc Ru Rh Pd Ag Cd In Sn Sb Te I Xe Cs Ba La Hf Ta W Re Os Ir Pt Au Hg Tl Pb Bi Po At Rn
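The category entry edited by hand in the last step uses a simple key= value layout. The following is a sketch of how such an entry could be generated. The helper name make_category_entry is hypothetical, and the layout (name=, files=, then one atoms= line per file in the same order) is inferred from the def2-svp example above, not from any ECCE specification.

```python
def make_category_entry(name, files, atoms_per_file):
    """Hypothetical helper: build a category file entry.

    Layout inferred from the def2-svp example: name=, files=, then one
    atoms= line per file, in the same order as the files.
    """
    if len(files) != len(atoms_per_file):
        raise ValueError("need one atom list per file")
    lines = ["name= " + name, "files= " + " ".join(files)]
    for atoms in atoms_per_file:
        lines.append("atoms= " + " ".join(atoms))
    return "\n".join(lines) + "\n"


svp_atoms = "H He Li Be".split()  # shortened for illustration
ecp_atoms = "Rb Sr".split()  # shortened for illustration
entry = make_category_entry("def2-svp", ["DEF2_SVP.BAS", "DEF2_ECP.POT"], [svp_atoms, ecp_atoms])
print(entry)
```

Checking that the number of atom lists matches the number of files catches the most likely copy-and-paste mistake when editing ECPOrbital by hand.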