28 June 2013

466. morph xyz -- python script to morph .xyz files

Rather naively I was hoping that by comparing two molecule .xyz files and generating an average of them I would be able to conveniently generate a half-decent transition state guess.

Turns out that it's not quite that simple. However, I've written the software, so I might as well share it.

Note that it's written for Python 2.7 (i.e. not Python 3).

Run the script without arguments for help. General usage is
morphxyz -i 1.xyz 2.xyz -o morph.xyz

So here it is:


import sys

def getvars(arguments):

  if "-i" in arguments:
   print 'Input: %s and %s'% (switches['in_one'],switches['in_two'])
  if "-o" in arguments:
   print 'Output: %s'% switches['o']

  if "-w" in arguments:
   print 'Weighting: %i'% switches['w']
   print 'Assuming no weighting'

  if ("-h" in arguments) or ("--help" in arguments):
   print '\t\t morphxyz version %s' % version
   print '\t-i\t two xyz files to morph'
   print '\t-o\t output file'
   print '\t-w\t weight one structure vs the other (1=average; 0=start; 2=end)'
   print 'Exiting'
  a=0 # do nothing
 if doexit==1:

 return switches

def getcmpds(switches):
 for line in g:
  if n==1:
  elif n==2:
   line=line.split(' ')
 for line in g:
  if n==1:
  elif n==2:
   line=line.split(' ')
 return cmpds

def morph(cmpds):
 for n in range(0,cmpds['atoms_one']):
 return cmpds

def genxyzstring(coords,element):
 x_str='%10.5f'% coords[0]
 y_str='%10.5f'% coords[1]
 z_str='%10.5f'% coords[2]
 xyz_string=element+(3-len(element))*' '+10*' '+\
 (8-len(x_str))*' '+x_str+10*' '+(8-len(y_str))*' '+y_str+10*' '+(8-len(z_str))*' '+z_str+'\n'
 return xyz_string

def writemorph(cmpds,outfile):
 for n in range(0,cmpds['atoms_one']):
  g.write(genxyzstring(coords, cmpds['elements_one'][n]))
  h.write(genxyzstring(diffcoords, cmpds['elements_one'][n]))
 return 0

if __name__=="__main__":
 if cmpds['atoms_one']!=cmpds['atoms_two']:
  print 'The number of atoms differ. Exiting'
 elif cmpds['elements_one']!=cmpds['elements_two']:
  print 'The types of atoms differ. Exiting'
 if success==0:
  print 'Conversion seems successful'
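
The core of the morphing step, a weighted average of two geometries with the same atom order, can be sketched in a few lines. This is a minimal Python 3 sketch and not the script above (which targets Python 2.7): the names read_xyz and morph are made up for illustration, and the weight is assumed to follow the -w convention from the help text (0 = start, 1 = average, 2 = end).

```python
def read_xyz(text):
    """Parse xyz-format text into a list of (element, [x, y, z])."""
    lines = text.strip().splitlines()
    natoms = int(lines[0])
    atoms = []
    # Line 1 is the atom count, line 2 a comment; atoms start on line 3.
    for line in lines[2:2 + natoms]:
        element, x, y, z = line.split()[:4]
        atoms.append((element, [float(x), float(y), float(z)]))
    return atoms


def morph(start, end, weight=1.0):
    """Linearly interpolate between two structures (0=start, 1=average, 2=end)."""
    if [a[0] for a in start] != [a[0] for a in end]:
        raise ValueError("structures differ in atom count or element order")
    t = weight / 2.0  # map the 0..2 weight onto a 0..1 interpolation factor
    return [(el, [a + t * (b - a) for a, b in zip(ca, cb)])
            for (el, ca), (_, cb) in zip(start, end)]


one = read_xyz("2\nstart\nH 0.0 0.0 0.0\nH 0.0 0.0 0.7\n")
two = read_xyz("2\nend\nH 0.0 0.0 0.0\nH 0.0 0.0 1.1\n")
for element, (x, y, z) in morph(one, two):
    print("%-3s %12.5f %12.5f %12.5f" % (element, x, y, z))
```

A straight-line interpolation like this only makes sense if both files list the atoms in the same order and the two structures are aligned in the same frame, which is part of why the result is rarely a good transition state guess.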

27 June 2013

465. The Intel MKL (Math Kernel Library) on Linux (Debian) -- for free

I've been living under the impression that the Intel MKL wasn't free.

In all fairness, since I'm using AMD almost exclusively and the ACML is free -- and OpenBlas has worked fine on my i5-2400-based node -- I haven't had enough motivation to really dig into this. However, as part of a post on compiling GAMESS US, Kirill Berezovsky mentions in passing that you can get both the Intel MKL and ifort (the Intel Fortran compiler) for free for non-commercial use.

To be fair, that's free in the Windows sense, not in the Linux sense. You've still got enough restrictions to make Stallman weep, but it's free enough that we have a fighting chance at evaluating the software -- and using it if it's good enough.

See here for ACML: http://verahill.blogspot.com.au/2013/05/422-set-up-acml-on-linux.html
See here for OpenBlas: http://verahill.blogspot.com.au/2013/05/423-openblas-on-debian-wheezy.html

Anyway, this is about MKL:

0. Register a request
Go to http://software.intel.com/en-us/non-commercial-software-development
Click on what you want to download -- in this case the Intel MKL. Fill out the form and hit submit. Note that you'll need to enable cookies for this to work.

[while I was at it I got the Intel Parallel Studio XE as well -- that's 2 GB though, so don't get it unless you want it and have a lot of bandwidth to spare]

1. Download
You'll get an email with a link. Click it.

The MKL file is 609 MB, which is a bit bigger than the ACML files, which tend to be around 67-70 MB (each).

Note: the email you get has a serial number. From what I understand it's valid for one year if you want to download updates and new releases. However, nowhere does it say that any installed software will expire, so I presume that you can continue using the MKL libraries indefinitely.

2. Install
In my case the file is called l_mkl_11.0.4.183.tgz and it was downloaded to ~/Downloads
mkdir ~/tmp
cd ~/tmp
cp ~/Downloads/l_mkl_11.0.4.183.tgz .
tar xvf l_mkl_11.0.4.183.tgz
cd l_mkl_11.0.4.183
sudo sh install.sh
Step no: 1 of 7 | Welcome
--------------------------------------------------------------------------------
Welcome to the Intel(R) Math Kernel Library 11.0 Update 4 for Linux*
installation program.

The Flagship of HPC Math Libraries. This library contains highly optimized,
extensively threaded, mathematical functions for engineering, scientific, and
financial applications that require maximum performance.
--------------------------------------------------------------------------------
You will complete the steps below during this installation:
Step 1 : Welcome
Step 2 : License
Step 3 : Activation
Step 4 : Intel(R) Software Improvement Program
Step 5 : Options
Step 6 : Installation
Step 7 : Complete
--------------------------------------------------------------------------------

Step no: 1 of 7 | Options > Missing Optional Pre-requisite(s)
--------------------------------------------------------------------------------
There are one or more optional unresolved issues. It is highly recommended to
resolve them all before you continue the installation. You can fix them without
exiting from the installation and re-check. Or you can quit from the
installation, fix them and run the installation again.
--------------------------------------------------------------------------------
Missing optional pre-requisites
-- unsupported OS
--------------------------------------------------------------------------------
1. Skip missing optional pre-requisites [default]
2. Show the detailed info about issue(s)
3. Re-check the pre-requisites

h. Help
b. Back to the previous menu
q. Quit
--------------------------------------------------------------------------------
Please type a selection or press "Enter" to accept default choice [1]: 1

[..]
13. THIRD PARTY PROGRAMS. The Materials may include third party programs or
materials that are governed by the third party's license terms, including
without limitation, open source software. The license terms associated with
such third party programs or materials govern your use of same, and Intel is
not liable for such third party programs or materials.

* Other names and brands may be claimed as the property of others
--------------------------------------------------------------------------------
Do you agree to be bound by the terms and conditions of this license agreement?
Type "accept" to continue or "decline" to back to the previous menu: accept

Step no: 3 of 7 | Activation
--------------------------------------------------------------------------------
If you have purchased this product and have the serial number and a connection
to the internet you can choose to activate the product at this time. Activation
is a secure and anonymous one-time process that verifies your software licensing
rights to use the product. Alternatively, you can choose to evaluate the product
or defer activation by choosing the evaluate option. Evaluation software will
time out in about one month. Also you can use license file, license manager, or
remote activation if the system you are installing on does not have internet
access activation options.
--------------------------------------------------------------------------------
1. I want to activate my product using a serial number [default]
2. I want to evaluate my product or activate later
3. I want to activate either remotely, or by using a license file, or by using a
   license manager

h. Help
b. Back to the previous menu
q. Quit
--------------------------------------------------------------------------------
Please type a selection or press "Enter" to accept default choice [1]: 1

Note: Press "Enter" key to back to the previous menu.
Please type your serial number (the format is XXXX-XXXXXXXX):
--------------------------------------------------------------------------------
Activation completed successfully.
--------------------------------------------------------------------------------
Press "Enter" key to continue:

Step no: 4 of 7 | Intel(R) Software Improvement Program
--------------------------------------------------------------------------------
Help improve your experience with Intel(R) software
Participate in the design of future Intel software. Select 'Yes' to give us
permission to learn about how you use your Intel software and we will do the
rest.
- No Personal contact information is collected
- There are no surveys or additional follow-up emails by opting in
- You can stop participating at any time

Learn more about Intel(R) Software Improvement Program
http://software.intel.com/en-us/articles/software-improvement-program

With your permission, Intel may automatically receive anonymous information
about how you use your current and future Intel software.
--------------------------------------------------------------------------------
1. Yes, I am willing to participate and improve Intel software. (Recommended)
2. No, I don't want to participate in the Intel(R) Software Improvement Program
   at this time.

b. Back to the previous menu
q. Quit
--------------------------------------------------------------------------------
Please type a selection: 2

Step no: 5 of 7 | Options
--------------------------------------------------------------------------------
You are now ready to begin installation. You can use all default installation
settings by simply choosing the "Start installation Now" option or you can
customize these settings by selecting any of the change options given below
first. You can view a summary of the settings by selecting
"Show pre-install summary".
--------------------------------------------------------------------------------
1. Start installation Now
2. Change install directory [ /opt/intel/composer_xe_2013.4.183 ]
3. Change components to install [ All ]
4. Show pre-install summary

h. Help
b. Back to the previous menu
q. Quit
--------------------------------------------------------------------------------
Please type a selection or press "Enter" to accept default choice [1]: 1

Step no: 6 of 7 | Installation
--------------------------------------------------------------------------------
Each component will be installed individually. If you cancel the installation,
components that have been completely installed will remain on your system. This
installation may take several minutes, depending on your system and the options
you selected.
--------------------------------------------------------------------------------
Installing Intel Math Kernel Library 11.0 Update 4 on IA-32 component... done
--------------------------------------------------------------------------------
Installing Intel Math Kernel Library 11.0 Update 4 on Intel(R) 64 component... done
--------------------------------------------------------------------------------
Finalizing installation... done
--------------------------------------------------------------------------------
Press "Enter" key to continue

Step no: 7 of 7 | Complete
--------------------------------------------------------------------------------
Thank you for installing and for using the
Intel(R) Math Kernel Library 11.0 Update 4 for Linux*.

Support services start from the time you install or activate your product, so
please create your support account now in order to take full advantage of your
product purchase. Your Subscription Service support account provides access to
free product updates interactive issue management, technical support, sample
code, and documentation.

To create your support account, please visit the Subscription Services web site
https://registrationcenter.intel.com/RegCenter/registerexpress.aspx?clientsn=NBJN-87S2RM3P

To get started using Intel(R) Math Kernel Library 11.0 Update 4 located in
/opt/intel/composer_xe_2013.4.183 visit:
install-dir/Documentation/en_US/mkl/get_started.html.
--------------------------------------------------------------------------------
q. Quit [default]
--------------------------------------------------------------------------------
Please type a selection or press "Enter" to accept default choice [q]:

3. Usage
I'll demonstrate using nwchem. I've repeated the build instructions over and over on this blog, but here it goes again:

sudo apt-get install build-essential gfortran python2.7-dev libopenmpi-dev openmpi-bin
sudo mkdir /opt/nwchem -p
sudo chown $USER:$USER /opt/nwchem
cd /opt/nwchem
wget "http://www.nwchem-sw.org/download.php?f=Nwchem-6.3.revision1-src.2013-05-28.tar.gz" -O Nwchem-6.3.revision1-src.2013-05-28.tar.gz
tar xvf Nwchem-6.3.revision1-src.2013-05-28.tar.gz
cd nwchem-6.3-src.2013-05-28/
export NWCHEM_TOP=`pwd`
export TCGRSH=/usr/bin/ssh
export NWCHEM_MODULES="all python"
export PYTHONHOME=/usr

export BLASOPT="-L/opt/intel/composer_xe_2013.4.183/mkl/lib/intel64/ -lmkl_sequential -lmkl_core -lmkl_intel_ilp64"
export LIBRARY_PATH="$LIBRARY_PATH:/usr/lib/openmpi/lib:/opt/intel/composer_xe_2013.4.183/mkl/lib/intel64/"

export USE_MPI=y
export USE_MPIF=y
export USE_MPIF4=y
export MPI_LOC=/usr/lib/openmpi/lib
export MPI_INCLUDE=/usr/lib/openmpi/include
export LIBMPI="-lmpi -lopen-rte -lopen-pal -ldl -lmpi_f77 -lpthread"

cd $NWCHEM_TOP/src

make clean
make nwchem_config
make FC=gfortran 1> make.log 2>make.err

cd $NWCHEM_TOP/contrib
export FC=gfortran

I had a look at http://www.ichec.ie/support/tutorials/mkl.pdf to better understand what the different libraries do. Still not quite clear though, so I might not be getting optimal performance.

4. Performance
I basically repeated the test job shown here: http://verahill.blogspot.com.au/2013/05/430-briefly-crude-comparison-of.html

The machine has 16 GB RAM and an Intel i5-2400 CPU (4 cores).

ACML: Total times  cpu: 1644.3s  wall: 1655.5s
MKL:  Total times  cpu: 1550.0s  wall: 1563.8s

It certainly looks good. The difference of roughly 100 seconds is less than two minutes, but 1550 s is 94 % of 1644 s, which works out to about one day per fortnight. That doesn't make much of a difference on a small single-user cluster, but it would add up on our 1400-core multi-user cluster.
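
The back-of-the-envelope numbers can be checked with a few lines of Python 3. The sketch uses the cpu times reported above and assumes, purely for the sake of argument, that the node computes non-stop for the whole fortnight and that every job speeds up by the same factor as this one test job.

```python
acml_cpu = 1644.3  # s, ACML run
mkl_cpu = 1550.0   # s, MKL run

ratio = mkl_cpu / acml_cpu
days_saved = 14 * (1 - ratio)  # assumes the node computes non-stop for 14 days

print("MKL needs %.1f %% of the ACML cpu time" % (100 * ratio))
print("That frees up about %.1f days per fortnight" % days_saved)
```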

26 June 2013

464. bytes2words -- python script

This script does something I could easily do myself in e.g. bc, and so is a bit of a waste of time. However, because I enjoy writing short python scripts, I did it anyway.

./bytes2words -i 8gib -o MW
Input: 8gib
Output: mw
Assuming 64 bit word size
Result: 8 gib is 1073 mw
# Converts bytes to words
# The impetus comes from the use of MW as the default memory unit in many computational pieces of software.

import sys

def split_text(s): # from http://stackoverflow.com/questions/12409894
    from itertools import groupby
    for k,g in groupby(s, str.isalpha):
        yield ''.join(list(g))

def getargs(arguments):

  if "-i" in arguments:
   print 'Input: %s'% switches['i']

   if len(inputted)>1:
    if switches['unit'] == 'kib':
    elif switches['unit'] == 'kb':
    elif switches['unit'] == 'mb':
    elif switches['unit'] == 'mib':
    elif switches['unit'] == 'gb':
    elif switches['unit'] == 'gib':
   if len(inputted)>2:
    print 'Illegal input: %s'% inputted
  if "-o" in arguments:
   print 'Output: %s'% switches['o']
   if switches['o'] == 'w':
   elif switches['o'] == 'kiw':
   elif switches['o'] == 'kw':
   elif switches['o'] == 'mw':
   elif switches['o'] == 'miw':
   elif switches['o'] == 'gw':
   elif switches['o'] == 'giw':
    print 'Illegal output argument: %s' % switches['o']

  if "-b" in arguments:
   print 'Word size: %i bits'% switches['b']
   print 'Assuming 64 bit word size'

  if ("-h" in arguments) or ("--help" in arguments):
   print '\t\t bytes2words version %s' % version
   print ' \t-i\t input in bytes with units, e.g. 200kb'
   print ' \t-o\t output unit (w,kw,kiw,mw,miw,gw,giw)'
   print ' \t-b\t word size (32 or 64 (bit))'
   print 'Exiting'
  a=0 # do nothing
 if doexit==1:
 return switches
if __name__=="__main__":
 print 'Result: %s %s  is %s %s' % ( (switches['i']),switches['unit'],int(switches['if']*(float(switches['i'])/float(switches['b']/bitsperbyte))/float(switches['of'])),switches['o'])
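
The conversion itself boils down to one division by the word size and one by the output unit. Here is a minimal Python 3 sketch of that arithmetic; the function name bytes2words, the two lookup tables and the plain "b" unit are my own naming for illustration and are not lifted from the script above.

```python
# Multipliers for byte units: decimal (kb, mb, gb) and binary (kib, mib, gib).
BYTE_UNITS = {"b": 1, "kb": 10**3, "mb": 10**6, "gb": 10**9,
              "kib": 2**10, "mib": 2**20, "gib": 2**30}
# The same multipliers, but counting words instead of bytes.
WORD_UNITS = {"w": 1, "kw": 10**3, "mw": 10**6, "gw": 10**9,
              "kiw": 2**10, "miw": 2**20, "giw": 2**30}


def bytes2words(amount, in_unit, out_unit, word_bits=64):
    """Convert an amount of memory in bytes to (truncated) words."""
    n_bytes = amount * BYTE_UNITS[in_unit.lower()]
    n_words = n_bytes / (word_bits / 8)  # 8 bits per byte
    return int(n_words / WORD_UNITS[out_unit.lower()])


print(bytes2words(8, "gib", "mw"))  # 8 GiB in 64-bit megawords
```

Note the difference between decimal and binary units: 8 GiB is 1073 MW but exactly 1024 MiW, so it pays to check which one a given program expects.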

463. Very Briefly: Installing ia32-libs

Since the skype and bankid posts on this blog rely on ia32-libs, and the question has popped up a few times:

To install ia32-libs on debian wheezy and up you need to enable multiarch:

sudo dpkg --add-architecture i386
sudo apt-get update
sudo apt-get install ia32-libs

24 June 2013

462. Olex2 (1.1) on Debian

Olex2 is an open-source program for solving and refining crystal structures. I am not a crystallographer -- I have never successfully (i.e. with publication-quality output) solved a crystal structure in my life -- and I have no idea whether Olex2 is an alternative or a complement to SHELX.

What I do know is that Olex2 is more of a point-and-click type program with a snazzy GUI, and so it would seem that the learning curve may be shallower than that of SHELX. However, as always, you must be cautious when doing science -- just because it's easy to do doesn't mean that you're doing it right (I'm looking at you, biologists/physicians/social scientists using statistical software).

But maybe Olex2 makes getting started just easy enough that you can slowly work your way towards the point where you know and understand what you are doing?

That's my hope anyway.

Quick note: there's no source code on the http://www.olex2.org/ website. However, there's a sourceforge SVN repository: http://sourceforge.net/projects/olex2/

Ideally I should show you how to download and compile and make a .deb package from the source, but sadly I haven't managed to figure it out (I'll add troubleshooting data to this post later). I even tried using the deb rules by http://jupiter.plymouth.edu/~jsduncan/software/olex2.php (again, I might add troubleshooting data later)

Anyway, using the pre-built binaries is easy enough. Only the older version 1.1 seems to be working ok on Debian Wheezy.

Note that I've generated a list of dependencies the lazy way -- by looking at the ldd output.


You'll need a raft of libraries, including libGL. You can get this from different sources depending on your graphics card (or lack thereof). Install libgl1-mesa-glx or libgl1-fglrx-glx (ati) or libgl1-nvidia-glx (nvidia). Then continue.

Again, note that you may want to change the versions of the packages below that you install. Anyway, here's a list that may work:

sudo apt-get install libpng3 libglu1-mesa libgtk2.0-0 libatk1.0-0 libgdk-pixbuf2.0-0 libpango1.0-0 libglib2.0-0 libfreetype6 libxrender1 libfontconfig1 libx11-6 libxext6 libpng12-0 libxinerama1 libxxf86vm1 libsm6 lib32z1 zlib1g libc6 libpython2.6 libstdc++6 libgcc1 libxcomposite1 libxdamage1 libxfixes3 libcairo2 libxi6 libxrandr2 libxcursor1 libffi5 libexpat1 libpcre3 libxcb1 libice6 libuuid1 libssl1.0.0 libpixman-1-0 libxcb-shm0 libxcb-render0 libselinux1 libxau6 libxdmcp6

Get the compiled binary:
cd ~
wget http://www.olex2.org/olex2-distro/1.1/olex2-linux64.zip
unzip olex2-linux64.zip
cd ~/olex2
sed -i 's./work/distro/olex2-new.$HOME/olex2.g' start

Create the file ~/.local/share/applications/olex2.desktop
[Desktop Entry]
Name=Olex2
GenericName=Olex2
Comment=Software for refinement of crystal structures
Exec=sh /home/verahill/olex2/start
Terminal=false
Type=Application
Categories=Science
Version=1.1

You're now ready to use Olex2. If you launch it by hand, use the start script. Otherwise just launch it from your desktop.

Olex2 is very pretty, but feels a bit incomplete and buggy (again, I don't know if these are real issues or not -- and I'm no expert). The issue is that the program seems to throw errors every now and again. For example, when clicking on the electron density map in the Solve window:
Map sigma 0.000 : CalcFourier {diff=}{r=0.1}{m=} N4esdl25TInvalidArgumentExceptionE mask size at [xlib/fracmask.cpp(Init):22] calcFourier -diff -r=0.1 -m

Problem with version 1.2:
Start Olex2. 1.2 takes a lot longer than 1.1 to start, but that's fine. Click on Tutorials. Click on Maps and Masks. Click Next five times (the step where you actually create the mask). It triggers the following error:
An error occured running the function/macro next_demo_item
Traceback (most recent call last):
  File "olexFunctions.py", line 480, in func
    retVal = f(*args, **kwds)
  File "/home/verahill/olex2_1.2/etc/scripts/Tutorials.py", line 138, in next_demo_item
    self.run_demo_item()
  File "/home/verahill/olex2_1.2/etc/scripts/Tutorials.py", line 318, in run_demo_item
    flash_gui_control(control)
  File "gui/tools/__init__.py", line 123, in flash_gui_control
    OV.Refresh()
  File "guiFunctions.py", line 61, in Refresh
    olx.Refresh()
  File "/home/verahill/.olex2/data/4f11cd71424c9e8484c8c4f91644e3b6/olx/__init__.py", line 1319, in Refresh
    for arg in args:
RuntimeError: [repository/pyext.cpp(runOlexFunctionEx):354]: Function 'html.SetImage' failed: wrong html object name: 'IMG_H3-H3-MASKS'
Key variable values:
args = ()
kwds = {}
al = []

Problems with compiling:
sudo apt-get install libwxgtk2.8-dev
mkdir ~/tmp/svn_co
cd ~/tmp/svn_co
svn checkout svn://svn.code.sf.net/p/olex2/code/branches/1.2 olex2-1.2
cd olex2-1.2/
/home/verahill/tmp/svn_co/olex2-1.2/gxlib/gxapp.cpp: In member function ‘esdl::TUndoData* gxlib::TGXApp::Name(gxlib::TXAtom&, const olxstr&, bool)’:
/home/verahill/tmp/svn_co/olex2-1.2/gxlib/gxapp.cpp:2003:73: error: no matching function for call to ‘gxlib::TGXApp::SynchroniseBonds(gxlib::TXAtomPList)’
/home/verahill/tmp/svn_co/olex2-1.2/gxlib/gxapp.cpp:2003:73: note: candidate is:
[..]
/home/verahill/tmp/svn_co/olex2-1.2/gxlib/gxapp.cpp: In member function ‘esdl::TUndoData* gxlib::TGXApp::Name(const olxstr&, const olxstr&, bool, bool)’:
/home/verahill/tmp/svn_co/olex2-1.2/gxlib/gxapp.cpp:2136:71: error: no matching function for call to ‘gxlib::TGXApp::SynchroniseBonds(gxlib::TXAtomPList)’
/home/verahill/tmp/svn_co/olex2-1.2/gxlib/gxapp.cpp:2136:71: note: candidate is:
[..]
make[1]: *** [/home/verahill/tmp/svn_co/olex2-1.2/obj/gxapp.s] Error 1
make[1]: *** Waiting for unfinished jobs....
make[1]: *** [/home/verahill/tmp/svn_co/olex2-1.2/obj/gxmacro.s] Error 1
make[1]: Leaving directory `/home/verahill/tmp/svn_co/olex2-1.2'
make: *** [all] Error 2

There's a CMakeLists.txt file, but when doing e.g.
mkdir build_olex
cd build_olex/
cmake ../olex2-1.2

I get
CMake Error: Error in cmake code at /home/verahill/tmp/svn_co/olex2-1.2/CMakeLists.txt:49: Parse error. Expected a command name, got unquoted argument with text "${olex2_html_SRCS}".
-- Configuring incomplete, errors occurred!

461. Briefly: setting up SHELX on linux (crystallography)

Nothing difficult, but putting up instructions won't hurt anyone.

SHELX is THE crystal structure refinement software. I'm not a crystallographer, but it never hurts to familiarise yourself with the tools of your collaborators.

Register using this page (if you're an academic user): http://shelx.uni-ac.gwdg.de/SHELX/register.php
To find the answer to the xtal question, use google.

You'll then receive an email with a password. Now go to
where you'll find instructions.

Download all the files:

Several of my files became corrupted the first time I downloaded them for some reason: anode.bz2, shelx[cde].bz2, shredcif.bz2.

I'm presuming that you're downloading the files to ~/Downloads

Here are the 'good' md5sums:
219183542ada47a17e5528bf217f9261 anode.bz2
61335e6b9cf2e654242db80822f32681 ciftab.bz2
918fe0a04e59589938a81a93d8e3eaff shelxc.bz2
e65580af087989aa4958eb53dcd8a473 shelxd.bz2
bc5cad6e4129fa61bbde49207cd4d244 shelxe.bz2
5390146a4b516425fb7b326533443ba7 shelxl.bz2
95617863be917743df55bd94509504fb shelxs.bz2
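
If you'd rather not compare checksums by eye, something along these lines would do the comparison for you. It's a minimal Python 3 sketch using only hashlib from the standard library; the helper names md5sum and check are made up for illustration, and running md5sum from coreutils on the command line gets you the same digests.

```python
import hashlib
import os


def md5sum(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check(expected, directory="."):
    """Map each file name to True/False depending on whether its MD5 matches."""
    return {name: md5sum(os.path.join(directory, name)) == md5
            for name, md5 in expected.items()}
```

For example, check({"anode.bz2": "219183542ada47a17e5528bf217f9261"}, os.path.expanduser("~/Downloads")) returns {"anode.bz2": True} if the download is intact.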

While you're at it, download the testdata from http://shelx.uni-ac.gwdg.de/~gsheldr/bin/test_data/: cdetutorial_andrea.zip, ciftab_templates.zip, difficult_sad.zip, ltests.zip, pn1a.zip

sudo apt-get install bzip2 findutils unzip
mkdir ~/tmp/shelx-2013 -p
cd ~/Downloads
cp shredcif.bz2 shelxe.bz2 shelxd.bz2 shelxc.bz2 ciftab.bz2 anode.bz2 shelxl.bz2 shelxs.bz2 ~/tmp/shelx-2013
cd ~/tmp/shelx-2013/
ls *.bz2|xargs -I {} bunzip2 {}
chmod +x *
sudo cp * /usr/local/bin

If you downloaded the test data:

mkdir ~/tmp/shelx_examples
cd ~/Downloads
cp cdetutorial_andrea.zip ciftab_templates.zip difficult_sad.zip ltests.zip pn1a.zip ~/tmp/shelx_examples
cd ~/tmp/shelx_examples
ls *.zip |xargs -I {} unzip {}

And you're done. Now, learning how to use SHELX, and how to use it properly, is a different matter on which I am not qualified to write.

21 June 2013

460. Briefly: Crystallography software: CCDC Mercury

The Cambridge Crystallographic Data Centre (CCDC), which maintains the Cambridge Structural Database (CSD), has a free structure viewer called Mercury. Downloading and installing it is pretty straightforward, but still makes for a reasonable post.

To install
Go to http://www.ccdc.cam.ac.uk/SupportandResources/Downloads/pages/ProtectedDownloadProductList.aspx and click on Mercury. It'll take you to a license agreement page. Click accept to continue. Note that it won't work if you are blocking cookies.

Download Mercury 3.1 for Linux, and the 3.1.1 patch for Linux. I'll presume that you downloaded the files to ~/Downloads.

cd ~/Downloads
chmod +x mercurystandalone-3.1-linux-installer.run

Finally, create a file called
[Desktop Entry]
Name=Mercury
GenericName=CCDC Mercury
Comment=Visualization of crystal structures
Exec=/home/verahill/.Mercury_3.1/bin/mercury
Icon=/home/verahill/.Mercury_3.1/icons/mercury_48x48.png
Terminal=false
Type=Application
Categories=Science
Version=3.1

Update to 3.1.1
cd ~/Downloads
chmod +x csdsystempatch-5.34.2-linux-installer.run

And you are done! Note that you will want to have working OpenGL for this to look ok.

20 June 2013

459. Briefly: Proxies, browsing and paranoia

It's easy to configure Chrome to use Tor to preserve a semblance of privacy online (http://verahill.blogspot.com.au/2013/06/450-tor-and-chrome-on-debian.html). There are a few simple things you can do to make your life with a proxy easier to manage.

This post presumes that you've followed this post first: http://verahill.blogspot.com.au/2013/06/450-tor-and-chrome-on-debian.html. In particular, that you have turned off pre-fetching.

In addition, you may want to think about the following:

Incognito mode
On the lower end of the scale, you may or may not want to use incognito mode consistently. This has little bearing on privacy online, but it depends on whether you want to leave traces on your computer of your browsing history. Although that should only be an issue if someone gets physical access to your computer, you never know if the next browser bug will give someone complete access to your history. Most likely it'll only provide metadata (which is what the NSA brouhaha has been mostly about).

Anyway, if you feel this is an important issue then you should probably be encrypting your disks with encfs as well.

Search engine
It's probably more important to rethink how you are using search engines in Chrome. First of all, you should turn off instant search. Secondly, you will want to consider whether you want to use google as the default search engine for queries in the URL field. Two main search engines come to mind: duckduckgo.com, and startpage.com. While duckduckgo.com has a higher profile, startpage.com is a bit more full-featured, and that's because it takes your query, anonymizes it, and passes it on to google. It's also based in Europe, which I (probably naively) feel is safer.

Go to startpage.com, and click on 'add to chrome' under the search box. Then set Startpage HTTPS as the default in Chrome:

Also consider making sure that google.com isn't your home page in chrome.

Even though Tor works fine in general, it can be a bit slow, and you don't want to use it for everything anyway. There are times when you don't want to use a proxy. In my case, that's when I visit journal websites or my university websites. Also, I have set up a reverse proxy via my home router, and it's faster than Tor, so for a lot of things I'm fine with using that.

Proxy SwitchySharp supports rule-based proxy switching. In my case, I've set it so that if I use google, I use Tor. If I go to RSC, ACS, Wiley or Elsevier journals, I use my university connection, and for everything else, I use my home router.

You then just need to click your way through to the proxyswitcher alternative:
The icon will change colour depending on which proxy is active. Pretty neat!

19 June 2013

458. Briefly: Converting GRAMS ASP ascii data to two-column ascii data

We have a couple of CARY 630 FT-IR /ATR instruments.

I hate them. Apart from being the Mac equivalent of spectrometers (if you try to do anything remotely creative you'll have a bad day. Point and click works well, most of the time), they aren't able to output data in any reasonable format.

At least not the way I'd define 'reasonable' i.e. simple x-y ascii data file and/or JCAMP-DX and/or even .csv. The default output is a binary .a2r file.

The only ascii-type format is a proprietary GRAMS ASP ascii file, for which I haven't been able to get the formal specs. Using google it seems as if the German arm of Agilent did publish it, but when clicking on the links I'm told the file no longer exists, and google cache isn't playing ball.

Anyway. Luckily the format seems pretty simple.

Here are the first ten lines of an .asp file:
1798
4000.41016197344
650.579285428114
1
128
4
98.4862110783457
98.4183476284596
98.4587565715995
98.5660576694946
* The first line is the number of acquired data points
* The second line is the highest reciprocal wavelength in cm-1.
* The third line is the lowest reciprocal wavelength in cm-1.
* I don't know what the fourth and fifth lines signify. It could be dynamic resolution in the Y axis.
* The sixth line is the native resolution, i.e. 4 cm-1/data point. However, the data seems to be zero-filled, i.e. it seems the resolution is really ca 1.86 cm-1/pt.
Knowing the above, we can write a simple python script, which we'll call asp2asc, which will allow us to generate files suitable for gnuplot.
Example usage:
./asp2asc -i data.asp -o data.dat

#converts GRAMS ascii (asp) output from a CARY 630 FT-ATR-IR to a two-column ascii dat file
import sys

def getvars(arguments):
  if "-o" in arguments:
   print 'Output: %s.'%theoutput
  elif "--output" in arguments:
   print 'Output: %s.'%theoutput
   print ''
   print 'Error -- no output file defined.'
   print ''

  if "-i" in arguments:
   print 'Input: %s.'%theinput
  elif "--input" in arguments:
   print 'Input: %s.'%theinput
   print ''
   print 'Error -- no input file defined.'
   print ''

  if ("-h" in arguments) or ("--help" in arguments):
   print " "
   print "\t\tThis is asp2asc, a tool for converting"
   print "\t\tGRAMS ASP ascii files to two-column ascii files"
   print "\t\tThis is version",ver
   print "\tUsage:"
   print "\t-h\t--help   \tYou're looking at it."
   print "\t-i\t--input \tInput file, e.g. data.asp"
   print "\t-o\t--output \tOutput file, e.g. data.dat"
   print ""
  a=1   #do nothing
 if exit==1:
 print ''

 return switches

def getparams(datafile):
 for line in datafile:
  if n==6:
 return params
def getydata(datafile):
 for line in datafile:
 return ydata
def makexdata(xpts,xmax,increment):
 while n < xpts:
 return xdata

def writexydata(outfile,xdata,ydata):
 for n in range(0,len(xdata)):
 return 0

if __name__ == "__main__":

 ydata=getydata(infile) # needs getparams to have parked file reading at the 7th line 



 if len(xdata)==len(ydata):
  print 'Something bad happened:'
  print 'Number of X data points not equal to number of Y data points'
  print 'x pts: %i, y pts: %i'%(len(xdata),len(ydata))
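
Stripped of the argument handling, the conversion comes down to reading the six header lines, rebuilding the x axis from the number of points and the two limits, and pairing it with the intensities. Below is a minimal Python 3 sketch of that idea (not the Python 2 script above); the function name asp_to_xy is made up, and it assumes that the intensities are listed from the highest wavenumber downwards and that the points are evenly spaced. With the header shown earlier the spacing comes out at (4000.41 - 650.58)/1797, i.e. about 1.864 cm-1 per point.

```python
def asp_to_xy(lines):
    """Convert the lines of a GRAMS ASP file into (wavenumber, intensity) pairs."""
    values = [float(v) for v in lines if v.strip()]
    npts, xmax, xmin = int(values[0]), values[1], values[2]
    ydata = values[6:6 + npts]          # the first six lines are the header
    if len(ydata) != npts:
        raise ValueError("expected %i points, found %i" % (npts, len(ydata)))
    step = (xmax - xmin) / (npts - 1)   # spacing between neighbouring points
    # Assumption: intensities are listed from the highest wavenumber down.
    return [(xmax - n * step, y) for n, y in enumerate(ydata)]


# A made-up three-point file, just to show the call.
sample = ["3", "4000.0", "3996.0", "1", "128", "4", "98.5", "98.4", "98.6"]
for x, y in asp_to_xy(sample):
    print("%.4f %.4f" % (x, y))
```

For a real file you would pass in the open file object, e.g. asp_to_xy(open("data.asp")), and write the pairs to the output file instead of printing them.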

Of course you could do this easily in a spreadsheet too, but I honestly find myself avoiding spreadsheet programmes like the plague ever since I learned how to use sed, gawk, and python.
Also, WHY do they make it so unnecessarily difficult to export your own data?

457. Very Briefly: Microsoft has a Tor exit node?

Whenever I play around with Tor I use ipchicken.com or whatsmyip.org to make sure that I'm indeed using a proxy. I also normally do a whois on the IP address, to see who's running the exit node.

Today I ended up with the IP address

NetRange: -
CIDR:,
OriginAS:
NetName: MSFT-EP
NetHandle: NET-168-61-0-0-1
Parent: NET-168-0-0-0-0
NetType: Direct Assignment
RegDate: 2011-06-22
Updated: 2012-10-16
Ref: http://whois.arin.net/rest/net/NET-168-61-0-0-1

OrgName: Microsoft Corp
OrgId: MSFT-Z
Address: One Microsoft Way
City: Redmond
StateProv: WA
PostalCode: 98052
Country: US
RegDate: 2011-06-22
Updated: 2013-04-12
Ref: http://whois.arin.net/rest/org/MSFT-Z

OrgTechHandle: MSFTP-ARIN
OrgTechName: MSFT-POC
OrgTechPhone: +1-425-882-8080
OrgTechEmail: iprrms@microsoft.com
OrgTechRef: http://whois.arin.net/rest/poc/MSFTP-ARIN

OrgAbuseHandle: HOTMA-ARIN
OrgAbuseName: Hotmail Abuse
OrgAbusePhone: +1-425-882-8080
OrgAbuseEmail: abuse@hotmail.com
OrgAbuseRef: http://whois.arin.net/rest/poc/HOTMA-ARIN

OrgAbuseHandle: MSNAB-ARIN
OrgAbuseName: MSN ABUSE
OrgAbusePhone: +1-425-882-8080
OrgAbuseEmail: abuse@msn.com
OrgAbuseRef: http://whois.arin.net/rest/poc/MSNAB-ARIN

OrgNOCHandle: ZM23-ARIN
OrgNOCName: Microsoft Corporation
OrgNOCPhone: +1-425-882-8080
OrgNOCEmail: noc@microsoft.com
OrgNOCRef: http://whois.arin.net/rest/poc/ZM23-ARIN

OrgAbuseHandle: ABUSE231-ARIN
OrgAbuseName: Abuse
OrgAbusePhone: +1-425-882-8080
OrgAbuseEmail: abuse@microsoft.com
OrgAbuseRef: http://whois.arin.net/rest/poc/ABUSE231-ARIN
That Microsoft is listed as the organisation doesn't necessarily mean that they are running the node (could be a hosting company) but it still seems that this might actually be MS running this one. Maybe it's just for research purposes, but it still seemed a bit surprising.

Microsoft as a company isn't exactly known for doing things out of the goodness of their hearts. Oh well.

17 June 2013

456. Adding NWChem basis sets to ECCE. Part 2. A solution: nwchem2ecce.py


I've moved the finished scripts to here:

They work! I've also added a number of converted basis sets to the sourceforge repo under 'examples'. You'll also find example ecp and ECPOrbital files.


Here's the README:
The programmes are not 'intelligent' -- they won't check that you are doing something reasonable. Bad input = bad output.

__Installation__:
Download eccepag and nwbas2ecce

They are both python (2.7) programmes, so you will need to install python to run them. On linux, this is normally very easy. E.g. on debian, run
'sudo apt-get install python2.7'
and you are done.

If you want, you can put the files in /usr/local/bin and do
'sudo chmod +x /usr/local/bin/eccepag'
'sudo chmod +x /usr/local/bin/nwbas2ecce'
and you will be able to call the scripts from any directory.

__Usage__
nwbas2ecce can turn a full basis set, or an ECP basis set, into an ECCE compatible set of basis set files.

Typically, an nwchem basis set consists of a single file, e.g. 3-21g. It can also be divided into several files, e.g. def2-svp and def2-ecp, where the effective core potentials (ecps) are in def2-ecp. Other basis set files, like lanl2dz_ecp, contain both the orbital and the contraction parts.

Typically, an ECCE basis set suite consists of:
basis.BAS
basis.BAS.meta
basis.POT (for ECP)
basis.POT.meta (for ECP)

Sometimes polarization and diffuse functions are separated from the main .BAS file. E.g. 3-21++G* consists of
3-21G.BAS
3-21GS.BAS
POPLDIFF.BAS
in addition to the meta files. The meta files are just markup-language type files with e.g. references. Note that you don't HAVE to break up the basis set components like this.

Since the basis set data can be broken up into smaller files, the overall basis set is defined as an entry in a category file. For example, 3-21G is defined in the category file 'pople', and points to 3-21G.BAS. 3-21G* is also defined in pople, but points to both 3-21G.BAS and 3-21GS.BAS.

ECP works in a similar way, by combining a .BAS and a .POT file. Note that the .POT files look different from the .BAS files.

nwbas2ecce generates .BAS and .POT files based on whether there are basis/end or ecp/end sections in the nwchem basis set file. If there are both, both POT and BAS files are generated.

All these files are contained in server/data/Ecce/system/GaussianBasisSetLibrary

Finally, you need to generate .pag and .dir files that go into the server/data/Ecce/system/GaussianBasisSetLibrary/.DAV directory. The .dir file is always empty, while the .pag file is unfortunately a binary file. eccepag can, however, generate it with the right input.

See e.g. http://verahill.blogspot.com.au/2013/06/455-adding-nwchem-basis-sets-to-ecce.html for more detailed information

__Example__
We'll use def2-svp as an example. The nwchem basis set file def2-svp contains the basis set, while def2-ecp contains the core potentials.

Use def2-svp to generate DEF2_SVP.BAS, DEF2_SVP.BAS.meta.
Use def2-ecp to generate DEF2_ECP.POT, DEF2_ECP.POT.meta.

As part of the generation, .descriptor files are also generated. These contain information that should go into the category file(s). Then generate the .pag files for both the POT and the BAS files, and touch the .dir files into existence.

Do it like this:
nwbas2ecce -i def2-svp -o DEF2_SVP.BAS -n 'def2-svp'
nwbas2ecce -i def2-ecp -p DEF2_ECP.POT -n 'def2-ecp'
eccepag -n def2-svp -t ECPOrbital -c ORBITAL -y Segmented -s Y -o DEF2_SVP.BAS.pag
eccepag -n def2-ecp -t ecp -c AUXILIARY -o DEF2_ECP.POT.pag

NOTE: I don't actually know if def2-svp is segmented, and spherical. I don't think it matters for the .pag file generation. Also note that most inputs are case sensitive. Look at a similar .pag file for hints.

You now have the following files:
DEF2_ECP.POT
DEF2_ECP.POT.descriptor
DEF2_ECP.POT.meta
DEF2_ECP.POT.pag
DEF2_SVP.BAS
DEF2_SVP.BAS.descriptor
DEF2_SVP.BAS.meta
DEF2_SVP.BAS.pag

Copy the files. Note that you need to select the correct target directory, and that will vary with where you installed ECCE. I'll assume it's in /opt/ecce

cp DEF2* /opt/ecce/server/data/Ecce/system/GaussianBasisSetLibrary
cd /opt/ecce/server/data/Ecce/system/GaussianBasisSetLibrary
mv *.pag .DAV/
touch .DAV/DEF2_SVP.BAS.dir .DAV/DEF2_ECP.POT.dir
cat DEF2_SVP.BAS.descriptor >> ECPOrbital
cat DEF2_ECP.POT.descriptor >> ECPOrbital
cat DEF2_ECP.POT.descriptor >> ecp

Edit ECPOrbital so that it reads:
name= def2-svp
files= DEF2_SVP.BAS DEF2_ECP.POT
atoms= H He Li Be B C N O F Ne Na Mg Al Si P S Cl Ar K Ca Sc Ti V Cr Mn Fe Co Ni Cu Zn Ga Ge As Se Br Kr Rb Sr Y Zr Nb Mo Tc Ru Rh Pd Ag Cd In Sn Sb Te I Xe Cs Ba La Hf Ta W Re Os Ir Pt Au Hg Tl Pb Bi Po At Rn
atoms= Rb Sr Y Zr Nb Mo Tc Ru Rh Pd Ag Cd In Sn Sb Te I Xe Cs Ba La Hf Ta W Re Os Ir Pt Au Hg Tl Pb Bi Po At Rn
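The copy/move/touch/append steps at the end of the README can also be scripted. The following is only a sketch of those steps in Python, not part of nwbas2ecce or eccepag; install_basis and its arguments are made-up names, and the library path is whatever applies to your ECCE installation.

```python
import glob
import os
import shutil

def install_basis(src_dir, lib_dir, prefix, descriptor_targets):
    """Copy <prefix>* files into the library and wire up .DAV and the category files."""
    dav_dir = os.path.join(lib_dir, ".DAV")
    if not os.path.isdir(dav_dir):
        os.makedirs(dav_dir)
    for path in sorted(glob.glob(os.path.join(src_dir, prefix + "*"))):
        name = os.path.basename(path)
        if name.endswith(".pag"):
            # the binary .pag file goes into .DAV, next to an empty .dir file
            shutil.copy(path, os.path.join(dav_dir, name))
            open(os.path.join(dav_dir, name[:-4] + ".dir"), "a").close()
        else:
            shutil.copy(path, os.path.join(lib_dir, name))
    # append each descriptor to the category file(s) it belongs to
    for descriptor, categories in descriptor_targets.items():
        with open(os.path.join(lib_dir, descriptor)) as handle:
            text = handle.read()
        for category in categories:
            with open(os.path.join(lib_dir, category), "a") as handle:
                handle.write(text)

# Example call matching the README (path assumes ECCE lives in /opt/ecce):
# install_basis(".", "/opt/ecce/server/data/Ecce/system/GaussianBasisSetLibrary", "DEF2",
#               {"DEF2_SVP.BAS.descriptor": ["ECPOrbital"],
#                "DEF2_ECP.POT.descriptor": ["ECPOrbital", "ecp"]})
```

The category files still need to be edited by hand afterwards, as described above, since the appended descriptors are separate entries.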

455. Adding NWChem basis sets to ECCE. Part 1. The formats

I've written a python script that can
1. do automatic conversion of nwchem basis set files to .BAS and .POT
2. generate entries that can be added to the category file

What it currently can't do is generate a .pag file.

The python script is not in this post. I'll release it soon though.

The structure:
ECCE stores basis sets in server/data/Ecce/system/GaussianBasisSetLibrary/.

The number of files associated with a basis set varies, and the way a basis set is set up seems to vary as well depending on who added it.

Each basis set needs at least the following files:

In addition, the basis set needs to be added to the correct category by being added to one of the following files:

Charge correlation_consistent DFTOrbital diffuse ecp ECPOrbital Exchange other_generally_contracted other_segmented polarization pople rydberg
e.g. 6-31G goes to pople, while LANL2DZ/ECP goes to ECPOrbital.

Looking at the basis set tool in ECCE you have the following categories/subcategories:
Orbital: Pople Shared, Other Segmented, Corr. Consistent, Other Gen. Contr., ECP Orbital, DFT Orbital.
Auxiliary: Polarization, Diffuse, Rydberg.
ECP:
DFT: Charge Fitting, Exchange Fitting.
What it means is that you can 'mix and match' by adding your .BAS or .POT files to different category files (e.g. you can have LANL2DZ dp in ECPOrbital, ecp and polarization, all at the same time). See below for how basis sets can be broken up.

Example: The simple cases: 3-21G, 3-21G*, 3-21++G*
For a basis set like 3-21G there are two files: 3-21G.BAS and 3-21G.BAS.meta.
In addition grep shows that there's an entry in the file pople for 3-21G.

The .BAS file:
The entry for C in 3-21G.BAS looks like this:
atom=C
contraction shell=S num_primitives=3 num_coefficients=1
172.2560 0.0617669
25.91090 0.358794
5.533350 0.700713
contraction shell=SP num_primitives=2 num_coefficients=2
3.664980 -0.395897 0.236460
0.770545 1.215840 0.860619
contraction shell=SP num_primitives=1 num_coefficients=2
0.195857 1.000000 1.000000
Nothing too strange. For example, the nwchem format for C in 3-21g is:
basis "C_3-21G" CARTESIAN
C S
 172.2560000 0.0617669
 25.9109000 0.3587940
 5.5333500 0.7007130
C SP
 3.6649800 -0.3958970 0.2364600
 0.7705450 1.2158400 0.8606190
C SP
 0.1958570 1.0000000 1.0000000
end
Writing a python script that translates between the two is simple.
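As an illustration of how simple the translation is, here is a minimal sketch. It is not the script mentioned in this post; parse_nwchem_basis and to_bas are made-up names, the exact line breaks and number formatting of the .BAS output are assumptions based on the listing above, and only plain decimal numbers are handled.

```python
SAMPLE = '''basis "C_3-21G" CARTESIAN
C S
 172.2560000 0.0617669
 25.9109000 0.3587940
 5.5333500 0.7007130
C SP
 3.6649800 -0.3958970 0.2364600
 0.7705450 1.2158400 0.8606190
C SP
 0.1958570 1.0000000 1.0000000
end
'''

def parse_nwchem_basis(text):
    """Return a list of contractions as (element, shell, rows of floats)."""
    contractions = []
    for raw in text.splitlines():
        words = raw.split()
        if not words or words[0].startswith("#") or words[0].lower() in ("basis", "end"):
            continue
        try:
            row = [float(word) for word in words]
        except ValueError:
            # not numbers, so this is an 'Element Shell' header such as 'C SP'
            contractions.append((words[0], words[1].upper(), []))
        else:
            contractions[-1][2].append(row)
    return contractions

def to_bas(contractions):
    """Format the contractions in the style of the .BAS listing (layout assumed)."""
    lines = []
    current = None
    for element, shell, rows in contractions:
        if element != current:
            lines.append("atom=%s" % element)
            current = element
        # each row is one primitive: an exponent followed by its coefficients
        lines.append("contraction shell=%s num_primitives=%d num_coefficients=%d"
                     % (shell, len(rows), len(rows[0]) - 1))
        for row in rows:
            lines.append(" ".join("%.7f" % value for value in row))
    return "\n".join(lines) + "\n"

if __name__ == "__main__":
    print(to_bas(parse_nwchem_basis(SAMPLE)))
```

The number of primitives is simply the number of rows in a contraction, and the number of coefficients is the number of columns minus one (the exponent), which is why SP shells report two coefficients.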

The .BAS.meta file:
The 3-21G.BAS.meta file looks like this:
references
Elements   References
--------   ----------
H - Ne:    J.S. Binkley, J.A. Pople, W.J. Hehre, J. Am. Chem. Soc 102 939 (1980)
Na - Ar:   M.S. Gordon, J.S. Binkley, J.A. Pople, W.J. Pietro and W.J. Hehre, J. Am. Chem. Soc. 104, 2797 (1983).
K - Ca:    K.D. Dobbs, W.J. Hehre, J. Comput. Chem. 7, 359 (1986).
Ga - Kr:   K.D. Dobbs, W.J. Hehre, J. Comput. Chem. 7, 359 (1986).
Sc - Zn:   K.D. Dobbs, W.J. Hehre, J. Comput. Chem. 8, 861 (1987).
Y - Cd:    K.D. Dobbs, W.J. Hehre, J. Comput. Chem. 8, 880 (1987).
Cs :       A 3-21G quality set derived from the Huzinage MIDI basis sets. E.D. Glendening and D. Feller, J. Phys. Chem. 99, 3060 (1995)
references
info
3-21G Split Valence Basis
-------------------------
Elements   Contraction                 References
H - He:    (3s) -> [2s]                J.S. Binkley, J.A. Pople and W.J. Hehre,
Li - Ne:   (6s,3p) -> [3s,2p]          J. Am. Chem. Soc. 102, 939 (1980).
Na - Ar:   (9s,6p) -> [4s,3p]          M.S. Gordon, J.S. Binkley, J.A. Pople, W.J. Pietro and W.J. Hehre, J. Am. Chem. Soc. 104, 2797 (1983)
K - Ca:    (12s,9p) -> [5s,4p]         K.D. Dobbs, W.J. Hehre, J. Comput. Chem. 7,
Ga - Kr:   (12s,9p,3d) -> [5s,4p,1d]   359 (1986).
Sc - Zn:   (12s,9p,3d) -> [5s,4p,2d]   K.D. Dobbs, W.J. Hehre, J. Comput. Chem. 8, 861 (1987).
Rb - Sr:   (15s,12p,3d)-> [6s,5p,1d]
Y - Cd:    (15s,12p,6d)-> [6s,5p,3d]   K.D. Dobbs, W.J. Hehre, J. Comput. Chem. 8,
In - I:    (15s,12p,6d)-> [6s,5p,2d]   880 (1987).
Cs :       (18s,12p,6d)-> [6s,5p,2d]   A 3-21G quality set derived from the Huzinage MIDI basis sets. E.D. Glendening and D. Feller, J. Phys. Chem. 99, 3060 (1995).

The 3-21G basis set contains the same number of Gaussian primitives as the STO-3G basis, but the valence electrons are described with two functions per AO instead of one. In most cases the 3-21G basis set gives results which are as good as the more expensive 4-31G and 6-31G sets.

3-21G Atomic Energies
ROHF
    State  UHF (noneq)   ROHF (noneq)  ROHF(equiv)   HF Limit (equiv)
    -----  ----------    -----------   -----------   ---------
H   2-S    -0.496199     -0.496199     -0.496199     -0.50000
He  1-S    -2.835680     -2.835680     -2.835680     -2.86168
Li  2-S    -7.381513     -7.381513     -7.381513     -7.43273
Be  1-S    -14.486820    -14.486820    -14.486820    -14.57302
B   2-P    -24.389762    -24.389634    -24.148989    -24.52906
C   3-P    -37.481070    -37.480389    -37.480389    -37.68862
N   4-S    -54.105390    -54.103658    -54.103658    -54.40094
O   3-P    -74.393657    -74.392512    -74.391782    -74.80940
F   2-P    -98.845009    -98.844645    -98.844230    -99.40935
Ne  1-S    -127.132546   -127.803824   -127.803824   -128.54710
Na  2-S    -160.854064   -160.854041   -160.854041   -161.85891
Mg  1-S    -198.468103   -198.468103   -198.468103   -199.61463
Al  2-P    -240.551046   -240.551024   -240.551010   -241.87671
Si  3-P    -287.344431   -287.344419   -287.344393   -288.85436
P   4-S    -339.000079   -339.000027   -339.000027   -340.71878
S   3-P    -395.551336   -395.551083   -395.550591   -397.50490
Cl  2-P    -457.276552   -457.276414   -457.276096   -459.48207
Ar  1-S    -524.342962   -524.342962   -524.342962   -526.81751
K   2-S    -596.152980   -596.152923   -596.152923   -599.16479
info
comments
2/16/95 - DFF - Modify the format of the literature citation.
12/07/93 - SJB - Add Nb to Xe.
8/4/93 - DFF - Add Y and Zr.
12/2/92 - DFF - Add Rb and Sr.
7/13/90 - DFF - Original creation of this file from MIA basis set library.
comments
Again, most of this can be extracted using a shell/python/perl script from the corresponding 3-21g nwchem basis set file.

The entry for 3-21G in 'pople':
name= 3-21G
files= 3-21G.BAS
atoms= H He Li Be B C N O F Ne Na Mg Al Si P S Cl Ar K Ca Sc Ti V Cr Mn Fe Co Ni Cu Zn Ga Ge As Se Br Kr Rb Sr Y Zr Nb Mo Tc Ru Rh Pd Ag Cd In Sn Sb Te I Xe Cs
This simply seems to be a list of the files that describe the basis set and of the elements supported. It can be autogenerated using a script.

Intermission: polarization and diffuse orbitals, and ECP.
At this stage it's pretty simple. We now have a rough idea of what's needed. We just need to understand how to expand our basis sets.

For 3-21G* the polarisation functions are split off into 3-21GS-AGG.BAS and 3-21GS.BAS. For 3-21++G* the polarisation and diffuse functions are in 3-21PPGS-AGG.BAS, 3-21GS.BAS and POPLDIFF.BAS. All -AGG.BAS files are empty, so I'm not sure why they are there.

Anyway, this might make it a bit clearer:
3-21G = 3-21G.BAS
3-21G* = 3-21G.BAS + 3-21GS.BAS
3-21++G* = 3-21G.BAS + 3-21GS.BAS + POPLDIFF
What happens to e.g. pople is this:
name= 3-21G*
files= 3-21GS-AGG.BAS 3-21G.BAS 3-21GS.BAS
atoms= H He Li Be B C N O F Ne Na Mg Al Si P S Cl Ar
atoms= Na Mg Al Si P S Cl Ar

name= 3-21++G*
files= 3-21PPGS-AGG.BAS 3-21G.BAS POPLDIFF.BAS 3-21GS.BAS
atoms= H He Li Be B C N O F Ne Na Mg Al Si P S Cl Ar
atoms= H Li Be B C N O F Ne Na Mg Al Si P S Cl Ar
atoms= Na Mg Al Si P S Cl
The -AGG.BAS files are empty. The first atoms line corresponds to entries in 3-21G.BAS, while for 3-21G* the second one corresponds to entries in 3-21GS.BAS. Likewise,
atoms= H Li Be B C N O F Ne Na Mg Al Si P S Cl Ar
are entries in POPLDIFF.BAS.
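Generating such a category entry is easy to script. A minimal sketch, assuming the layout is one name= line, one files= line and then one atoms= line per file that actually contains data (the empty -AGG.BAS files get none); category_entry is a made-up name and the line breaks are inferred from the entries shown above.

```python
def category_entry(name, files, atoms_by_file):
    """Build a category-file entry; files without atoms (e.g. -AGG.BAS) get no atoms= line."""
    lines = ["name= %s" % name, "files= %s" % " ".join(files)]
    for filename in files:
        atoms = atoms_by_file.get(filename)
        if atoms:
            lines.append("atoms= %s" % " ".join(atoms))
    return "\n".join(lines) + "\n"

if __name__ == "__main__":
    first_two_rows = "H He Li Be B C N O F Ne Na Mg Al Si P S Cl Ar".split()
    third_row = "Na Mg Al Si P S Cl Ar".split()
    print(category_entry("3-21G*",
                         ["3-21GS-AGG.BAS", "3-21G.BAS", "3-21GS.BAS"],
                         {"3-21G.BAS": first_two_rows, "3-21GS.BAS": third_row}))
```

The atoms lists themselves can be collected from the atom= lines of each .BAS file.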

The good news: it's almost identical when it comes to ECP. Here's the ECPOrbital entry for LANL2DZ:
name= LANL2DZ ECP
files= LANL2DZ.BAS LANL2DZ.POT
atoms= H Li Be B C N O F Ne Na Mg Al Si P S Cl Ar K Ca Sc Ti V Cr Mn Fe Co Ni Cu Zn Ga Ge As Se Br Kr Rb Sr Y Zr Nb Mo Tc Ru Rh Pd Ag Cd In Sn Sb Te I Xe Cs Ba La Hf Ta W Re Os Ir Pt Au Pb Bi U Np Pu
atoms= Na Mg Al Si P S Cl Ar K Ca Sc Ti V Cr Mn Fe Co Ni Cu Zn Ga Ge As Se Br Kr Rb Sr Y Zr Nb Mo Tc Ru Rh Pd Ag Cd In Sn Sb Te I Xe Cs Ba La Hf Ta W Re Os Ir Pt Au Pb Bi U Np Pu
and the ecp entry:
name= LANL2DZ ECP
files= LANL2DZ.POT
atoms= Na Mg Al Si P S Cl Ar K Ca Sc Ti V Cr Mn Fe Co Ni Cu Zn Ga Ge As Se Br Kr Rb Sr Y Zr Nb Mo Tc Ru Rh Pd Ag Cd In Sn Sb Te I Xe Cs Ba La Hf Ta W Re Os Ir Pt Au Pb Bi U Np Pu

The POT file is a little bit different from the .BAS file:
atom=Na ncore=10 lmax=2
ecp_potential%l=2%shell=d potential%num_exponents=5
1 175.5502590 -10.0000000
2 35.0516791 -47.4902024
2 7.9060270 -17.2283007
2 2.3365719 -6.0637782
2 0.7799867 -0.7299393
ecp_potential%l=0%shell=s-d potential%num_exponents=5
0 243.3605846 3.0000000
1 41.5764759 36.2847626
2 13.2649167 72.9304880
2 3.6797165 23.8401151
2 0.9764209 6.0123861
ecp_potential%l=1%shell=p-d potential%num_exponents=6
0 1257.2650682 5.0000000
1 189.6248810 117.4495683
2 54.5247759 423.3986704
2 13.7449955 109.3247297
2 3.6813579 31.3701656
2 0.9461106 7.1241813

.DAV files
The good news: the .DAV/basis.dir file is empty.
The bad news: .DAV/basis.pag is a binary file.
I haven't yet figured out the exact structure of it nor the best way to auto-generate it.
I think the best illustration is to show the od -c output for a few .POT.pag files:
0000000  \b  \0 371 003 354 003 345 003 340 003 325 003 312 003 302 003
0000020 233 003  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0
0000040  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0
*
0001620  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0 004  \0  \0 002   D
0001640   A   V   :  \0   h   t   t   p   :   /   /   w   w   w   .   e
0001660   m   s   l   .   p   n   l   .   g   o   v   /   e   c   c   e
0001700   :  \0   M   E   T   A   D   A   T   A  \0   A   U   X   I   L
0001720   I   A   R   Y  \0   1   :   c   a   t   e   g   o   r   y  \0
0001740  \0   e   c   p  \0   1   :   t   y   p   e  \0  \0   L   A   N
0001760   L   2   D   Z       E   C   P  \0   1   :   n   a   m   e  \0
0002000

0000000  \b  \0 371 003 356 003 347 003 342 003 327 003 314 003 304 003
0000020 235 003  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0
0000040  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0
*
0001620  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0 004  \0  \0
0001640 002   D   A   V   :  \0   h   t   t   p   :   /   /   w   w   w
0001660   .   e   m   s   l   .   p   n   l   .   g   o   v   /   e   c
0001700   c   e   :  \0   M   E   T   A   D   A   T   A  \0   A   U   X
0001720   I   L   I   A   R   Y  \0   1   :   c   a   t   e   g   o   r
0001740   y  \0  \0   e   c   p  \0   1   :   t   y   p   e  \0  \0   S
0001760   B   K   J   C       E   C   P  \0   1   :   n   a   m   e  \0
0002000

Trial and error in making files for def2-svp has shown me that you can copy e.g. LANL2DZ.POT.pag to DEF2_ECP.POT.pag, and edit with vim (use binary mode -b) but that you'll need to add enough spaces to the name so that the files both end at the same place. E.g. this works:
0000000  \b  \0 371 003 354 003 345 003 340 003 325 003 312 003 302 003
0000020 233 003  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0
0000040  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0
*
0001620  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0 004  \0  \0 002   D
0001640   A   V   :  \0   h   t   t   p   :   /   /   w   w   w   .   e
0001660   m   s   l   .   p   n   l   .   g   o   v   /   e   c   c   e
0001700   :  \0   M   E   T   A   D   A   T   A  \0   A   U   X   I   L
0001720   I   A   R   Y  \0   1   :   c   a   t   e   g   o   r   y  \0
0001740  \0   e   c   p  \0   1   :   t   y   p   e  \0  \0   d   e   f
0001760   2   -   e   c   p              \0   1   :   n   a   m   e  \0
0002000
but this doesn't (removed a single space at the end of def2-ecp):
0000000  \b  \0 371 003 354 003 345 003 340 003 325 003 312 003 302 003
0000020 233 003  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0
0000040  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0
*
0001620  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0 004  \0  \0 002   D
0001640   A   V   :  \0   h   t   t   p   :   /   /   w   w   w   .   e
0001660   m   s   l   .   p   n   l   .   g   o   v   /   e   c   c   e
0001700   :  \0   M   E   T   A   D   A   T   A  \0   A   U   X   I   L
0001720   I   A   R   Y  \0   1   :   c   a   t   e   g   o   r   y  \0
0001740  \0   e   c   p  \0   1   :   t   y   p   e  \0  \0   d   e   f
0001760   2   -   e   c   p          \0   1   :   n   a   m   e  \0
0001777
Note that the names should correspond to the names of the nwchem basis sets and/or files e.g. either 3-21gs or 3-21G*. Or LANL2DZ ECP or lanl2dz_ecp.
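The copy-and-edit trick described above (replace the name, pad with spaces so the file length stays the same) can be done without opening vim. A minimal sketch, assuming the old name occurs exactly once in the copied file; patch_name is a made-up helper and nothing here is specific to ECCE beyond the padding rule observed above.

```python
def patch_name(data, old_name, new_name):
    """Swap old_name for new_name in the bytes of a copied .pag file, keeping its length."""
    if len(new_name) > len(old_name):
        raise ValueError("new name is longer than the one it replaces")
    if data.count(old_name) != 1:
        raise ValueError("expected exactly one occurrence of the old name")
    padded = new_name.ljust(len(old_name))  # pad with trailing spaces
    return data.replace(old_name, padded)

# Example use on a copy of an existing file (file names as in the text above):
# with open("LANL2DZ.POT.pag", "rb") as handle:
#     data = handle.read()
# with open("DEF2_ECP.POT.pag", "wb") as handle:
#     handle.write(patch_name(data, b"LANL2DZ ECP", b"def2-ecp"))
```

This only works when the new name is no longer than the name in the file you copied, which is why a template with a long name is the more convenient starting point.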

As far as I understand the solution will lie in how WebDAV uses .pag files. I don't know anything about that just yet though.
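Read as 16-bit little-endian numbers, the first bytes of the dumps above look like a count (8) followed by eight offsets (1017, 1004, 997, ...) that point at the strings packed against the end of a 1024-byte page. That resembles the page layout of SDBM-style databases, but treat it as a reading of the dumps rather than a confirmed description of the format. Under that assumption, a minimal Python sketch (build_page and decode_page are made-up helper names) shows why changing the length of the name shifts every offset in the header:

```python
import struct

PAGE_SIZE = 1024  # the dumps end at octal 0002000 = 1024 bytes

def build_page(pairs):
    """Pack (key, value) byte strings into one page, filling from the end backwards."""
    offsets = []
    body = b""
    position = PAGE_SIZE
    for key, value in pairs:
        for item in (key, value):
            position -= len(item)
            offsets.append(position)
            body = item + body
    header = struct.pack("<%dH" % (len(offsets) + 1), len(offsets), *offsets)
    padding = b"\x00" * (PAGE_SIZE - len(header) - len(body))
    return header + padding + body

def decode_page(page):
    """Return the (key, value) pairs stored in a page with the assumed layout."""
    count = struct.unpack_from("<H", page, 0)[0]
    offsets = struct.unpack_from("<%dH" % count, page, 2)
    items = []
    end = PAGE_SIZE
    for start in offsets:
        items.append(page[start:end])
        end = start
    return list(zip(items[0::2], items[1::2]))

# The four pairs as they appear in the LANL2DZ ECP dump above
LANL2DZ_PAIRS = [
    (b"1:name\x00", b"\x00LANL2DZ ECP\x00"),
    (b"1:type\x00", b"\x00ecp\x00"),
    (b"1:category\x00", b"\x00AUXILIARY\x00"),
    (b"METADATA", b"\x04\x00\x00\x02DAV:\x00http://www.emsl.pnl.gov/ecce:\x00"),
]

if __name__ == "__main__":
    page = build_page(LANL2DZ_PAIRS)
    print(struct.unpack_from("<9H", page, 0))
    for key, value in decode_page(page):
        print(repr(key), repr(value))
```

If this reading is right, it also explains the padding trick: a shorter name without padding would need a different set of offsets in the header, which a plain text edit does not update.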

Anyway, that's it for now. There's now enough information to write your own scripts.

454. If I had a magic wand: stuff I'd fix in ECCE

Development of ECCE isn't as rapid as it used to be, and one reason may be that implementing new features is becoming increasingly expensive while resources are finite.

However, ECCE has been open source for about a year. Sure, you may not be able to submit your changes without vetting to a sourceforge git repo, but the good folks at EMSL/PNNL are very open to receiving patches and possibly merging them with the main trunk if you do write them.

Go to the ECCE forum and post improvements if you have made them. The forum is here: http://www.nwchem-sw.org/index.php/Special:AWCforum/sf/id11/General_ECCE_Topics.html

NOTE: as always with open source projects, simply making demands for features and general improvements in the forum, unless very small or highly critical, is probably a bit impolite.

I've made a few abortive attempts at making improvements to ECCE, and as these efforts often go, my success has been rather mixed (I think the migration from getopts to Getopt::Std is my only real contribution so far). My main talent seems to be in writing posts about ECCE, which I suppose isn't entirely without merit.

However, I'd like to become more involved, even though a significant barrier is the fact that ECCE is written in at least four different languages (java, C, perl, python).

So here's my wish list, which I might be amending with time (hopefully with solutions...)

1. Bugs
* When creating a new directory, all open directories lower in rank auto-close.
Example. Say we have the directory /jobs/catalyst open/expanded. If we create the directory /jobs/transition, /jobs/catalyst and everything below will close. It can be quite annoying since you may lose track of what jobs you were running and where.

* Separate DFT maxiter from SCF maxiter
NOTE: what I wrote below is wrong. DFT;iterations N;end works as advertised in DFT, scf; maxiter N;end is ignored.
Old text: When setting up DFT jobs using the theory editor, the section called SCF Convergence really only sets the DFT options. In other words, if you set SCF max iterations to 999, you're really setting the number of DFT iterations, not SCF. It would be nice to add an SCF section to DFT jobs, and rename the SCF section to DFT.

* Can't run command because too many files open
While in reality you actually don't have anything open.
Not sure exactly what causes it, but it's manifested by the ECCE animation (in the left-most button) in the ECCE menu thingy (the small window with buttons for Exit, Organizer, Builder, Viewer, Machine Browser etc.) not stopping, and no window launching. In the terminal the following is shown:
[..]
exit; echo GOODBYE
Unable to create a socket
Error: java.net.SocketException: Too many open files

lsof |grep ecce
in one case yielded 9784 hits, most of which were of the type
java 18010 31175 me 99u REG 8,6 0 17194388 /home/me/.ecce/ecce-v6.4b/server/activemq/data/kr-store/data/hash-index-blob_ecce_url_property_data-Subscriptions
java 18010 31175 me 106u REG 8,6 0 17196148 /home/me/.ecce/ecce-v6.4b/server/activemq/data/kr-store/data/hash-index-topic-data_ecce_ejs_kill
java 18010 31175 me 107u REG 8,6 0 17196150 /home/me/.ecce/ecce-v6.4b/server/activemq/data/kr-store/data/hash-index-blob_ecce_ejs_kill-Subscriptions
java 18010 31176 me cwd DIR 8,6 4096 16517897 /home/me/.ecce/ecce-v6.4b/apps/bin
java 18010 31176 me mem REG 8,6 14014 16919807 /home/me/.ecce/ecce-v6.4b/server/activemq/bin/run.jar
java 18010 31176 me mem REG 8,6 367444 16919829 /home/me/.ecce/ecce-v6.4b/server/activemq/lib/optional/log4j-1.2.14.jar
java 18010 31176 me mem REG 8,6 467331 16919826 /home/me/.ecce/ecce-v6.4b/server/activemq/lib/optional/spring-beans-2.5.1.jar
java 18010 31176 me mem REG 8,6 81403 16919823 /home/me/.ecce/ecce-v6.4b/server/activemq/lib/activemq-console-5.1.0.jar
java 18010 31176 me mem REG 8,6 455159 16919827 /home/me/.ecce/ecce-v6.4b/server/activemq/lib/optional/spring-context-2.5.1.jar
java 18010 31176 me mem REG 8,6 128051 16919828 /home/me/.ecce/ecce-v6.4b/server/activemq/lib/optional/xbean-spring-3.3.jar
java 18010 31176 me mem REG 8,6 52915 16919819 /home/me/.ecce/ecce-v6.4b/server/activemq/lib/commons-logging-1.1.jar
java 18010 31176 me mem REG 8,6 2141382 16919830 /home/me/.ecce/ecce-v6.4b/server/activemq/lib/optional/derby-
java 18010 31176 me mem REG 8,6 275073 16919825 /home/me/.ecce/ecce-v6.4b/server/activemq/lib/optional/spring-core-2.5.1.jar
java 18010 31176 me mem REG 8,6 16030 16919820 /home/me/.ecce/ecce-v6.4b/server/activemq/lib/geronimo-j2ee-management_1.0_spec-1.0.jar
java 18010 31176 me mem REG 8,6 32359 16919821 /home/me/.ecce/ecce-v6.4b/server/activemq/lib/geronimo-jms_1.1_spec-1.1.1.jar
java 18010 31176 me mem REG 8,6 2315247 16919822 /home/me/.ecce/ecce-v6.4b/server/activemq/lib/activemq-core-5.1.0.jar
java 18010 31176 me 12r REG 8,6 14014 16919807 /home/me/.ecce/ecce-v6.4b/server/activemq/bin/run.jar
java 18010 31176 me 21r REG 8,6 2141382 16919830 /home/me/.ecce/ecce-v6.4b/server/activemq/lib/optional/derby-
java 18010 31176 me 22r REG 8,6 367444 16919829 /home/me/.ecce/ecce-v6.4b/server/activemq/lib/optional/log4j-1.2.14.jar
java 18010 31176 me 23r REG 8,6 467331 16919826 /home/me/.ecce/ecce-v6.4b/server/activemq/lib/optional/spring-beans-2.5.1.jar
java 18010 31176 me 24r REG 8,6 455159 16919827 /home/me/.ecce/ecce-v6.4b/server/activemq/lib/optional/spring-context-2.5.1.jar
java 18010 31176 me 25r REG 8,6 275073 16919825 /home/me/.ecce/ecce-v6.4b/server/activemq/lib/optional/spring-core-2.5.1.jar
java 18010 31176 me 26r REG 8,6 128051 16919828 /home/me/.ecce/ecce-v6.4b/server/activemq/lib/optional/xbean-spring-3.3.jar
java 18010 31176 me 27r REG 8,6 81403 16919823 /home/me/.ecce/ecce-v6.4b/server/activemq/lib/activemq-console-5.1.0.jar
java 18010 31176 me 28r REG 8,6 2315247 16919822 /home/me/.ecce/ecce-v6.4b/server/activemq/lib/activemq-core-5.1.0.jar
java 18010 31176 me 29r REG 8,6 52915 16919819 /home/me/.ecce/ecce-v6.4b/server/activemq/lib/commons-logging-1.1.jar
java 18010 31176 me 30r REG 8,6 16030 16919820 /home/me/.ecce/ecce-v6.4b/server/activemq/lib/geronimo-j2ee-management_1.0_spec-1.0.jar
java 18010 31176 me 31r REG 8,6 32359 16919821 /home/me/.ecce/ecce-v6.4b/server/activemq/lib/geronimo-jms_1.1_spec-1.1.1.jar

And it repeats. I don't know why -- I guess files are opened and never closed.
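With thousands of hits it helps to count how often each file shows up rather than reading the raw list. A small sketch that tallies the last column of lsof output; count_open_paths is a made-up name, and it assumes the path is the final whitespace-separated field, as in the lines above.

```python
from collections import Counter

def count_open_paths(lsof_lines):
    """Count how often each path appears in lsof output (path = last column)."""
    counts = Counter()
    for line in lsof_lines:
        fields = line.split()
        if fields and fields[-1].startswith("/"):
            counts[fields[-1]] += 1
    return counts

# Example use with saved output, e.g. from 'lsof | grep ecce > ecce_lsof.txt':
# with open("ecce_lsof.txt") as handle:
#     for path, hits in count_open_paths(handle).most_common(10):
#         print(hits, path)
```

A path that is listed hundreds of times is a good candidate for the file that is opened repeatedly and never closed.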

* MD bugs?
Hammering out MD related bugs. I don't understand the ECCE support for MD well enough to know what is a bug and what is a feature issue, but I do know that I've had issues getting MD sims running through simple point and click. So I'm not even sure that there are MD related bugs -- but you do get the impression that there are.

Luckily there are people putting up tutorials online: http://saccharides.blogspot.tw/2013/06/ecce-md-calculation.html

* Long lines in nwchem input
There's a bug when dealing with lines longer than 254 chars. Today (18th of June 2013) I encountered it for the first time in a situation unrelated to pasting BSE code -- this time the ecce_print line was simply too long. Read more here about the bug: http://verahill.blogspot.com.au/2013/02/347-minor-ecce-oddity-when-pasting.html

* Long lines in xterm/csh
xterm -f (invoked by alt+l) on a running calculation doesn't work if the resulting command is longer than 256 characters, e.g.
xterm -title "optimization_def2svp_boat_acetonitrile_to_start_again_g09_pcm_ts-1" -bg "#b7b8ba" -fg "#000000" -sb -geom 80x40 -e tail -f /home/calc/boron/testing/testing/catalyst/optimization_def2svp_boat_acetonitrile_to_start_again_g09_pcm_ts-1/g03.g03out

which has 257 characters doesn't work when used by ECCE (which uses (t)csh), but works fine in bash.

* Rounding in ECCE when using Coefficients and Exponents. 
NOTE: this is fixed in ECCE 7.0

When you select a basis set in ECCE you can either represent it as
basis
 Be library '6-31g'
end
or you can check the 'Use Exponents and Coefficients' in the ECCE Editor. If you check that box you'll get something like this instead:
basis "ao basis" cartesian print
Be S
 1264.58570000 0.00194500
 189.93681000 0.01483500
 43.15908900 0.07209100
 12.09866300 0.23715400
 3.80632300 0.46919900
 1.27289000 0.35652000
[..]
However, the 6-31G.BAS file gives the coefficients and exponents as
atom=Be
contraction shell=S num_primitives=6 num_coefficients=1
1264.5857 0.0019448
189.93681 0.0148351
43.159089 0.0720906
12.098663 0.2371542
3.8063232 0.4691987
1.2728903 0.3565202
Note the higher precision (there are more extreme examples, but this one will do for demonstration).
Finally, if you define the basis set using the * library '6-31g' way, you get
Be (Beryllium)
--------------
        Exponent        Coefficients
     --------------  ---------------------------------------------------------
1 S  1.26458570E+03  0.001945
1 S  1.89936810E+02  0.014835
1 S  4.31590890E+01  0.072091
1 S  1.20986630E+01  0.237154
1 S  3.80632320E+00  0.469199
1 S  1.27289030E+00  0.356520
2 S  3.19646310E+00  -0.112649
2 S  7.47813300E-01  -0.229506
2 S  2.19966300E-01  1.186917

In other words, you get somewhat different precision depending on how you use a basis set (3.8063232 vs 3.806323). Not a major issue, but it's still somewhat odd behaviour. The issue is much more significant when I use the def2 basis sets.
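The numbers in the 'Use Exponents and Coefficients' listing are consistent with the library values being rounded to six decimals and then printed with eight. That is only an inference from the listings above, not something taken from the ECCE source; ecce_style is a made-up name.

```python
def ecce_style(value):
    """Round to six decimals, then print with eight, mimicking the editor output."""
    return "%.8f" % float("%.6f" % value)

if __name__ == "__main__":
    # exponent/coefficient values as given in 6-31G.BAS for Be
    for value in (3.8063232, 0.4691987, 1.2728903, 0.0019448):
        print("%-10s -> %s" % (value, ecce_style(value)))
```

For example, 3.8063232 comes out as 3.80632300 and 0.0019448 as 0.00194500, matching the rounded listing rather than the library file.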

In terms of the energy at b3lyp/6-31g for an isolated Be atom I get this with Coeffs/Exps:
Total DFT energy = -14.668063062134
One electron energy = -19.130485660973
Coulomb energy = 7.245896594547
Exchange-Corr. energy = -2.783473995707
Nuclear repulsion energy = 0.000000000000
Numeric. integr. density = 3.999999797518
Total iterative time = 0.2s
vs this with Be library '6-31g'
Total DFT energy = -14.668063028950
One electron energy = -19.130484712269
Coulomb energy = 7.245894834880
Exchange-Corr. energy = -2.783473151561
Nuclear repulsion energy = 0.000000000000
Numeric. integr. density = 3.999999797514
Total iterative time = 0.2s

2. Features
* A script for importing/adding new basis sets, as they become supported by nwchem. I've started work on this but I'm stuck without understanding why. See part 1 here: http://verahill.blogspot.com.au/2013/06/455-adding-nwchem-basis-sets-to-ecce.html

https://sourceforge.net/projects/nwbas2ecce/ !

Note: the source distribution of ECCE 7.0 will contain a few helper scripts for importing basis sets.

* More options when setting up COSMO calculations. Currently you can set the dielectric constant, but nothing else. Being able to set at a minimum rsolv and iscren would be welcome.

NOTE: this is fixed in ECCE 7.0

* Better Gaussian input/output support. This will by necessity have to be produced by someone outside PNNL as they are banned from receiving a license for Gaussian (they are considered competitors owing to the development of nwchem).

* Adding support for more computational packages, especially GAMESS US and Dalton (since they are free/open source).

* Updated documentation/more examples in documentation. The easiest solution is having more people blogging about what they are doing -- and HOW they are doing it.

3. Other
* I'd like to see ECCE included in e.g. Debian (assuming that the license is acceptable and that EMSL/PNNL allow it+). However, not only am I not proficient in making canonical debian packages, the build script for ECCE is a bit more advanced than the usual configure/make/make install ones. So it'll take someone with a bit more expertise than myself.

+one issue is that funding is often tied to being able to demonstrate that a piece of software is being used. And the easiest way to show that it's used is to retain control over downloads/encouraging people to register.

14 June 2013

453. Very Briefly: internet crud: feedreader

I do a fair bit of vanity googling, it has to be admitted. The main reason for doing so is to find links back to this blog, so that I can post those links here (i.e. provide a way for readers to see how the different posts fared when tested by others).

In addition to the usual useless SEO-related crud, like xmarks.com, website-tools.net, yourwebsite.com (seriously, those sites provide no value whatsoever) I also stumble across feedreader hits a fair bit.

And they are much more nefarious since they actually penalize your blog in terms of page rank. (now, that may only hurt your ego -- but it might also have financial implications for the big blogs)

An example.

Say you want to compile kernel 3.9, and you for some reason want to get this blog's version of it:

So the first hit is by browse.feedreader.com. I mean, I wrote the post, but the link in the first hit goes to feedreader.com

The nefarious part of this is that because I have an identical post, my post gets relegated to the 200th page -- it's penalized for being simply a copy of the feedreader.com page.

As far as I can see the feedreader frame adds no value that would justify their behaviour either:

In my biased opinion the added frame detracts and is an eye-sore.

In addition, note the bottom panel ('Recent discoveries...')? This takes you to lookup.feedreader.com, which is their internal search engine. And I wouldn't be surprised if that's how they try to monetize their 'service' -- by selling rankings.

Anyway, avoid clicking on feedreader hits -- as a company they don't add anything, and if anything they hurt blogs.

452. Briefly: Wine and MIME nuisance: MS Modelling is associated with everything...

I've battled with this on and off for a long time.

And it's not a problem unique to me: http://wiki.winehq.org/FAQ#head-c847a3ded88bac0e61aae0037fa7dbd4c7ae042a. The problem with that particular solution is that it's no good once the damage has happened. It's also a blanket method.

So, in a weak moment some winters ago I installed MS Modelling in Wine. I actually never use it, because there are better tools, but that's beside the point.

Unfortunately, MS Modelling has associated itself with most common (and uncommon) file types, such as .dat, .txt and .pgp:

I decided it was time to explore it in greater detail and remove all MS Modelling associations

The first step is to see how deep the rabbit hole goes:

grep "MS Modeling" $HOME/.local/share/applications/*.desktop
/home/verahill/.local/share/applications/wine-extension-3cam.desktop:Name=MS Modeling
/home/verahill/.local/share/applications/wine-extension-accin.desktop:Name=MS Modeling
/home/verahill/.local/share/applications/wine-extension-acx.desktop:Name=MS Modeling
/home/verahill/.local/share/applications/wine-extension-arc.desktop:Name=MS Modeling
[..]
/home/verahill/.local/share/applications/wine-extension-xsd.desktop:Name=MS Modeling
/home/verahill/.local/share/applications/wine-extension-xtd.desktop:Name=MS Modeling
/home/verahill/.local/share/applications/wine-extension-xyd.desktop:Name=MS Modeling
/home/verahill/.local/share/applications/wine-extension-xyz.desktop:Name=MS Modeling

85 file types. Not acceptable.

grep "MS Modeling" $HOME/.local/share/applications/*.desktop|sed 's/:/\t/g'|gawk '{print $1}'| xargs -I {} rm {}
grep "Materials Studio" $HOME/.local/share/mime/application/*.xml|sed 's/:/\t/g'|gawk '{print $1}'| xargs -I {} rm {}
update-mime-database ~/.local/share/mime
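
Since the pipeline above deletes files straight away, it can be reassuring to preview what it would hit first. Here's a minimal python 3 sketch of such a dry run. The function name find_desktop_entries is my own invention, and note that it is slightly stricter than the grep: it only looks at lines starting with Name= rather than matching the string anywhere in the file.

```python
from pathlib import Path


def find_desktop_entries(app_dir, app_name="MS Modeling"):
    """Return the .desktop files in app_dir whose Name= line mentions app_name.

    Hypothetical helper for a dry run: it only lists files, it never deletes.
    """
    hits = []
    for path in sorted(Path(app_dir).glob("*.desktop")):
        try:
            text = path.read_text(errors="replace")
        except OSError:
            continue  # unreadable file: skip it rather than abort the preview
        for line in text.splitlines():
            if line.startswith("Name=") and app_name in line:
                hits.append(path)
                break  # one match per file is enough
    return hits
```

To use it, call find_desktop_entries(Path.home() / ".local/share/applications") and print the result. If the list looks sane, go ahead with the rm pipeline.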

I also remove mimeinfo.cache, although I'm not sure whether that's a good idea. I haven't suffered for it yet though:
rm ~/.local/share/applications/mimeinfo.cache
update-mime-database ~/.local/share/mime

That solves the problem - temporarily.

Here's the issue: the files in $HOME/.local/share/mime/application/ get re-created when you start a wine program though.

And I think the problem is this: http://wiki.winehq.org/FileTypesIntegration

In other words, whatever associations are listed in the Wine Windows registry pollute Gnome. And that's not acceptable. Anyway, let's put a stop to it using the method recommended on the Wine FAQ:

echo '[HKEY_CURRENT_USER\Software\Wine\DllOverrides]
"winemenubuilder.exe"=""'> ~/.wine/disable-winemenubuilder.reg
regedit ~/.wine/disable-winemenubuilder.reg 

It seems to have worked. Opening a program in wine no longer recreates any of the files in ~/.local/share/mime/application.
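
If you'd rather check this than eyeball it, a before/after comparison of the directory is enough. The following is a rough python 3 sketch. The helper names snapshot and recreated are made up for this example, and it only compares file names, so it won't notice files that are rewritten in place.

```python
from pathlib import Path


def snapshot(directory):
    """Return the set of file names currently in directory (empty set if missing)."""
    d = Path(directory)
    if not d.is_dir():
        return set()
    return {p.name for p in d.iterdir() if p.is_file()}


def recreated(before, after):
    """Names that exist in the second snapshot but not in the first."""
    return sorted(after - before)
```

Take one snapshot of ~/.local/share/mime/application (with ~ expanded, e.g. via Path.home()), start and close a Wine program, take a second snapshot and pass both to recreated(). An empty list suggests the override is doing its job.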

451. Seahorse plugins on gnome 3.4 -- PGP encrypting/decrypting in nautilus

Once upon a time it was possible to de/encrypt in gedit, and life was good. Then GNOME 3 came along, and the seahorse plugin for gedit disappeared. (presumably you might be able to write a script to use with the External Tools gedit plugin).

It re-emerged as a plugin for Nautilus instead.

I'm showing version 3.4.0 since I'm on GNOME 3.4, and who knows what API has broken in between this and 3.8...anyway, look at https://git.gnome.org/browse/seahorse-nautilus/ for different versions.

There are probably more build dependencies than the ones I'm listing.

sudo apt-get install libcryptui-dev libnautilus-extension-dev libgpgme11-dev autoconf automake checkinstall
wget https://git.gnome.org/browse/seahorse-nautilus/snapshot/seahorse-nautilus-3.4.0.tar.gz
tar xvf seahorse-nautilus-3.4.0.tar.gz 
cd seahorse-nautilus-3.4.0/
GnuPG Version: gpg (GnuPG) 1.4.12
GPGME Version: 1.2.0
Notification Support: yes

Now type `make' to compile seahorse-nautilus
sudo checkinstall --fstrans=no
0 - Maintainer: [ root@beryllium ]
1 - Summary: [ seahorse-nautilus 3.4.0 ]
2 - Name: [ seahorse-nautilus ]
3 - Version: [ 3.4.0 ]
4 - Release: [ 1 ]
5 - License: [ GPL ]
6 - Group: [ checkinstall ]
7 - Architecture: [ amd64 ]
8 - Source location: [ seahorse-nautilus-3.4.0 ]
9 - Alternate source location: [ ]
10 - Requires: [ ]
11 - Provides: [ seahorse-nautilus ]
12 - Conflicts: [ ]
13 - Replaces: [ ]

Open nautilus, select a text file and right click:


Although in my case I had kde-full installed, which pulled in kgpg:

If you're having other issues with decrypting, check that the mime associations are correct:

xdg-mime query filetype plaintext.file.pgp 

12 June 2013

450. Tor and Chrome on Debian

* For the Tor bundle see http://verahill.blogspot.com.au/2013/05/408-briefly-tor-on-debian-quick-option.html
* For securing your dropbox, see http://verahill.blogspot.com.au/2013/04/398-securing-your-dropbox.html
* For encrypting your filesystem with encfs, see http://verahill.blogspot.com.au/2013/05/408-briefly-tor-on-debian-quick-option.html
* For one-time passwords (OTPW), see http://verahill.blogspot.com.au/2013/04/385-otpw-connecting-from-insecure.html
* For encryption in general using PGP/GPG, OTR, SRTP for chat, email, voice and video, see http://verahill.blogspot.com.au/2013/04/381-encrypting-chat-voice-video.html
* For truecrypt with dropbox, see http://verahill.blogspot.com.au/2012/04/using-truecrypt-with-dropbox.html

Post begins:
I think it's fair to say that online privacy is in the spotlight again, temporarily,  in particular if you are not living in the US. After all, the rest of the world is offered no protection from US agencies.

There are two levels of snooping that (can) go on:
Case 1:  outright intercept of communications
In this case your emails are read, your browsing data is intercepted and your phone conversations tapped. This is the most intrusive form, and I think even in the US a warrant is required for the intercept of this type of data (whether that's too easy or too difficult to get is another question entirely).

Case 2: mining of 'meta-data'
In this case data such as recipient/sender of emails, URLs that you've been visiting, and whom you have been calling/called by are collected. In addition, e.g. cell phone tower records can be collected to track your whereabouts 24/7.

While the contents of your conversations aren't known, your entire social and professional life can be charted.
As far as I understand this is what NSA has been engaging in. Likewise, if someone knows exactly where you are at any given point in time, a pretty detailed picture of your life can be painted.

Begin Rant
I don't have anything to hide, but I am not too keen on the government having better records of my life than I do myself. And I should be the one deciding what to share as long as the presumption of innocence holds.

Also, we're making the presumption that the government is benign, and as has been shown repeatedly, it isn't always. That goes for the US government, the UK government and just about any bloody imaginable government, and for a simple reason: the government is made up of people. In particular people who are keen on 'leading' i.e. controlling others. Even a benign despot is a despot.

There's no use being naive -- in either direction. There are legitimate reasons for clandestine organisations wanting to mine data, and there are legitimate reasons for why we should not give them a carte blanche.

Whether you use PGP/GPG or not won't affect the mining of meta-data. Nor will OTR, although it might in theory give you a somewhat better level of deniability (but not really).

Using PGP/GPG, OTR and encryption of data in general will only protect the content of your conversations, not the fact that they occurred. Not that it's easy getting people to start using encryption of their email, especially not since hotmail and gmail provided the final push into getting people to do all their email processing in the browser rather than using a more capable email client. Obviously Google would not be pleased if all communication was PGP encrypted, since this would create issues with targeted ads.

Finally, what really irks me is the fact that because John Doe won't use encryption -- or learn how to do it -- I also cannot use it. Instead we have to play according to the rules of the least technologically informed.
End Rant

Anyway. There are a few things you can do -- at least to make you feel better. Whether they have any real impact on your privacy depends on what other sources of information leakage there are in your life.

The simplest thing you can do is to do all your browsing anonymously, including setting up and checking your email. And the easiest way to do that is by using Tor.

It's easy enough to use the Tor Bundle, e.g. http://verahill.blogspot.com.au/2013/05/408-briefly-tor-on-debian-quick-option.html

However, I for some forsaken reason like using Chrome.

To set up Proxy SwitchySharp I'm following this post:

NOTE: there are many layers to managing your privacy, and you're only as anonymous as your worst habits allow you to be. I'm a pessimist -- I think it is virtually impossible to protect yourself against a determined adversary. However, trying won't hurt.

Step 0. Block cookies by default and install an ad blocker

Pretending to be anonymous won't help if you give the game away by exposing cookies that you acquired while surfing without Tor.

You'll be surprised how many websites require you to accept cookies -- however, it's up to you whether you want to put up with that. I only allow cookies with services that I've signed up to and that I trust. I refuse to allow in particular commercial sites to require cookies for me to simply visit.

In Chrome, go to Settings, Content Settings, and check:
* Block sites from setting any data
* Block third-party cookies and site data
* Clear cookies and other site and plug-in data when I close my browser

* Allow local data to be set

You may want to restrict e.g. image loading, javascript, pop-ups, plugins etc. as well. It's down to you to weigh inconvenience vs privacy.

Set Cookie and Site Data exceptions manually, and make sure to distinguish between Session Only and Allow.

Also, install e.g. simple adblock:

Step 1. Install the HTTPS everywhere extension

Step 2. Install Proxy SwitchySharp

Set up a profile called Tor to use SOCKS5 with
Go to the General Tab and enable Quick Switch.
Make sure to drag both Tor and Direct Connection into the Quick Switch field.

Step 3. Install Tor and Vidalia
Add the following to your /etc/apt/sources.list
deb http://deb.torproject.org/torproject.org wheezy main

Then do
sudo apt-get update
sudo apt-get install deb.torproject.org-keyring
sudo apt-get update
sudo apt-get install vidalia

Tor should run in the background whether you start Vidalia or not.
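
A quick way to confirm that Tor really is running in the background is to see whether anything is listening on its SOCKS port. Below is a minimal python 3 sketch. It assumes the stock tor daemon default of 127.0.0.1:9050, which is my assumption and not something set anywhere in this post, so adjust it if your torrc says otherwise. The function name socks_port_open is made up for the example, and an open port only tells you that something is listening, not that it is Tor.

```python
import socket


def socks_port_open(host="127.0.0.1", port=9050, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds.

    9050 is assumed here as the default SocksPort of a stock tor daemon.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, unreachable etc.
        return False
```

If socks_port_open() returns False, it's worth checking that the tor service started before blaming the browser setup.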

Step 4. Prevent DNS leaks:
[for fun, do
sudo apt-get install tcpdump
sudo tcpdump -pni eth0 'port domain'
before turning off prefetching. ]

To make sure that your DNS requests aren't being read (i.e. providing meta-data to your ISP), you will need to turn off DNS pre-fetching in Chrome.

Google is sneaky about it though -- to turn off prefetching go to Settings/Under the Bonnet and uncheck "Predict network actions to improve page load performance".

[If you set up tcpdump before you'll see how suddenly the IPs and URLs stop streaming by.]

Step 5. Start Tor/Vidalia
You don't seem to be able to launch Vidalia from the terminal, so launch Vidalia from within e.g. gnome.
In fact, you probably don't have to launch Vidalia at all, as Tor should already be running in the background.
Then open Chrome and navigate to e.g. whatsmyip.org or ipchicken.com:

You can turn on and off the proxy by clicking on the icon in the top right corner.

Step 6. Enable private browsing:
You don't want to risk one website being able to see what another website left behind. It shouldn't happen, but it has happened in the past.

Anyway, it's easy: open an Incognito window (ctrl + shift + N).

As far as I can tell this should give you some privacy. However, the question is how effective this is in the long run, since it's difficult to maintain enough discipline to prevent information leakage from occurring.