11 March 2014

563. High disk i/o caused by find/sort <- updatedb

High disk I/O, leading to system slowdown, has been bothering me a lot recently. Most of the time I've simply blamed it on ECCE, and while the situation gets better when ECCE isn't running, it's still occasionally very bad.

Diagnosis

iotop shows
 Total DISK READ:       3.48 M/s | Total DISK WRITE:    1193.67 K/s
  TID  PRIO  USER     DISK READ  DISK WRITE  SWAPIN     IO>    COMMAND                                                                                                                                                                     
25565 be/4 root        3.46 M/s    0.00 B/s  0.00 % 76.92 % find / ( -fstype nfs -o -fstype NFS -o -fstype proc -o -fstype afs -o -fstype smbfs -o -~$\)\|\(^/var/tmp$\)\|\(^/afs$\)\|\(^/amd$\)\|\(^/sfs$\)\|\(^/proc$\) ) -prune -o -print0
ps aux|grep 2556[0-9]
root     25562  0.0  0.0  18620   336 ?        S    12:33   0:00 /bin/sh /usr/bin/updatedb

root     25563 26.2  0.1  25996 12400 ?        S    12:33   1:51 /usr/bin/sort -z -f
root     25564  0.0  0.0   4216   116 ?        S    12:33   0:00 /usr/lib/locate/frcode -0
root     25565 24.2  0.0  19024   956 ?        R    12:33   1:09 /usr/bin/find / ( -fstype nfs -o -fstype NFS -o -fstype proc -o -fstype afs -o -fstype smbfs -o -fstype autofs -o -fstype iso9660 -o -fstype ncpfs -o -fstype coda -o -fstype devpts -o -fstype ftpfs -o -fstype devfs -o -fstype mfs -o -fstype sysfs -o -fstype shfs -o -type d -regex \(^/tmp$\)\|\(^/usr/tmp$\)\|\(^/var/tmp$\)\|\(^/afs$\)\|\(^/amd$\)\|\(^/sfs$\)\|\(^/proc$\) ) -prune -o -print0
Heading deeper down the rabbit hole:
me@beryllium:~$ ps -p 25565 -o ppid=
25562
me@beryllium:~$ ps -p 25562 -o ppid=
25554
me@beryllium:~$ ps -p 25554 -o ppid=
25553
me@beryllium:~$ ps -p 25553 -o ppid=
25552
me@beryllium:~$ ps -p 25552 -o ppid=
 4315
me@beryllium:~$ ps -p 4315 -o ppid=
    1
me@beryllium:~$ ps aux|grep 4315
root      4315  0.0  0.0  26124   428 ?        Ss   Mar07   0:05 /usr/sbin/cron
me@beryllium:~$ ps aux|grep 25552
root     25552  0.0  0.0  64068   844 ?        S    12:33   0:00 /USR/SBIN/CRON
me@beryllium:~$ ps aux|grep 25554
root     25554  0.0  0.0  18620   588 ?        S    12:33   0:00 /bin/sh /usr/bin/updatedb

So updatedb (25562) is starting find (25565), which is bogging down the computer, and updatedb is in turn started as a cron job. updatedb is run in order to update the locate database, and locate is a powerful file search function: whereas find searches on the fly, locate consults a database.
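The manual walk up the process tree can be wrapped in a small loop. This is only a sketch: the function name ancestry is made up here, and it relies on nothing beyond the same ps -o ppid= call used above plus ps -o args= for the command line.

```shell
# Print "PID COMMAND" for a process and each of its ancestors, child first.
ancestry() {
    pid=$1
    while [ -n "$pid" ] && [ "$pid" -gt 0 ]; do
        # args= prints the full command line without a header
        printf '%s %s\n' "$pid" "$(ps -o args= -p "$pid")"
        # ppid= prints the parent PID without a header; tr strips the padding
        pid=$(ps -o ppid= -p "$pid" | tr -d ' ')
    done
}

# Usage, with the PID reported by iotop:
#   ancestry 25565
```

The loop stops when ps no longer returns a parent (the process has exited) or when the parent PID reaches 0.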

At this point it's probably a good idea to mention that I have a 4 TB system, plus four mounted NFS folders with many GB of content.

Either way, the only thing that remains is to identify which cron job is launching updatedb:

me@beryllium:~$ egrep "updatedb" /etc/cron.*/*
/etc/cron.daily/locate:# Please consult updatedb(1) and /usr/share/doc/locate/README.Debian
/etc/cron.daily/locate:[ -e /usr/bin/updatedb.findutils ] || exit 0
/etc/cron.daily/locate:# filesystems which are pruned from updatedb database
/etc/cron.daily/locate:# paths which are pruned from updatedb database
/etc/cron.daily/locate:if [ -r /etc/updatedb.findutils.cron.local ] ; then
/etc/cron.daily/locate: . /etc/updatedb.findutils.cron.local
/etc/cron.daily/locate:  cd / && nice -n ${NICE:-10} updatedb.findutils 2>/dev/null


Solution:
locate is a powerful command which I use frequently, but I'd be happy to change the frequency of updatedb to once per week instead of once per day, especially if running it takes hours.

sudo mv /etc/cron.daily/locate /etc/cron.weekly/locate

We can also work on excluding paths.
me@beryllium:~$ cat /etc/cron.weekly/locate |grep PRUNE
PRUNEFS="NFS nfs nfs4 afs binfmt_misc proc smbfs autofs iso9660 ncpfs coda devpts ftpfs devfs mfs shfs sysfs cifs lustre_lite tmpfs usbfs udf ocfs2"
PRUNEPATHS="/tmp /usr/tmp /var/tmp /afs /amd /alex /var/spool /sfs /media /var/lib/schroot/mount"
export FINDOPTIONS PRUNEFS PRUNEPATHS NETPATHS LOCALUSER

So my NFS folders are already excluded through PRUNEFS, but it might be worth throwing more paths into PRUNEPATHS. In my case I'm quite happy with a full run every week.
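As the egrep output above shows, the cron script sources /etc/updatedb.findutils.cron.local if it exists, so extra exclusions could go there instead of editing the script itself. A minimal sketch, assuming the local file is read after the defaults are set (as the order of the grep matches suggests); the two paths are purely hypothetical examples.

```shell
# /etc/updatedb.findutils.cron.local
# Append to the defaults rather than replacing them.
# The paths below are placeholders; use whichever large directories you never search.
PRUNEPATHS="$PRUNEPATHS /home/me/scratch /home/me/.cache"
```

Keeping the change in the local file means it survives a package upgrade that replaces the cron script.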

Update: I also discovered that I'd put an updatedb job manually in /etc/crontab which was run once every three hours. The cron.daily script was run at 6 am, and so was unlikely to cause slowdown during times when I'm actually at work. Instead it was the script I'd set up myself that was the culprit.
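The egrep earlier only looked at /etc/cron.*/*, which is why the /etc/crontab entry slipped through. A broader search could look like the sketch below. The function name cron_refs and the default list of locations are assumptions for a Debian-style layout; reading per-user crontabs under /var/spool/cron/crontabs normally requires root.

```shell
# List every cron file line that mentions PATTERN.
# Usage: cron_refs PATTERN [PATH...]
cron_refs() {
    pattern=$1
    shift
    # Default locations (an assumption for a Debian-style system)
    [ "$#" -gt 0 ] || set -- /etc/crontab /etc/cron.d /etc/cron.hourly \
        /etc/cron.daily /etc/cron.weekly /etc/cron.monthly \
        /var/spool/cron/crontabs
    # -r recurse, -s skip missing/unreadable paths quietly,
    # -H always show the file name, -n show the line number
    grep -rsHn -- "$pattern" "$@"
}

# Usage:
#   cron_refs updatedb
```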

06 March 2014

562. Pulling in glibc >=2.14 from testing to stable: apt-pinning

Mixing releases is dangerous and can lead to broken systems.

Having said that, an increasing number of programs seem to rely on glibc >=2.14, and wheezy (current stable) only has 2.13.


Apt-pinning:

Edit the following files:
/etc/apt/sources.list
deb http://ftp.iinet.net.au/debian/debian wheezy main contrib non-free
deb http://ftp.iinet.net.au/debian/debian wheezy-backports main
deb http://ftp.iinet.net.au/debian/debian jessie main contrib non-free 

/etc/apt/preferences
Package: *
Pin: release a=testing
Pin-Priority: 10

Package: *
Pin: release a=stable
Pin-Priority: 900

Then run
sudo apt-get update
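Once the lists are refreshed, it's worth checking that the pin behaves as intended before installing anything: with testing at priority 10, the candidate version of a package should still come from stable unless -t testing is given. A small sketch; the helper name candidate_version is made up here, while apt-cache policy and apt-get -s (simulate) are standard apt commands.

```shell
# Read `apt-cache policy PACKAGE` output on stdin and print the candidate version.
candidate_version() {
    awk '$1 == "Candidate:" { print $2 }'
}

# Usage (on the pinned system):
#   apt-cache policy libc6 | candidate_version
#   apt-get -s install -t testing libc6-dev   # dry run: shows what would be pulled in
```

The dry run is the more useful of the two, since it lists every package that would be upgraded along with libc6-dev.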

Installation

Installing glibc >=2.14 from testing
sudo apt-get install -t testing libc6-dev


Every package you install takes you closer to trouble...

04 March 2014

561. b3pw91 in nwchem and g09

UPDATE: there was an error in the earlier version where I gave the wrong energy for the b3pw91 functional in nwchem. In the old version the energy I provided was very close to that of acm in nwchem rather than b3pw91 in g09.

Note that for a large molecule with a medium sized basis set (101 atoms, ca 1100 functions, ca 2200 primitives) the energy difference between b3pw91 in g09 and b3pw91 in nwchem as defined below is 0.0124 Hartree, which is pretty big (7.8 kcal/mol), although it's tiny relative to the total energy (nwchem: -6187.741840960054 Hartree. g09: -6187.75427966 Hartree).
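The conversion behind those numbers is easy to check. A quick sketch, assuming 1 Hartree = 627.5095 kcal/mol; the function name hartree_diff is just for illustration.

```shell
# Absolute difference between two energies in Hartree, also shown in kcal/mol.
# Assumes 1 Hartree = 627.5095 kcal/mol.
hartree_diff() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        d = a - b
        if (d < 0) d = -d
        printf "%.4f Hartree = %.1f kcal/mol\n", d, d * 627.5095
    }'
}

hartree_diff -6187.741840960054 -6187.75427966
# prints: 0.0124 Hartree = 7.8 kcal/mol
```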

The difference is a lot smaller for the small molecule in the example below.

Original post:
According to http://www.nwchem-sw.org/index.php/Special:AWCforum/st/id721/Are_these_definitions_correct_fo....html b3pw91 (as defined in Gaussian 09) and acm (as defined in nwchem) are identical.

Looking at the energies I've been getting, that's not true when it comes to G09 and NWCHEM.


That acm and b3pw91 are the same should be reasonable -- b3 indicates that it's Becke's 3-parameter hybrid exchange correlation functional model, which is also known as the Adiabatic Connection Method (ACM).

For historical reasons, g98 implemented the ACM as B3LYP, by using LYP instead of PW91, and using VWN_1_RPA and a few other tricks -- see section 2 in http://verahill.blogspot.com.au/2013/06/446-b3lyp-and-wah-confusion.html

Then it would stand to reason that B3PW91 would be the 'canonical' version of Becke's 3-parameter functional.

Looking at http://www.nwchem-sw.org/index.php/Release62:Density_Functional_Theory_for_Molecules acm is defined as

xc HFexch 0.2 slater 0.8 becke88 nonlocal 0.72 vwn_5 1 Perdew91 0.81

(there are several versions of VWN -- I know it's vwn_5 from the output)
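For context, an xc line like the one above sits inside the dft block of an nwchem input file. The sketch below writes such an input for a single-point energy on water. Several things in it are assumptions: the geometry is an illustrative placeholder, not the one used for the numbers in this post, the basis is taken from nwchem's library rather than defined manually, and the names write_water_input and water_acm are made up. It will therefore not reproduce the energies quoted here exactly.

```shell
# Write a minimal nwchem input for a single-point DFT energy on water.
# Geometry and library basis are placeholders (assumptions), see the text above.
write_water_input() {
    cat > "$1" <<'EOF'
start water_acm
geometry units angstroms
  O   0.0000   0.0000   0.1173
  H   0.0000   0.7572  -0.4692
  H   0.0000  -0.7572  -0.4692
end
basis
  * library 6-31+G*
end
dft
  xc HFexch 0.2 slater 0.8 becke88 nonlocal 0.72 vwn_5 1 Perdew91 0.81
end
task dft energy
EOF
}

# Usage:
#   write_water_input water_acm.nw && nwchem water_acm.nw
```

Swapping the xc line for another definition is then a one-line change, which makes it easy to compare functionals on the same geometry and basis.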


Either way, using acm in a single energy calculation (no optimisation) in nwchem on a water molecule with 6-31+G* (acm/6-31+G*) gives
-76.358375905073 Hartree

G09 using B3PW91/6-31+G* (manually defined basis set so we're using the same form in both nwchem and g09) gives
 -76.3557851653 Hartree

nwchem using
xc HFexch 0.2 slater 0.8 becke88 nonlocal 0.72 vwn_5 1 Perdew91 0.81
gives
-76.358375905072 Hartree

and nwchem using
XC HFexch 0.20 slater 0.80 becke88 nonlocal 0.72 perdew91 0.81 pw91lda 1.00
obtained from http://myweb.liu.edu/~nmatsuna/gamess/refs/howto.dft.html, gives
-76.355784373093 Hartree

This last definition is thus equivalent to b3pw91 in g09.
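To put numbers on "equivalent", the water energies above can be compared against the g09 value in microhartree. A small sketch; the function name delta_microhartree is made up for the purpose.

```shell
# Difference (energy - reference) in microhartree, to one decimal place.
delta_microhartree() {
    awk -v e="$1" -v ref="$2" 'BEGIN { printf "%.1f\n", (e - ref) * 1e6 }'
}

g09=-76.3557851653
delta_microhartree -76.358375905073 "$g09"   # nwchem acm
delta_microhartree -76.355784373093 "$g09"   # nwchem with pw91lda
```

The acm result is off by about -2590.7 microhartree (roughly 2.6 millihartree), while the pw91lda definition agrees with g09 to within 0.8 microhartree.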

The gaussian manual is less than helpful. In fact it is quite misleading:
"These functionals have the form devised by Becke in 1993 [Becke93a]:
A*E_X^Slater + (1-A)*E_X^HF + B*ΔE_X^Becke + E_C^VWN + C*ΔE_C^non-local
[..] B3LYP uses the non-local correlation provided by the LYP expression, and VWN functional III for local correlation (not functional V). [..] B3P86 specifies the same functional with the non-local correlation provided by Perdew 86, and B3PW91 specifies this functional with the non-local correlation provided by Perdew/Wang 91."


Addendum:
While I think B3PW91 should be the same as ACM in nwchem (note that nwchem does not have b3pw91 as a keyword), I decided to have a look at how different packages define b3pw91.

nwchem -- doesn't exist. Manual.

g09 (this post) -- xc HFexch 0.20 slater 0.80 becke88 nonlocal 0.72 perdew91 0.81 pw91lda 1.00

gamess US (here) -- xc HFexch 0.20 slater 0.80 becke88 nonlocal 0.72 perdew91 0.81 pw91lda 1.00

PQS (page 52, manual) Paraphrased:
"B3PW91 -- hybrid 3-parameter HF-DFT functional comprising combination of Slater local exchange, Becke nonlocal exchange, VWN 5 local correlation and PW91 nonlocal correlation together with a portion (20%) of the exact Hartree-Fock exchange (original 3-parameter hybrid recommended by Becke)". That to me sounds like ACM.

Turbomole -- not available. Manual.

Orca -- "B3PW The three-parameter hybrid version of PW91". Not informative.

molpro -- doesn't exist. manual

Dalton -- (page 285, manual).
"B3PW91 3-parameter Becke-PW91 functional, with PW91 correlation functional. Note that PW91c includes PW92c local correlation, thus only excess PW92c local correlation is required (coefficient of 0.19).
Combine HF=0.2 Slater=0.8 Becke=0.72 PW91c=0.81 PW92c=0.19"
So the total local correlation is 1*PW92c: 0.81 from within PW91c plus the extra 0.19 PW92c. This, I presume, is quite different from VWN.

Q-Chem -- "B3PW91 (B3 Exchange + PW91 correlation)". Not explicit enough for me.