Diagnosis
iotop shows:
Total DISK READ: 3.48 M/s | Total DISK WRITE: 1193.67 K/s
TID PRIO USER DISK READ DISK WRITE SWAPIN IO> COMMAND
25565 be/4 root 3.46 M/s 0.00 B/s 0.00 % 76.92 % find / ( -fstype nfs -o -fstype NFS -o -fstype proc -o -fstype afs -o -fstype smbfs -o -~$\)\|\(^/var/tmp$\)\|\(^/afs$\)\|\(^/amd$\)\|\(^/sfs$\)\|\(^/proc$\) ) -prune -o -print0
Heading deeper down the rabbit hole:
ps aux|grep 2556[0-9]
root 25562 0.0 0.0 18620 336 ? S 12:33 0:00 /bin/sh /usr/bin/updatedb
root 25563 26.2 0.1 25996 12400 ? S 12:33 1:51 /usr/bin/sort -z -f
root 25564 0.0 0.0 4216 116 ? S 12:33 0:00 /usr/lib/locate/frcode -0
root 25565 24.2 0.0 19024 956 ? R 12:33 1:09 /usr/bin/find / ( -fstype nfs -o -fstype NFS -o -fstype proc -o -fstype afs -o -fstype smbfs -o -fstype autofs -o -fstype iso9660 -o -fstype ncpfs -o -fstype coda -o -fstype devpts -o -fstype ftpfs -o -fstype devfs -o -fstype mfs -o -fstype sysfs -o -fstype shfs -o -type d -regex \(^/tmp$\)\|\(^/usr/tmp$\)\|\(^/var/tmp$\)\|\(^/afs$\)\|\(^/amd$\)\|\(^/sfs$\)\|\(^/proc$\) ) -prune -o -print0
me@beryllium:~$ ps -p 25565 -o ppid=
25562
me@beryllium:~$ ps -p 25562 -o ppid=
25554
me@beryllium:~$ ps -p 25554 -o ppid=
25553
me@beryllium:~$ ps -p 25553 -o ppid=
25552
me@beryllium:~$ ps -p 25552 -o ppid=
4315
me@beryllium:~$ ps -p 4315 -o ppid=
1
me@beryllium:~$ ps aux|grep 4315
root 4315 0.0 0.0 26124 428 ? Ss Mar07 0:05 /usr/sbin/cron
me@beryllium:~$ ps aux|grep 25552
root 25552 0.0 0.0 64068 844 ? S 12:33 0:00 /USR/SBIN/CRON
me@beryllium:~$ ps aux|grep 25554
root 25554 0.0 0.0 18620 588 ? S 12:33 0:00 /bin/sh /usr/bin/updatedb
So updatedb is starting 25565 (the find process), which is bogging down the computer, and updatedb itself is started as a cron job. updatedb is run in order to update the locate database, and locate is a powerful file search tool. Whereas find searches the filesystem on the fly, locate consults a prebuilt database.
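The manual walk from 25565 up to cron can also be done in one go. The following is a minimal sketch, not what I ran at the time; the function name ancestry is made up for illustration, and it only assumes a ps that accepts -p and -o (as procps on Debian does).

```shell
# Print the chain of ancestors for a PID, one "PID COMMAND" line per level,
# stopping at PID 1 or at a process that no longer exists.
ancestry() {
    pid=$1
    while [ -n "$pid" ] && [ "$pid" -ge 1 ]; do
        # "args=" prints the full command line without a header;
        # ps exits non-zero if the PID is gone, which ends the loop
        cmd=$(ps -p "$pid" -o args=) || break
        printf '%s %s\n' "$pid" "$cmd"
        [ "$pid" -eq 1 ] && break
        # "ppid=" prints the parent PID without a header; strip the padding
        pid=$(ps -p "$pid" -o ppid= | tr -d ' ')
    done
}

# Example: show the ancestry of the current shell
ancestry "$$"
```

Calling ancestry 25565 while the find process is still alive would list find, the updatedb shell scripts and the cron processes in a single pass.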
At this point it's probably a good idea to mention that I have a 4 TB system, plus four mounted NFS folders with many GB of content.
Either way, the only thing that remains is to identify which cron job is launching updatedb:
me@beryllium:~$ egrep "updatedb" /etc/cron.*/*
/etc/cron.daily/locate:# Please consult updatedb(1) and /usr/share/doc/locate/README.Debian
/etc/cron.daily/locate:[ -e /usr/bin/updatedb.findutils ] || exit 0
/etc/cron.daily/locate:# filesystems which are pruned from updatedb database
/etc/cron.daily/locate:# paths which are pruned from updatedb database
/etc/cron.daily/locate:if [ -r /etc/updatedb.findutils.cron.local ] ; then
/etc/cron.daily/locate: . /etc/updatedb.findutils.cron.local
/etc/cron.daily/locate: cd / && nice -n ${NICE:-10} updatedb.findutils 2>/dev/null
Solution:
locate is a powerful command which I use frequently, but I'd be happy to change the frequency of updatedb to once per week instead of once per day, especially if running it takes hours.
sudo mv /etc/cron.daily/locate /etc/cron.weekly/locate
We can also work on excluding paths.
me@beryllium:~$ cat /etc/cron.weekly/locate |grep PRUNE
PRUNEFS="NFS nfs nfs4 afs binfmt_misc proc smbfs autofs iso9660 ncpfs coda devpts ftpfs devfs mfs shfs sysfs cifs lustre_lite tmpfs usbfs udf ocfs2"
PRUNEPATHS="/tmp /usr/tmp /var/tmp /afs /amd /alex /var/spool /sfs /media /var/lib/schroot/mount"
export FINDOPTIONS PRUNEFS PRUNEPATHS NETPATHS LOCALUSER
So my NFS folders are already excluded through PRUNEFS, but it might be worth throwing more paths into PRUNEPATHS. In my case I'm quite happy with a full run every week.
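If you do want to prune more paths, one option is the local override file that the cron script sources when it is readable (/etc/updatedb.findutils.cron.local in the grep output above). The sketch below assumes the script assigns its default PRUNEPATHS before sourcing that file, which the order of the grep output suggests but which I have not verified line by line. The two extra paths are hypothetical.

```shell
# /etc/updatedb.findutils.cron.local
# Sourced by the locate cron script if the file is readable.
# Append to the defaults instead of replacing them; ${PRUNEPATHS:-} keeps
# this harmless if the variable happens to be unset when the file is read.
# /srv/scratch and /home/me/build are made-up example paths.
PRUNEPATHS="${PRUNEPATHS:-} /srv/scratch /home/me/build"
```

Keeping the change in the local file rather than editing the cron script itself means it should survive the script being moved or replaced by a package upgrade.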
Update: I also discovered that I had manually put an updatedb job in /etc/crontab, which ran once every three hours. The cron.daily script ran at 6 am, so it was unlikely to cause slowdowns during times when I'm actually at work. Instead, it was the job I'd set up myself that was the culprit.
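The egrep I used earlier only looked in /etc/cron.*/*, which is why it missed the entry in /etc/crontab. A wider search could look something like the sketch below; the function name find_cron_entries is made up for illustration, and the list of locations is an assumption about a typical Debian layout rather than an exhaustive one.

```shell
# Print every line matching a pattern in the given cron files,
# prefixed with file name and line number.
# Files that do not exist or cannot be read are skipped silently.
find_cron_entries() {
    pattern=$1
    shift
    for f in "$@"; do
        [ -f "$f" ] && [ -r "$f" ] && grep -Hn -- "$pattern" "$f"
    done
    return 0
}

# System-wide crontab plus the cron.d, cron.daily, cron.weekly, ... directories
find_cron_entries updatedb /etc/crontab /etc/cron.*/*

# Per-user crontabs live elsewhere; root's can be checked with:
#   sudo crontab -l -u root | grep updatedb
```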