10 May 2013

412. system-config-samba on debian

I don't use Samba anymore, but I do remember that Ubuntu had a very simple tool for setting up Samba shares (it turns out it's by Red Hat, not Canonical). While it might be better in the long run to craft your own smb.conf, system-config-samba is convenient for quickly setting things up.
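For comparison, a hand-written share definition in /etc/samba/smb.conf doesn't need to be much more than the following (a minimal sketch: the share name, path and user are made up, and the usual [global] section is assumed to be in place already):

[shared]
    path = /home/someuser/shared
    read only = no
    browseable = yes
    valid users = someuser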

Anyway, someone at forums.debian.net asked for it, which got me thinking about making a post:

sudo apt-get install build-essential gfortran checkinstall python-all-dev cdbs debhelper quilt intltool python-central rarian-compat pkg-config gnome-doc-utils samba python-libuser libuser1 python-glade2
mkdir ~/tmp
cd ~/tmp
wget https://launchpad.net/ubuntu/+archive/primary/+files/system-config-samba_1.2.63.orig.tar.gz
tar xvf system-config-samba_1.2.63.orig.tar.gz
wget https://launchpad.net/ubuntu/+archive/primary/+files/system-config-samba_1.2.63-0ubuntu5.diff.gz
gunzip system-config-samba_1.2.63-0ubuntu5.diff.gz
patch -p0 < system-config-samba_1.2.63-0ubuntu5.diff
cd system-config-samba-1.2.63/
dpkg-buildpackage -uc -us
sudo dpkg -i ../system-config-samba_1.2.63-0ubuntu5_all.deb
sudo touch /etc/libuser.conf
gksu system-config-samba

And here we go: at the very least it reads my /etc/samba/smb.conf accurately.

To make system-config-samba show up in your GNOME menus, add it via Main Menu, or create a launcher by hand as sketched below. Make sure to include gksu in the command used to launch it.
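If you go the by-hand route, a .desktop file along these lines should work (a sketch; the file name, Name and Categories are my own choices):

cat << EOF > ~/.local/share/applications/system-config-samba.desktop
[Desktop Entry]
Type=Application
Name=Samba Configuration
Comment=Configure samba shares and users
Exec=gksu system-config-samba
Terminal=false
Categories=System;Settings;
EOF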


411. Attempt at OpenMP-enabled NWChem 6.1.1 -- not successful...

Update 4 June 2013:
I might return to this later and have a look at how to make the parallel executable in the bin/LINUX64 folder.

Original post:
This is another addition to my growing list of unsuccessful, abandoned or only partially successful builds.
(see e.g.
http://verahill.blogspot.com.au/2013/05/409-failed-attempt-at-compiling-gamess_10.html
http://verahill.blogspot.com.au/2013/05/409a-failed-attempt-at-compiling-gamess.html
http://verahill.blogspot.com.au/2012/08/compiling-dalton-qm-on-debian-in.html
http://verahill.blogspot.com.au/2012/07/quantum-espresso-on-rocks-543-centos-56.html)

In other words -- yes, it builds. But no, it is unusable.

I can build NWChem with OpenMP support, and it does run in parallel -- but the wall time is enormous since most of the time only a single thread is doing any work.

Maybe someone will read this and see what's missing, or feel inspired to make their own attempt.

What I did
ACML libraries were installed as shown in e.g. http://verahill.blogspot.com.au/2013/05/409-failed-attempt-at-compiling-gamess_10.html

Nwchem was downloaded:
sudo mkdir /opt/nwchem
sudo chown $USER:$USER /opt/nwchem
cd /opt/nwchem
wget http://www.nwchem-sw.org/download.php?f=Nwchem-6.1.1-src.2012-06-27.tar.gz
tar xvf Nwchem-6.1.1-src.2012-06-27.tar.gz
cd nwchem-6.1.1-src/

Next I edited src/config/makefile.h
2363 ifdef OPTIMIZE
2364     FFLAGS = $(FOPTIONS) $(FOPTIMIZE)
2365     CFLAGS = $(COPTIONS) $(COPTIMIZE) -fopenmp
2366 else
2367 # Need FDEBUG after FOPTIONS on SOLARIS to correctly override optimization
2368     FFLAGS = $(FOPTIONS) $(FDEBUG)
2369     CFLAGS = $(COPTIONS) $(CDEBUG) -fopenmp
2370 endif
2371     INCLUDES = -I. $(LIB_INCLUDES) -I$(INCDIR) $(INCPATH)
2372     CPPFLAGS = $(INCLUDES) $(DEFINES) $(LIB_DEFINES)
2373     LDFLAGS = $(LDOPTIONS) -L$(LIBDIR) $(LIBPATH)
2374     LIBS = $(NW_MODULE_LIBS) $(CORE_LIBS) -lgomp
2375
I then built using the following build script:
export LARGE_FILES=TRUE
export TCGRSH=/usr/bin/ssh
export NWCHEM_TOP=`pwd`
export NWCHEM_TARGET=LINUX64
export NWCHEM_MODULES="all"
export PYTHONVERSION=2.7
export PYTHONHOME=/usr
export BLASOPT="-L/opt/acml/acml5.3.1/gfortran64_fma4_mp_int64/lib -lacml_mp -lpthread"
export USE_OPENMP=y
export LIBRARY_PATH="$LIBRARY_PATH:/opt/acml/acml5.3.1/gfortran64_fma4_mp_int64/lib"
cd $NWCHEM_TOP/src
make clean
make nwchem_config
make FC=gfortran 2> make.err 1>make.log
cd $NWCHEM_TOP/contrib
export FC=gfortran
./getmem.nwchem
So far so good.
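As an aside, the number of threads the OpenMP build spawns can be pinned with the standard OMP_NUM_THREADS variable rather than left to the default, and if libacml_mp is linked dynamically the runtime linker also needs to be able to find it. A sketch (the input file name is made up):

export OMP_NUM_THREADS=8
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/acml/acml5.3.1/gfortran64_fma4_mp_int64/lib
/opt/nwchem/nwchem-6.1.1-src/bin/LINUX64/nwchem benzene.nw > benzene.out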

Where it fails
A picture is probably in order:
Note that while this is a short run, it is perfectly representative of what I'm seeing with 'real' jobs too -- I get eight threads auto-spawning (as seen by top), but only one thread is active most of the time.

Basically, most of the time only one core is running at 100% (i.e. showing as 12.5 % here since I have 8 cores), with the other cores occasionally kicking in (the 'spikes').

The wall time is 63 seconds, and the 'cpu time' is 83.1 seconds. Ideally, for a fully parallel shared-memory run like this the cpu time should be as close as possible to the wall time multiplied by eight (but it is always smaller); here 83.1/63 gives an effective speedup of only about 1.3 on eight cores.

As a comparison, here's an MPI-enabled binary:
Here all cores are active over most of the (short) run. The cpu time was 9.9 seconds and the wall time 11.8 seconds. For an MPI run the wall time should be as close to the cpu time as possible (but it is always larger).

So it's not particularly 'parallel' in the OpenMP case -- but I don't know why. Maybe NWChem 6.1.1 isn't quite ready for OpenMP yet? I've noticed that it's one of the areas where the upcoming release is supposed to bring improvements.


'profiling' with sar -- how-to
sudo apt-get install sysstat

Edit /etc/default/sysstat:
 8 # will be overwritten by debconf!
 9 ENABLED="true"
10
sudo service sysstat restart

Before launching the run, start sar in one window so that it is already collecting data, then immediately launch the run you want to monitor in a different window:

sar 1 180 >> run.log
collects data every second, 180 times (i.e. about 180 seconds), and stores the data in run.log.
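To get a single utilisation number out of the log afterwards, a rough awk one-liner like this works on the default sar CPU report (an assumption here: %idle is the last column, which is the case on my systems):

# average CPU utilisation over the run, computed as 100 - mean(%idle)
awk '!/Average/ && /all/ && $NF ~ /^[0-9.]+$/ { s += $NF; n++ } END { if (n) printf "average CPU utilisation: %.1f %%\n", 100 - s/n }' run.log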

09 May 2013

410. Compiling LAMMPS on Debian (with GPU support)

MM/MD scares me a lot -- it takes experience, expertise and intuition to set up an MD simulation properly, especially if you need to parametrise a new system. In comparison, while DFT can of course easily yield wildly inaccurate results if you use the wrong method/functional/basis set or simply ask the 'wrong' question, I find it easier to understand and to implement based on previous literature (i.e. if I read the computational details in a paper I often know how to repeat the calculations; with MD I often don't).

Anyway, a friend who is an expert in the field is using LAMMPS, and learning by imitation is better than not learning at all, so I've decided to invest a little bit of time familiarizing myself with this software.

The reasons he cited are that it's fairly barebones, that it's written in C++ (an advantage for some, a disadvantage for others), and that it's very modular and hence easy to extend (he's a theoretical chemist rather than a computational one). Finally, it has GPU support. I'm not really qualified to comment one way or the other.


Compilation

Voro++
First compile Voro++, which is used for Voronoi tessellation (needed for the VORONOI package).
mkdir ~/tmp
cd ~/tmp
wget http://math.lbl.gov/voro++/download/dir/voro++-0.4.5.tar.gz
tar xvf voro++-0.4.5.tar.gz
cd voro++-0.4.5/
make
sudo make install

Note that it builds with optimisation level 3 (-O3), which makes me nervous in general -- edit the Makefile to change it to -O2 if you prefer that, e.g. as sketched below.
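If you do want to drop to -O2, something along these lines should do it (an assumption: I haven't double-checked which file carries the flag in 0.4.5, so locate it with grep first):

# find where -O3 is set, then change it to -O2 before building
grep -rn -- '-O3' config.mk Makefile 2>/dev/null
sed -i 's/-O3/-O2/g' config.mk    # adjust the file name to whatever grep found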

OpenKIM API
Next compile the OpenKIM API. Note that you can't run make in parallel (i.e. don't use -j).

sudo mkdir /opt/kimdir
sudo chown $USER:$USER /opt/kimdir
cd /opt/kimdir
wget http://s3.openkim.org/openkim-api-v1.1.1.tgz
tar xvf openkim-api-v1.1.1.tgz
cd openkim-api-v1.1.1/
export KIM_DIR=`pwd`
echo "export KIM_DIR=`pwd`" >> ~/.bashrc
source ~/.bashrc
make examples
make

LAMMPS
Grab the LAMMPS source code. You can get it directly from Sandia National Laboratories, or via SourceForge.

sudo apt-get install openmpi-bin libopenmpi-dev fftw3-dev build-essential gfortran
mkdir ~/tmp
cd ~/tmp
wget http://aarnet.dl.sourceforge.net/project/lammps/lammps-2Feb13.tar.gz
tar xvf lammps-2Feb13.tar.gz
cd lammps-2Feb13/src/
cp MAKE/Makefile.openmpi MAKE/Makefile.verahill
Edit MAKE/Makefile.verahill
 53 FFT_PATH =
 54 FFT_LIB =       -lfftw3
 55 

make verahill

   text    data     bss     dec     hex filename
6111696   11448   17024 6140168  5db108 ../lmp_verahill
make[1]: Leaving directory `/opt/lammps/lammps-2Feb13/src/Obj_verahill'
This compiles a binary in src/ called lmp_verahill. Note that it only enables a few packages:
make package-status
Installed  NO: package ASPHERE
Installed  NO: package BODY
Installed  NO: package CLASS2
Installed  NO: package COLLOID
Installed  NO: package DIPOLE
Installed  NO: package FLD
Installed  NO: package GPU
Installed  NO: package GRANULAR
Installed  NO: package KIM
Installed YES: package KSPACE
Installed YES: package MANYBODY
Installed  NO: package MC
Installed  NO: package MEAM
Installed YES: package MOLECULE
Installed  NO: package OPT
Installed  NO: package PERI
Installed  NO: package POEMS
Installed  NO: package REAX
Installed  NO: package REPLICA
Installed  NO: package RIGID
Installed  NO: package SHOCK
Installed  NO: package SRD
Installed  NO: package VORONOI
Installed  NO: package XTC
Installed  NO: package USER-MISC
Installed  NO: package USER-ATC
Installed  NO: package USER-AWPMD
Installed  NO: package USER-CG-CMM
Installed  NO: package USER-COLVARS
Installed  NO: package USER-CUDA
Installed  NO: package USER-EFF
Installed  NO: package USER-OMP
Installed  NO: package USER-MOLFILE
Installed  NO: package USER-REAXC
Installed  NO: package USER-SPH
Additional packages.
 To enable additional packages, after doing make verahill, do e.g.
make yes-body yes-dipole
Installing package body
Installing package dipole
Again, note that you'll need the proper dependencies installed (e.g. KIM and Voro++ -- and for KIM make sure that you've got KIM_DIR set in your ~/.bashrc as shown above). Next, compile all the libs you need. The easiest approach is to do the following (assuming you're starting in the src directory):

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/lib/openmpi/include/
cd ../lib/reax
make -f Makefile.gfortran

Important: Edit Makefile.lammps:
3 reax_SYSINC =
4 reax_SYSLIB = -lgfortran #-lifcore -lsvml -lompstub -limf
5 reax_SYSPATH = #-L/opt/intel/fce/10.0.023/lib
cd ../poems
make -f Makefile.g++
cd ../meam
make -f Makefile.gfortran
cd ../linalg
make -f Makefile.gfortran
cd ../colvars
make -f Makefile.g++
cd ../../src/
make yes-asphere yes-body yes-class2 yes-colloid yes-dipole yes-fld yes-granular yes-kim yes-mc yes-meam yes-opt yes-peri yes-poems yes-reax yes-replica yes-rigid yes-shock yes-voronoi yes-xtc

Finish by running
make clean-all
make verahill

to properly set things up.

GPU/CUDA
Make sure you've installed the CUDA toolkit -- on debian it's the nvidia-cuda-toolkit package.

I'll only show the GPU package here -- there's also USER-CUDA. Read up on the difference on your own.

 Edit lib/gpu/Makefile.linux to set the correct sm value, which depends on the GPU compute capability version (you can look this up at e.g. https://developer.nvidia.com/cuda-gpus and http://www.geeks3d.com/20100606/gpu-computing-nvidia-cuda-compute-capability-comparative-table/ ).

For GPU compute capability 3.0 you set CUDA_ARCH to sm_30. If your card supports double precision, use -D_DOUBLE_DOUBLE for CUDA_PRECISION:
 6 CUDA_HOME = /usr
 7 NVCC = nvcc
 8
 9 # Tesla CUDA
10 #CUDA_ARCH = -arch=sm_21
11 # newer CUDA
12 CUDA_ARCH = -arch=sm_30
13 # older CUDA
14 #CUDA_ARCH = -arch=sm_10 -DCUDA_PRE_THREE
15
16 CUDA_PRECISION = -D_SINGLE_SINGLE
17 CUDA_INCLUDE = -I$(CUDA_HOME)/include
18 CUDA_LIB = -L$(CUDA_HOME)/lib
and edit Makefile.lammps
3 gpu_SYSINC =
4 gpu_SYSLIB = -lcudart -lcuda
5 gpu_SYSPATH = #-L/usr/local/cuda/lib64
then do
make -f Makefile.linux
cd ../../src
make yes-gpu
make verahill

More on GPU compute capability versions
If you use an sm_XX value which is too high, e.g. sm_30 with a GeForce 210 (compute capability 1.2), you get:
LAMMPS (2 Feb 2013)
ERROR: GPU library not compiled for this accelerator (gpu_extra.h:40)
Cuda driver error 4 in call at file 'geryon/nvd_device.h' in line 116.
*** The MPI_Abort() function was called after MPI_FINALIZE was invoked.
*** This is disallowed by the MPI standard.
If you use sm_12 with the GeForce 210, you get to this point:
- Using GPGPU acceleration for pppm:
-  with 1 proc(s) per device.
--------------------------------------------------------------------------
GPU 0: GeForce 210, 16 cores, 0.98/1 GB, 1.4 GHZ (Single Precision)
--------------------------------------------------------------------------

Initializing GPU and compiling on process 0...Done.
Initializing GPUs 0-1 on core 0...Done.

ERROR: Double precision is not supported on this accelerator (gpu_extra.h:42)
There's more about this in lib/gpu/README:
124 NOTE: Double precision is only supported on certain GPUs (with
125       compute capability>=1.3). If you compile the GPU library for
126       a GPU with compute capability 1.1 and 1.2, then only single
127       precision FFTs are supported, i.e. LAMMPS has to be compiled
128       with -DFFT_SINGLE. For details on configuring FFT support in
129       LAMMPS, see http://lammps.sandia.gov/doc/Section_start.html#2_2_4
To do that, edit (in this case) src/MAKE/Makefile.verahill, set -DFFT_SINGLE, and make sure to link against a single-precision FFTW library (I built one as part of gromacs 4.5.5; see e.g. http://verahill.blogspot.com.au/2012/03/building-gromacs-with-fftw3-and-openmpi.html):
52 FFT_INC = -DFFT_FFTW3 -DFFT_SINGLE
53 FFT_PATH =
54 FFT_LIB = /opt/fftw/fftw-3.3.2/single/lib/libfftw3f.a
and recompiling everything (make clean-all && make verahill).

Note that I am in no way implying that a GeForce 210 is a suitable test card -- if you are serious about GPU calculations then there are serious cards out there, for serious money. I'm currently designing my next compute node, and while I probably won't go the GPU route anytime soon, I'm thinking about getting a mobo with multiple PCI-E slots for multiple cards. But I really don't have much experience.


Testing
You can test it by e.g. changing directory to examples/indent
cd ../examples/indent
mpirun -n 2 ../../src/./lmp_verahill < indent.in
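If you compiled in the GPU package, my understanding (from the LAMMPS docs on the suffix command, not from benchmarking here) is that the -sf gpu command line switch turns supported styles into their GPU variants; depending on the version you may also need a package gpu line in the input script:

mpirun -n 2 ../../src/lmp_verahill -sf gpu < indent.in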

Installation
You can move lmp_verahill to e.g. /opt/lammps and add it to PATH for easier execution. In my particular example I did
sudo mkdir /opt/lammps
sudo chown $USER /opt/lammps
mv ~/tmp/lammps-2Feb13 /opt/lammps
ln -s /opt/lammps/lammps-2Feb13/src/lmp_verahill /opt/lammps/lammps
echo 'export PATH=$PATH:/opt/lammps' >> ~/.bashrc
source ~/.bashrc
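After that (and a new shell, or the source ~/.bashrc above), the symlinked binary can be called from anywhere, e.g. (the input file name is made up):

mpirun -n 2 lammps < in.mysystem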