09 May 2013

410. Compiling LAMMPS on Debian (with GPU support)

MM/MD scares me a lot -- it takes experience, expertise and intuition to set up an MD simulation properly, especially if you need to parametrise a new system. In comparison, while DFT can of course easily yield wildly inaccurate results if you use the wrong method/functional/basis set, or simply ask the 'wrong' question, I find it easier to understand and to implement based on previous literature (i.e. if I read the computational details in a paper I often know how to repeat the calculations; with MD I often don't).

Anyway, a friend who is an expert in the field is using LAMMPS, and learning by imitation is better than not learning at all, so I've decided to invest a little bit of time familiarizing myself with this software.

The reasons he cited: it's fairly barebones; it's written in C++ (an advantage for some, a disadvantage for others); and it's very modular, so it's easy to extend (he's a theoretical chemist rather than a computational one). Finally, it has GPU support. I'm not really qualified to comment one way or the other.


Compilation

Voro++
First compile Voro++, which is used for Voronoi tessellation.
mkdir ~/tmp
cd ~/tmp
wget http://math.lbl.gov/voro++/download/dir/voro++-0.4.5.tar.gz
tar xvf voro++-0.4.5.tar.gz
cd voro++-0.4.5/
make
sudo make install

Note that it uses optimisation level 3, which makes me nervous in general -- edit the Makefile to change it to -O2 if you prefer.
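If you'd rather script that change, a sed one-liner along these lines should do it -- this assumes the optimisation flag is set in config.mk at the top of the Voro++ source tree (check the Makefile if your version keeps it elsewhere):
sed -i 's/-O3/-O2/' config.mk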

OpenKIM API
Next compile the OpenKIM API. Note that you can't run make in parallel.

sudo mkdir /opt/kimdir
sudo chown $USER:$USER /opt/kimdir
cd /opt/kimdir
wget http://s3.openkim.org/openkim-api-v1.1.1.tgz
tar xvf openkim-api-v1.1.1.tgz
cd openkim-api-v1.1.1/
export KIM_DIR=`pwd`
echo "export KIM_DIR=`pwd`" >> ~/.bashrc
source ~/.bashrc
make examples
make

LAMMPS
Grab the LAMMPS source code. You can get it directly from Sandia National Laboratories, or via SourceForge.

sudo apt-get install openmpi-bin libopenmpi-dev fftw3-dev build-essential gfortran
mkdir ~/tmp
cd ~/tmp
wget http://aarnet.dl.sourceforge.net/project/lammps/lammps-2Feb13.tar.gz
tar xvf lammps-2Feb13.tar.gz
cd lammps-2Feb13/src/
cp MAKE/Makefile.openmpi MAKE/Makefile.verahill
Edit MAKE/Makefile.verahill
 53 FFT_PATH =
 54 FFT_LIB =       -lfftw3

make verahill

   text    data     bss     dec     hex filename
6111696   11448   17024 6140168  5db108 ../lmp_verahill
make[1]: Leaving directory `/opt/lammps/lammps-2Feb13/src/Obj_verahill'
This compiles a binary in src/ called lmp_verahill. Note that it only enables a few modules.
make package-status
Installed  NO: package ASPHERE
Installed  NO: package BODY
Installed  NO: package CLASS2
Installed  NO: package COLLOID
Installed  NO: package DIPOLE
Installed  NO: package FLD
Installed  NO: package GPU
Installed  NO: package GRANULAR
Installed  NO: package KIM
Installed YES: package KSPACE
Installed YES: package MANYBODY
Installed  NO: package MC
Installed  NO: package MEAM
Installed YES: package MOLECULE
Installed  NO: package OPT
Installed  NO: package PERI
Installed  NO: package POEMS
Installed  NO: package REAX
Installed  NO: package REPLICA
Installed  NO: package RIGID
Installed  NO: package SHOCK
Installed  NO: package SRD
Installed  NO: package VORONOI
Installed  NO: package XTC
Installed  NO: package USER-MISC
Installed  NO: package USER-ATC
Installed  NO: package USER-AWPMD
Installed  NO: package USER-CG-CMM
Installed  NO: package USER-COLVARS
Installed  NO: package USER-CUDA
Installed  NO: package USER-EFF
Installed  NO: package USER-OMP
Installed  NO: package USER-MOLFILE
Installed  NO: package USER-REAXC
Installed  NO: package USER-SPH
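To see at a glance which packages are enabled, you can filter that output:
make package-status | grep "Installed YES"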
Additional packages
To enable additional packages, after doing make verahill, do e.g.
make yes-body yes-dipole
Installing package body
Installing package dipole
Again, note that you'll need the proper dependencies installed (e.g. KIM and Voro++ -- and for KIM make sure that you've got KIM_DIR set in your ~/.bashrc as shown above). Next, compile all the libs you need. The easiest approach is the following (assuming you're in the src directory):

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/lib/openmpi/include/
cd ../lib/reax
make -f Makefile.gfortran

Important: Edit Makefile.lammps:
3 reax_SYSINC =
4 reax_SYSLIB = -lgfortran #-lifcore -lsvml -lompstub -limf
5 reax_SYSPATH = #-L/opt/intel/fce/10.0.023/lib
cd ../poems
make -f Makefile.g++
cd ../meam
make -f Makefile.gfortran
cd ../linalg
make -f Makefile.gfortran
cd ../colvars
make -f Makefile.g++
cd ../../src/
make yes-asphere yes-body yes-class2 yes-colloid yes-dipole yes-fld yes-granular yes-kim yes-mc yes-meam yes-opt yes-peri yes-poems yes-reax yes-replica yes-rigid yes-shock yes-voronoi yes-xtc

Finish by running
make clean-all
make verahill

to properly set things up.
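As a quick smoke test of the freshly rebuilt binary you can feed it an empty input -- it should just print the version banner (LAMMPS (2 Feb 2013)) and exit without errors:
./lmp_verahill < /dev/null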

GPU/CUDA
Make sure you've installed the CUDA toolkit -- on debian it's the nvidia-cuda-toolkit package.

I'll only show the GPU package here -- there's also USER-CUDA. Read up on the difference on your own.

 Edit lib/gpu/Makefile.linux to set the correct sm value, which depends on the GPU compute capability version (you can look this up at e.g. https://developer.nvidia.com/cuda-gpus and http://www.geeks3d.com/20100606/gpu-computing-nvidia-cuda-compute-capability-comparative-table/ ).
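If you're not sure which card is in the machine, nvidia-smi (installed with the NVIDIA driver) will list it by name, which you can then look up in the tables linked above:
nvidia-smi -L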

For GPU compute capability 3.0 you set CUDA_ARCH to -arch=sm_30. If your card supports double precision, you can set CUDA_PRECISION to -D_DOUBLE_DOUBLE instead of -D_SINGLE_SINGLE:
6 CUDA_HOME = /usr
7 NVCC = nvcc
8
9 # Tesla CUDA
10 #CUDA_ARCH = -arch=sm_21
11 # newer CUDA
12 CUDA_ARCH = -arch=sm_30
13 # older CUDA
14 #CUDA_ARCH = -arch=sm_10 -DCUDA_PRE_THREE
15
16 CUDA_PRECISION = -D_SINGLE_SINGLE
17 CUDA_INCLUDE = -I$(CUDA_HOME)/include
18 CUDA_LIB = -L$(CUDA_HOME)/lib
and edit Makefile.lammps
3 gpu_SYSINC =
4 gpu_SYSLIB = -lcudart -lcuda
5 gpu_SYSPATH = #-L/usr/local/cuda/lib64
then do
make -f Makefile.linux
cd ../../src
make yes-gpu
make verahill

More on GPU compute capability versions
If you use an sm_XX value which is too high, e.g. sm_30 with a GeForce 210 (compute capability 1.2), you get:
LAMMPS (2 Feb 2013)
ERROR: GPU library not compiled for this accelerator (gpu_extra.h:40)
Cuda driver error 4 in call at file 'geryon/nvd_device.h' in line 116.
*** The MPI_Abort() function was called after MPI_FINALIZE was invoked.
*** This is disallowed by the MPI standard.
If you use sm_12 with the GeForce 210, you get to this point:
- Using GPGPU acceleration for pppm:
-  with 1 proc(s) per device.
--------------------------------------------------------------------------
GPU 0: GeForce 210, 16 cores, 0.98/1 GB, 1.4 GHZ (Single Precision)
--------------------------------------------------------------------------
Initializing GPU and compiling on process 0...Done.
Initializing GPUs 0-1 on core 0...Done.
ERROR: Double precision is not supported on this accelerator (gpu_extra.h:42)
There's more about this in lib/gpu/README:
124 NOTE: Double precision is only supported on certain GPUs (with
125       compute capability>=1.3). If you compile the GPU library for
126       a GPU with compute capability 1.1 and 1.2, then only single
127       precision FFTs are supported, i.e. LAMMPS has to be compiled
128       with -DFFT_SINGLE. For details on configuring FFT support in
129       LAMMPS, see http://lammps.sandia.gov/doc/Section_start.html#2_2_4
To do that, edit (in this case) src/MAKE/Makefile.verahill: add -DFFT_SINGLE to FFT_INC and make sure to link against a single-precision FFTW library (I built one as part of gromacs 4.5.5; see e.g. http://verahill.blogspot.com.au/2012/03/building-gromacs-with-fftw3-and-openmpi.html):
52 FFT_INC = -DFFT_FFTW3 -DFFT_SINGLE
53 FFT_PATH =
54 FFT_LIB = /opt/fftw/fftw-3.3.2/single/lib/libfftw3f.a
Then recompile everything (make clean-all && make verahill).
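If you don't already have a single-precision FFTW3 build lying around, something along these lines should produce one -- the version and install prefix here just match the example path above, and --enable-float is what switches FFTW to single precision:
sudo mkdir -p /opt/fftw
sudo chown $USER /opt/fftw
cd ~/tmp
wget http://www.fftw.org/fftw-3.3.2.tar.gz
tar xvf fftw-3.3.2.tar.gz
cd fftw-3.3.2/
./configure --enable-float --prefix=/opt/fftw/fftw-3.3.2/single
make
make install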

Note that I am in no way implying that a GeForce 210 is a suitable test card -- if you are serious about GPU calculations then there are serious cards out there, for serious money. I'm currently designing my next compute node, and while I probably won't go the GPU route anytime soon, I'm thinking about getting a motherboard with multiple PCI-E slots for multiple cards. But I really don't have much experience here.


Testing
You can test it by e.g. changing to the examples/indent directory:
cd ../examples/indent
mpirun -n 2 ../../src/lmp_verahill < indent.in
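Note that this runs on the CPU only. To actually exercise the GPU package you need to activate the gpu suffix styles; if I read the LAMMPS docs correctly the -sf gpu command-line switch should do that, but I haven't verified the exact invocation against this particular release, so treat it as a sketch:
mpirun -n 2 ../../src/lmp_verahill -sf gpu < indent.in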

Installation
You can move lmp_verahill to e.g. /opt/lammps and add it to PATH for easier execution. In my particular example I did
sudo mkdir /opt/lammps
sudo chown $USER /opt/lammps
mv ~/tmp/lammps-2Feb13 /opt/lammps
ln -s /opt/lammps/lammps-2Feb13/src/lmp_verahill /opt/lammps/lammps
echo 'export PATH=$PATH:/opt/lammps' >> ~/.bashrc
source ~/.bashrc

409.B. GAMESS US with GPU support on Debian Wheezy -- the ACML edition. This works.


Update 27/6/2013:
Please note that Kirill Berezovsky has published a series of posts on GAMESS US, including how to compile it for both CPU and GPU use. See
http://biochemicalmatters.blogspot.com.au/2013/06/gamess-us-frequently-asked-questions_26.html
http://biochemicalmatters.blogspot.ru/2013/06/gamess-us-frequently-asked-questions_1687.html
http://biochemicalmatters.blogspot.ru/2013/06/gamess-us-frequently-asked-questions_1447.html
http://biochemicalmatters.blogspot.com.au/2013/06/gamess-us-frequently-asked-questions.html


Update 21 May 2013: See the comments below this post. This approach most likely works -- what has been confusing me is the lack of reports of GPU timings in the output, but this doesn't necessarily mean that the GPU isn't being used. The poster below, using nvidia-smi, observed GPU usage, although the speed-up was not major.

Blogspot needs versioning.
I lost the entire post when it was almost complete. Screw this.

Everything compiles fine, but no GPU output during calculation.

I see no evidence of the GPU being used at any stage.  Otherwise all is good -- the calcs run fine on the CPU.

Maybe someone else will have a better idea.

I looked at libcchem/aaa.readme.1st and http://combichem.blogspot.com.au/2011/02/compiling-gamess-with-cuda-gpu-support.html to get as far as I did.

Setting up GAMESS
Get GAMESS (see e.g. http://verahill.blogspot.com.au/2012/09/compiling-and-testing-gamess-us-on.html) and put gamess-current.tar.gz in ~/tmp.

sudo apt-get install libboost-all-dev build-essential g++ gfortran automake nvidia-cuda-toolkit python-cheetah openmpi-bin libopenmpi-dev zlib1g-dev checkinstall
mkdir ~/tmp
cd ~/tmp
tar xvf gamess-current.tar.gz
sudo mv gamess /opt/gamess_cuda
sudo chown $USER:$USER /opt/gamess_cuda -R


ACML
Download both the 'regular' and the int64 gfortran packages from AMD:
http://developer.amd.com/tools-and-sdks/cpu-development/amd-core-math-library-acml/acml-downloads-resources/#download

tar xvf acml-5-3-1-gfortran-64bit-int64.tgz
tar xvf acml-5-3-1-gfortran-64bit.tgz
sh install-acml-5-3-1-gfortran-64bit-int64.sh
Where do you want to install ACML? Press return to use the default location (/opt/acml5.3.1),
or enter an alternative path. The directory will be created if it does not already exist.
> /opt/acml/acml5.3.1
sh install-acml-5-3-1-gfortran-64bit.sh
Where do you want to install ACML? Press return to use the default location (/opt/acml5.3.1),
or enter an alternative path. The directory will be created if it does not already exist.
> /opt/acml/acml5.3.1
You'll get something like this:
/opt/acml/acml5.3.1
|-- Doc
|-- gfortran64
|-- gfortran64_fma4
|-- gfortran64_fma4_int64
|-- gfortran64_fma4_mp
|-- gfortran64_fma4_mp_int64
|-- gfortran64_int64
|-- gfortran64_mp
|-- gfortran64_mp_int64
`-- util

where
*  fma4 is for CPUs with FMA4 support (use util/cpuid to check, or grep /proc/cpuinfo as shown below)
*  int64 is for builds with 64-bit integers (integer*8)
*  mp is for OpenMP. For MPI, do not use the _mp_ libraries!
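A quick way to check for FMA4 support without the ACML utility is to look at the CPU flags directly -- this just greps /proc/cpuinfo and prints fma4 once if the CPU advertises it (no output means no FMA4):
grep -o -m 1 fma4 /proc/cpuinfo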

Pick your library/ies and add them to the LD_LIBRARY_PATH, e.g.:
echo 'export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/acml/acml5.3.1/gfortran64_int64/lib' >> ~/.bashrc
source ~/.bashrc


CBLAS
sudo mkdir -p /opt/netlib
sudo chown $USER /opt/netlib
cd /opt/netlib/
wget http://www.netlib.org/blas/blast-forum/cblas.tgz
tar xvf cblas.tgz
cd CBLAS/

Edit Makefile.LINUX
25 BLLIB = /opt/acml/acml5.3.1/gfortran64_int64/lib/libacml.a
26 CBLIB = ../lib/cblas_$(PLAT).a
cp Makefile.LINUX Makefile.in
make

Patching libboost
sudo su
cd /usr/include/boost
patch -p1 < /opt/gamess_cuda/libcchem/boot/
exit

Make the following changes by hand if the patch didn't work:

/usr/include/boost/mpl/aux_/integral_wrapper.hpp
47 // other compilers (e.g. MSVC) are not particulary happy about it
48 #if BOOST_WORKAROUND(__EDG_VERSION__, <= 238) || defined(__CUDACC__)
49     typedef struct AUX_WRAPPER_NAME type;
/usr/include/boost/mpl/size_t_fwd.hpp
20
21 BOOST_MPL_AUX_ADL_BARRIER_NAMESPACE_OPEN
22 #if defined(__CUDACC__)
23 typedef std::size_t std_size_t;
24 template< std_size_t N > struct size_t;
25 #else
26 template< std::size_t N > struct size_t;
27 #endif
28
29 BOOST_MPL_AUX_ADL_BARRIER_NAMESPACE_CLOSE
/usr/include/boost/mpl/size_t.hpp
19 #if defined(__CUDACC__)
20 #define AUX_WRAPPER_VALUE_TYPE std_size_t
21 #define AUX_WRAPPER_NAME size_t
22 #define AUX_WRAPPER_PARAMS(N) std_size_t N
23 #else
24 #define AUX_WRAPPER_VALUE_TYPE std::size_t
25 #define AUX_WRAPPER_NAME size_t
26 #define AUX_WRAPPER_PARAMS(N) std::size_t N
27 #endif
28

HDF5
mkdir ~/tmp
cd ~/tmp
wget http://www.hdfgroup.org/ftp/HDF5/current/src/hdf5-1.8.10-patch1.tar.gz
tar xvf hdf5-1.8.10-patch1.tar.gz
cd hdf5-1.8.10-patch1/
export CC=/usr/bin/gcc-4.6 && export CXX=/usr/bin/g++-4.6
./configure --prefix=/opt/gamess_cuda/hdf5 --with-pthread --enable-cxx --enable-threadsafe --enable-unsupported
make
mkdir /opt/gamess_cuda/hdf5/lib -p
mkdir /opt/gamess_cuda/hdf5/include -p
sudo checkinstall
This package will be built according to these values:

 0 -  Maintainer: [ root@neon ]
 1 -  Summary: [ hdf5-cxx ]
 2 -  Name:    [ hdf5-1.8.10 ]
 3 -  Version: [ 1.8.10-1 ]
 4 -  Release: [ 1 ]
 5 -  License: [ GPL ]
 6 -  Group:   [ checkinstall ]
 7 -  Architecture: [ amd64 ]
 8 -  Source location: [ hdf5-1.8.10-patch1 ]
 9 -  Alternate source location: [ ]
10 -  Requires: [ ]
11 -  Provides: [ hdf5-1.8.10 ]
12 -  Conflicts: [ ]
13 -  Replaces: [ ]
Make sure to edit the Version field, since the default version derived from 'patch1' leads to an error (the version must start with a digit).

LIBCCHEM
Edit /opt/gamess_cuda/libcchem/src/externals/boost/cuda/device_ptr.hpp and /opt/gamess_cuda/libcchem/rysq/src/externals/boost/cuda/device_ptr.hpp. Insert
#include <stddef.h>
somewhere at the beginning of each file.
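If you prefer to script that edit, a GNU sed one-liner like the following should work -- it simply appends the include after the first line of each header (placing it ahead of the include guard is harmless):
sed -i '1a #include <stddef.h>' /opt/gamess_cuda/libcchem/src/externals/boost/cuda/device_ptr.hpp
sed -i '1a #include <stddef.h>' /opt/gamess_cuda/libcchem/rysq/src/externals/boost/cuda/device_ptr.hpp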

./configure --with-gamess --with-hdf5=/opt/gamess_cuda/hdf5 CPPFLAGS="-I/opt/gamess_cuda/hdf5/include" --with-cuda=/usr --disable-openmp --prefix=/opt/gamess_cuda/libcchem --with-gpu=fermi --with-integer8 --with-cublas
make
make install


Configure GAMESS US
cd /opt/gamess_cuda
./config
please enter your target machine name: linux64
GAMESS directory? [/opt/gamess_cuda]
GAMESS build directory? [/opt/gamess_cuda]
Version? [00] 12
Please enter your choice of FORTRAN: gfortran
Please enter only the first decimal place, such as 4.1 or 4.6: 4.6
Enter your choice of 'mkl' or 'atlas' or 'acml' or 'none': acml
enter this full pathname: /opt/acml/acml5.3.1
communication library ('sockets' or 'mpi')? mpi
Enter MPI library (impi, mvapich2, mpt, sockets): openmpi
Please enter your openmpi's location: /opt/openmpi/1.6

Compile
cd ddi/
./compddi
cd ..

Edit comp
872 # see ~/gamess/libcchem/aaa.readme.1st for more information
873 set GPUCODE=true
874 if ($GPUCODE == true) then
and
1663 # -fno-whole-file suppresses argument's data type checking
1664 set OPT='-O0'
1665 if (".$GMS_DEBUG_FLAGS" != .) set OPT="$GMS_DEBUG_FLAGS"
./compall

Edit lked
69 #
70 set GPUCODE=true
71 #
72 # 5. optional MPQC interface
and
958 case openmpi:
959    set MPILIBS="-L$GMS_MPI_PATH/lib"
960    set MPILIBS="$MPILIBS -lmpi -lpthread"
961    breaksw
and
1214 if ($GPUCODE == true) then
1215    echo " Using 'libcchem' add-in C++ codes for Nvidia/CUDA GPUs."
1216    set GPU_LIBS="-L/opt/gamess_cuda/libcchem/lib -lcchem_gamess -lcchem -lrysq"
1217    set GPU_LIBS="$GPU_LIBS -lcudart -lcublas"
1218 ###     GPU_LIBS="$GPU_LIBS -lcudart -lcublas"
1219    set GPU_LIBS="$GPU_LIBS /usr/lib/libboost_thread.a"
1220    set GPU_LIBS="$GPU_LIBS /opt/gamess_cuda/hdf5/lib/libhdf5.a"
1221    set GPU_LIBS="$GPU_LIBS /opt/gamess_cuda/hdf5/lib/libhdf5_cpp.a"
1222    set GPU_LIBS="$GPU_LIBS /opt/gamess_cuda/hdf5/lib/libhdf5_hl.a"
1223    set GPU_LIBS="$GPU_LIBS /opt/gamess_cuda/hdf5/lib/libhdf5.a"
1224    set GPU_LIBS="$GPU_LIBS /opt/acml/acml5.3.1/gfortran64_int64/lib/libacml.a /opt/netlib/CBLAS/lib/cblas_LINUX.a"
1225    set GPU_LIBS="$GPU_LIBS -lz"
1226    set GPU_LIBS="$GPU_LIBS -lstdc++"
1227 ###     GPU_LIBS="$GPU_LIBS -lgomp"
1228    set GPU_LIBS="$GPU_LIBS -lpthread"
1229    echo " libcchem GPU code's libraries are"
1230    echo "$GPU_LIBS"
1231 else
./lked gamess gpu.12

Run script
Create gpurun:
#!/bin/csh -v
set TARGET=mpi
set SCR=$HOME/scratch
set USERSCR=/scratch
set GMSPATH=/opt/gamess_cuda
set JOB=$1
set VERNO=$2
set NCPUS=$3
set PPN=$3
@ NUMGPU=1
if ($NUMGPU > 0) then
   @ NUMCPU = $NCPUS - 1
   echo libcchem kernels will use $NUMCPU cores and $NUMGPU GPUs per node...
   set echo
   setenv CCHEM_PROFILE 1
   setenv NUM_THREADS $NCPUS
   setenv GPU_DEVICES 0
#--if ($NUMGPU == 0) setenv GPU_DEVICES -1
#--if ($NUMGPU == 2) setenv GPU_DEVICES 0,1
#--if ($NUMGPU == 4) setenv GPU_DEVICES 0,1,2,3
#setenv LD_LIBRARY_PATH /share/apps/cuda/lib64:$LD_LIBRARY_PATH
###### LD_LIBRARY_PATH /usr/local/cuda/lib64:$LD_LIBRARY_PATH
   unset echo
else
   echo NO GPU
   setenv GPU_DEVICES -1
endif
if ( $JOB:r.inp == $JOB ) set JOB=$JOB:r
echo "Copying input file $JOB.inp to your run's scratch directory..."
cp $JOB.inp $SCR/$JOB.F05
setenv TRAJECT $USERSCR/$JOB.trj
setenv RESTART $USERSCR/$JOB.rst
setenv INPUT $SCR/$JOB.F05
setenv PUNCH $USERSCR/$JOB.dat
if ( -e $TRAJECT ) rm $TRAJECT
if ( -e $PUNCH ) rm $PUNCH
if ( -e $RESTART ) rm $RESTART
source $GMSPATH/gms-files.csh
setenv LD_LIBRARY_PATH /opt/openmpi/1.6/lib:/opt/netlib/CBLAS/lib:/opt/acml/acml5.3.1/gfortran64_int64/lib
set path= ( /opt/openmpi/1.6/bin $path )
/opt/openmpi/1.6/bin/mpiexec -n $NCPUS $GMSPATH/gamess.gpu.$VERNO.x | tee $JOB.out
cp $PUNCH .
Run chmod +x gpurun to make it executable.

Add /opt/gamess_cuda to your PATH:
echo 'export PATH=$PATH:/opt/gamess_cuda' >> ~/.bashrc
source ~/.bashrc

Testing
cd /opt/gamess_cuda/tests/standard
gpurun exam44 12 2

409.A. GAMESS US with GPU support on Debian Wheezy. This works (probably).


Update 27/6/2013:
Please note that Kirill Berezovsky has published a series of posts on GAMESS US, including how to compile it for both CPU and GPU use. See
http://biochemicalmatters.blogspot.com.au/2013/06/gamess-us-frequently-asked-questions_26.html
http://biochemicalmatters.blogspot.ru/2013/06/gamess-us-frequently-asked-questions_1687.html
http://biochemicalmatters.blogspot.ru/2013/06/gamess-us-frequently-asked-questions_1447.html
http://biochemicalmatters.blogspot.com.au/2013/06/gamess-us-frequently-asked-questions.html


Update 21 May 2013: See the comments below this post. This approach most likely works -- what has been confusing me is the lack of reports of GPU timings in the output, but this doesn't necessarily mean that the GPU isn't being used. The poster below this post, using nvidia-smi, observed GPU usage, although the speed-up was not major.


Update 10/05/2013: fixed libcchem compile.

Everything compiles fine and computations run fine and fast. To date there's only one other detailed step-by-step example of a successful compilation of GAMESS with GPU support out there, at least based on Google.

For various reasons I'm beginning to suspect that ATLAS isn't working out for me -- I've had issues getting things to converge with ATLAS, but which work fine with ACML (see post B).

I was in part following http://combichem.blogspot.com.au/2011/02/compiling-gamess-with-cuda-gpu-support.html and ./libcchem/aaa.readme.1st

This took a while to hammer out, so the write-up is a bit messy.


Set up
sudo apt-get install libboost-all-dev build-essential g++ gfortran automake nvidia-cuda-toolkit python-cheetah openmpi-bin libopenmpi-dev zlib1g-dev checkinstall
mkdir ~/tmp

Get gamess (see e.g. http://verahill.blogspot.com.au/2012/09/compiling-and-testing-gamess-us-on.html).

Put gamess-current.tar.gz in  ~/tmp

cd ~/tmp
tar xvf gamess-current.tar.gz
sudo mv gamess /opt/gamess_cuda
sudo chown $USER:$USER /opt/gamess_cuda -R


Preparing Boost
Edit /usr/include/boost/mpl/aux_/integral_wrapper.hpp
47 // other compilers (e.g. MSVC) are not particulary happy about it
48 #if BOOST_WORKAROUND(__EDG_VERSION__, <= 238) || defined(__CUDACC__)
49     typedef struct AUX_WRAPPER_NAME type;
Edit /usr/include/boost/mpl/size_t_fwd.hpp
20
21 BOOST_MPL_AUX_ADL_BARRIER_NAMESPACE_OPEN
22 #if defined(__CUDACC__)
23 typedef std::size_t std_size_t;
24 template< std_size_t N > struct size_t;
25 #else
26 template< std::size_t N > struct size_t;
27 #endif
28
29 BOOST_MPL_AUX_ADL_BARRIER_NAMESPACE_CLOSE
Edit /usr/include/boost/mpl/size_t.hpp
19 #if defined(__CUDACC__)
20 #define AUX_WRAPPER_VALUE_TYPE std_size_t
21 #define AUX_WRAPPER_NAME size_t
22 #define AUX_WRAPPER_PARAMS(N) std_size_t N
23 #else
24 #define AUX_WRAPPER_VALUE_TYPE std::size_t
25 #define AUX_WRAPPER_NAME size_t
26 #define AUX_WRAPPER_PARAMS(N) std::size_t N
27 #endif
28

HDF5
You'll have to compile HDF5 yourself for now, since H5Cpp.h (i.e. C++ support) is missing from the Debian packages.

mkdir ~/tmp
cd ~/tmp
wget http://www.hdfgroup.org/ftp/HDF5/current/src/hdf5-1.8.10-patch1.tar.gz
tar xvf hdf5-1.8.10-patch1.tar.gz
cd hdf5-1.8.10-patch1/
export CC=/usr/bin/gcc-4.6 && export CXX=/usr/bin/g++-4.6
./configure --prefix=/opt/gamess_cuda/hdf5 --with-pthread --enable-cxx --enable-threadsafe --enable-unsupported
make
mkdir /opt/gamess_cuda/hdf5/lib -p
mkdir /opt/gamess_cuda/hdf5/include -p
sudo checkinstall
This package will be built according to these values:

 0 -  Maintainer: [ root@neon ]
 1 -  Summary: [ hdf5-cxx ]
 2 -  Name:    [ hdf5-1.8.10 ]
 3 -  Version: [ 1.8.10-1 ]
 4 -  Release: [ 1 ]
 5 -  License: [ GPL ]
 6 -  Group:   [ checkinstall ]
 7 -  Architecture: [ amd64 ]
 8 -  Source location: [ hdf5-1.8.10-patch1 ]
 9 -  Alternate source location: [ ]
10 -  Requires: [ ]
11 -  Provides: [ hdf5-1.8.10 ]
12 -  Conflicts: [ ]
13 -  Replaces: [ ]
Make sure to edit the Version field, since the default version derived from 'patch1' leads to an error (the version must start with a digit).

OpenMPI 1.6
I can't remember why I ended up compiling it myself instead of using the stock Debian version.

sudo apt-get install build-essential gfortran
wget http://www.open-mpi.org/software/ompi/v1.6/downloads/openmpi-1.6.tar.bz2
tar xvf openmpi-1.6.tar.bz2
cd openmpi-1.6/

sudo mkdir /opt/openmpi/
sudo chown ${USER} /opt/openmpi/
./configure --prefix=/opt/openmpi/1.6/ --with-sge

make
make install
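Since the prefix above is /opt/openmpi/1.6/, the binaries and libraries end up in /opt/openmpi/1.6/bin and /opt/openmpi/1.6/lib. If you also want this MPI build on your regular PATH, something like the following should do (adjust if you used a different prefix):
echo 'export PATH=/opt/openmpi/1.6/bin:$PATH' >> ~/.bashrc
echo 'export LD_LIBRARY_PATH=/opt/openmpi/1.6/lib:$LD_LIBRARY_PATH' >> ~/.bashrc
source ~/.bashrc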

Compiling libcchem
cd /opt/gamess_cuda/libcchem
Edit /opt/gamess_cuda/libcchem/rysq/src/externals/boost/cuda/device_ptr.hpp:
  4 #include <cstdlib>
  5 #include <iterator>
  6 #include <stddef.h>
  7 
  8 namespace boost {
Edit /opt/gamess_cuda/libcchem/src/externals/boost/cuda/device_ptr.hpp
  4 #include <cstdlib>
  5 #include <iterator>
  6 #include <stddef.h>
  7 
  8 namespace boost {
  9 namespace cuda {
./configure --with-gamess --with-hdf5=/opt/gamess_cuda/hdf5 CPPFLAGS="-I/opt/gamess_cuda/hdf5/include" --with-cuda=/usr --disable-openmp --prefix=/opt/gamess_cuda/libcchem --with-gpu=fermi --with-integer8 --with-cublas
make
make install

Configure GAMESS US
Mainly follow this: http://verahill.blogspot.com.au/2012/09/compiling-and-testing-gamess-us-on.html
cd /opt/gamess_cuda
./config
please enter your target machine name: linux64
GAMESS directory? [/opt/gamess_cuda] /opt/gamess_cuda
Setting up GAMESS compile and link for GMS_TARGET=linux64
GAMESS software is located at GMS_PATH=/opt/gamess_cuda
Please provide the name of the build locaation.
This may be the same location as the GAMESS directory.
GAMESS build directory? [/home/me/tmp/gamess]
Please provide a version number for the GAMESS executable.
This will be used as the middle part of the binary's name, for example: gamess.00.x
Version? [00] 12r2
Please enter your choice of FORTRAN: gfortran
gfortran is very robust, so this is a wise choice.
Please type 'gfortran -dumpversion' or else 'gfortran -v' to detect the version number of your gfortran.
This reply should be a string with at least two decimal points, such as 4.1.2 or 4.6.1, or maybe even 4.4.2-12.
The reply may be labeled as a 'gcc' version, but it is really your gfortran version.
Please enter only the first decimal place, such as 4.1 or 4.6: 4.6
Enter your choice of 'mkl' or 'atlas' or 'acml' or 'none': atlas
Please enter the Atlas subdirectory on your system: /opt/ATLAS/lib
Math library 'atlas' will be taken from /opt/ATLAS
If you have an expensive but fast network like Infiniband (IB), and if you have an MPI library correctly installed, choose 'mpi'.
communication library ('sockets' or 'mpi')? mpi
Enter MPI library (impi, mvapich2, mpt, sockets): openmpi
Please enter your openmpi's location: /opt/openmpi/1.6

Build Gamess US
cd /opt/gamess_cuda/ddi/
./compddi
cd ../

Edit comp
872 # see ~/gamess/libcchem/aaa.readme.1st for more information
873 set GPUCODE=true
874 if ($GPUCODE == true) then
and
1663 # -fno-whole-file suppresses argument's data type checking
1664 set OPT='-O0'
1665 if (".$GMS_DEBUG_FLAGS" != .) set OPT="$GMS_DEBUG_FLAGS"
./compall

Edit lked
69 #
70 set GPUCODE=true
71 #
72 # 5. optional MPQC interface
and
958 case openmpi:
959    set MPILIBS="-L$GMS_MPI_PATH/lib"
960    set MPILIBS="$MPILIBS -lmpi -lpthread"
961    breaksw
and
1214 if ($GPUCODE == true) then
1215    echo " Using 'libcchem' add-in C++ codes for Nvidia/CUDA GPUs."
1216    set GPU_LIBS="-L/opt/gamess_cuda/libcchem/lib -lcchem_gamess -lcchem -lrysq"
1217    set GPU_LIBS="$GPU_LIBS -lcudart -lcublas"
1218 ###     GPU_LIBS="$GPU_LIBS -lcudart -lcublas"
1219    set GPU_LIBS="$GPU_LIBS /usr/lib/libboost_thread.a"
1220    set GPU_LIBS="$GPU_LIBS /opt/gamess_cuda/hdf5/lib/libhdf5.a"
1221    set GPU_LIBS="$GPU_LIBS /opt/gamess_cuda/hdf5/lib/libhdf5_cpp.a"
1222    set GPU_LIBS="$GPU_LIBS /opt/gamess_cuda/hdf5/lib/libhdf5_hl.a"
1223    set GPU_LIBS="$GPU_LIBS /opt/gamess_cuda/hdf5/lib/libhdf5.a"
1224    set GPU_LIBS="$GPU_LIBS /opt/ATLAS/lib/libcblas.a"
1225    set GPU_LIBS="$GPU_LIBS -lz"
1226    set GPU_LIBS="$GPU_LIBS -lstdc++"
1227 ###     GPU_LIBS="$GPU_LIBS -lgomp"
1228    set GPU_LIBS="$GPU_LIBS -lpthread"
1229    echo " libcchem GPU code's libraries are"
1230    echo "$GPU_LIBS"
1231 else

./lked gamess gpu.12

Create gpurun
#!/bin/csh
set TARGET=mpi
set SCR=$HOME/scratch
set USERSCR=/scratch
set GMSPATH=/opt/gamess_cuda
set JOB=$1
set VERNO=$2
set NCPUS=$3
@ NUMGPU=1
if ($NUMGPU > 0) then
   @ NUMCPU = $NCPUS - 1
   echo libcchem kernels will use $NUMCPU cores and $NUMGPU GPUs per node...
   set echo
   setenv CCHEM_PROFILE 1
   setenv NUM_THREADS $NCPUS
#--if ($NUMGPU == 0) setenv GPU_DEVICES -1
#--if ($NUMGPU == 2) setenv GPU_DEVICES 0,1
#--if ($NUMGPU == 4) setenv GPU_DEVICES 0,1,2,3
#setenv LD_LIBRARY_PATH /share/apps/cuda/lib64:$LD_LIBRARY_PATH
###### LD_LIBRARY_PATH /usr/local/cuda/lib64:$LD_LIBRARY_PATH
   unset echo
else
   setenv GPU_DEVICES -1
endif
if ( $JOB:r.inp == $JOB ) set JOB=$JOB:r
echo "Copying input file $JOB.inp to your run's scratch directory..."
cp $JOB.inp $SCR/$JOB.F05
setenv TRAJECT $USERSCR/$JOB.trj
setenv RESTART $USERSCR/$JOB.rst
setenv INPUT $SCR/$JOB.F05
setenv PUNCH $USERSCR/$JOB.dat
if ( -e $TRAJECT ) rm $TRAJECT
if ( -e $PUNCH ) rm $PUNCH
if ( -e $RESTART ) rm $RESTART
source $GMSPATH/gms-files.csh
setenv LD_LIBRARY_PATH /opt/openmpi/lib:$LD_LIBRARY_PATH
set path= ( /opt/openmpi/bin $path )
mpiexec -n $NCPUS $GMSPATH/gamess.gpu.$VERNO.x | tee $JOB.out
cp $PUNCH .

echo 'export PATH=$PATH:/opt/gamess_cuda' >> ~/.bashrc
source ~/.bashrc
chmod +x gpurun
cd tests/standard/
 gpurun exam44 12 2


The only evidence of GPU usage in the output is e.g. in exam44.out:
388           -----------------------
389           MP2 CONTROL INFORMATION
390           -----------------------
391           NACORE =        6  NBCORE =        6
392           LMOMP2 =        F  AOINTS = DUP
393           METHOD =        2  NWORD  =               0
394           MP2PRP =        F  OSPT   = NONE
395           CUTOFF = 1.00E-09  CPHFBS = BASISAO
396           CODE   = GPU
397 
398           NUMBER OF CORE -A-  ORBITALS =     6
399           NUMBER OF CORE -B-  ORBITALS =     6

but in the summary only CPU utilisation is mentioned.



I modified rungms:

me@neon:/opt/gamess_cuda/tests/standard$ diff /opt/gamess_cuda/gpurungms /opt/gamess/rungms 
59,62c59,62
< set TARGET=mpi
< set SCR=$HOME/scratch
< set USERSCR=/scratch
< set GMSPATH=/opt/gamess_cuda
---
> set TARGET=sockets
> set SCR=/scr/$USER
> set USERSCR=~$USER/scr
> set GMSPATH=/u1/mike/gamess
67d66
< set NNODES=1
513c512
< set PPN=$3
---
>    set PPN=$4
601c600
<          @ PPN2 = $PPN
---
>          @ PPN2 = $PPN + $PPN
742c741
<    @ NUMGPU=1
---
>    @ NUMGPU=0
752c751
< #      setenv LD_LIBRARY_PATH /share/apps/cuda/lib64:$LD_LIBRARY_PATH
---
>       setenv LD_LIBRARY_PATH /share/apps/cuda/lib64:$LD_LIBRARY_PATH
793c792,793
<       /opt/openmpi/1.6/bin/mpiexec -n $NPROCS $GMSPATH/gamess.$VERNO.x < /dev/null
---
>       mpiexec.hydra -f $PROCFILE -n $NPROCS \
>             /home/mike/gamess/gamess.$VERNO.x < /dev/null