Lindqvist -- a blog about Linux and Science. Mostly.: 409.B.GAMESS US with GPU support on debian wheezy --the ACML edition. This works.

09 May 2013

Update 27/6/2013:
Please note that Kirill Berezovsky has published a series of posts on GAMESS US, including how to compile it for both CPU and GPU use. See
http://biochemicalmatters.blogspot.com.au/2013/06/gamess-us-frequently-asked-questions_26.html
http://biochemicalmatters.blogspot.ru/2013/06/gamess-us-frequently-asked-questions_1687.html
http://biochemicalmatters.blogspot.ru/2013/06/gamess-us-frequently-asked-questions_1447.html
http://biochemicalmatters.blogspot.com.au/2013/06/gamess-us-frequently-asked-questions.html

Update 21 May 2013: See the comments below this post. This approach most likely works -- what has been confusing me is the lack of reports of GPU timings in the output, but this doesn't necessarily mean that the GPU isn't being used. The poster below, using nvidia-smi, observed GPU usage, although the speed-up was not major.
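
A rough way to check for GPU activity from a second terminal while a job runs is sketched below. This is not part of GAMESS or libcchem: gpu_busy is a made-up helper name, the 5% threshold is arbitrary, and the --query-gpu options assume a reasonably recent nvidia-smi.

```shell
#!/bin/sh
# gpu_busy [THRESHOLD]: read one GPU utilisation percentage per line on
# stdin and succeed if any value exceeds THRESHOLD (default 5, arbitrary).
gpu_busy() {
    threshold=${1:-5}
    awk -v t="$threshold" '$1 + 0 > t { busy = 1 } END { exit (busy ? 0 : 1) }'
}

# Only query the card if nvidia-smi is actually installed.
if command -v nvidia-smi >/dev/null 2>&1; then
    if nvidia-smi --query-gpu=utilization.gpu --format=csv,noheader,nounits | gpu_busy; then
        echo "GPU is doing work"
    else
        echo "GPU looks idle"
    fi
fi
```

A single sample can easily miss a short GPU burst, so it is worth running this several times during the integral or MP2 steps.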

Blogspot needs versioning.
I lost the entire post when it was almost complete. Screw this.

Everything compiles fine, but no GPU output during calculation.

I see no evidence of the GPU being used at any stage. Otherwise all is good -- the calcs run fine on the CPU.

Maybe someone else will have a better idea.

I looked at libcchem/aaa.readme.1st and http://combichem.blogspot.com.au/2011/02/compiling-gamess-with-cuda-gpu-support.html to get as far as I did.

Setting up gamess
Get gamess (see e.g. http://verahill.blogspot.com.au/2012/09/compiling-and-testing-gamess-us-on.html). Put gamess-current.tar.gz in ~/tmp

sudo apt-get install libboost-all-dev build-essential g++ gfortran automake nvidia-cuda-toolkit python-cheetah openmpi-bin libopenmpi-dev zlib1g-dev checkinstall
mkdir -p ~/tmp
cd ~/tmp
tar xvf gamess-current.tar.gz
sudo mv gamess /opt/gamess_cuda
sudo chown $USER:$USER /opt/gamess_cuda -R

ACML
Download both the 'regular' and the int64 gfortran packages from AMD:
http://developer.amd.com/tools-and-sdks/cpu-development/amd-core-math-library-acml/acml-downloads-resources/#download

tar xvf acml-5-3-1-gfortran-64bit-int64.tgz
tar xvf acml-5-3-1-gfortran-64bit.tgz
sh install-acml-5-3-1-gfortran-64bit-int64.sh

Where do you want to install ACML?  Press return to use
the default location (/opt/acml5.3.1), or enter an alternative path.
The directory will be created if it does not already exist.
> /opt/acml/acml5.3.1

sh install-acml-5-3-1-gfortran-64bit.sh

Where do you want to install ACML?  Press return to use
the default location (/opt/acml5.3.1), or enter an alternative path.
The directory will be created if it does not already exist.
> /opt/acml/acml5.3.1

You'll get something like this:

/opt/acml/acml5.3.1
|-- Doc
|-- gfortran64
|-- gfortran64_fma4
|-- gfortran64_fma4_int64
|-- gfortran64_fma4_mp
|-- gfortran64_fma4_mp_int64
|-- gfortran64_int64
|-- gfortran64_mp
|-- gfortran64_mp_int64
`-- util

where
* fma4 is for cpus with FMA4 support (use util/cpuid to check)
* int64 is the build that takes 64-bit (integer*8) integer arguments; it has nothing to do with floating-point precision
* mp is for OpenMP. For MPI do not use the _mp_ libraries!

Pick your library/ies and add them to the LD_LIBRARY_PATH, e.g.:

echo 'export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/acml/acml5.3.1/gfortran64_int64/lib' >> ~/.bashrc
source ~/.bashrc
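
The naming scheme above can be turned into a small helper. This is only a sketch: acml_variant and has_fma4 are hypothetical names, and reading the fma4 flag from /proc/cpuinfo is my assumption of an alternative to util/cpuid. Note that this post sticks with the plain gfortran64_int64 build throughout.

```shell
#!/bin/sh
# acml_variant FMA4 OPENMP INT64  (each argument is 0 or 1)
# Prints the matching ACML sub-directory name, e.g. gfortran64_fma4_int64.
acml_variant() {
    name=gfortran64
    if [ "$1" = 1 ]; then name="${name}_fma4"; fi
    if [ "$2" = 1 ]; then name="${name}_mp"; fi
    if [ "$3" = 1 ]; then name="${name}_int64"; fi
    echo "$name"
}

# has_fma4 [FILE]: succeed if FILE (default /proc/cpuinfo) lists the fma4 flag.
has_fma4() {
    grep -qw fma4 "${1:-/proc/cpuinfo}" 2>/dev/null
}

fma4=0
if has_fma4; then fma4=1; fi
# No OpenMP (MPI is used instead), 64-bit integers.
echo "/opt/acml/acml5.3.1/$(acml_variant "$fma4" 0 1)/lib"
```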

CBLAS

sudo mkdir -p /opt/netlib
sudo chown $USER:$USER /opt/netlib
cd /opt/netlib/
wget http://www.netlib.org/blas/blast-forum/cblas.tgz
tar xvf cblas.tgz
cd CBLAS/

Edit Makefile.LINUX (around line 25):

BLLIB = /opt/acml/acml5.3.1/gfortran64_int64/lib/libacml.a
CBLIB = ../lib/cblas_$(PLAT).a

cp Makefile.LINUX Makefile.in
make
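
If you would rather script the Makefile.LINUX edit than do it by hand, something along these lines should work. set_bllib is a hypothetical helper, and it assumes GNU sed and that the stock file has a single line starting with BLLIB.

```shell
#!/bin/sh
# set_bllib MAKEFILE LIBRARY
# Rewrites the "BLLIB = ..." line of a CBLAS Makefile.* in place (GNU sed).
set_bllib() {
    sed -i "s|^BLLIB[[:space:]]*=.*|BLLIB = $2|" "$1"
}

# Usage, from /opt/netlib/CBLAS and before copying to Makefile.in:
#   set_bllib Makefile.LINUX /opt/acml/acml5.3.1/gfortran64_int64/lib/libacml.a
```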

Patching libboost

sudo su
cd /usr/include/boost
patch -p1 < /opt/gamess_cuda/libcchem/boot/
exit

Make the following changes by hand if the patch didn't work:

/usr/include/boost/mpl/aux_/integral_wrapper.hpp (around line 48)

// other compilers (e.g. MSVC) are not particulary happy about it
#if BOOST_WORKAROUND(__EDG_VERSION__, <= 238) || defined(__CUDACC__)
    typedef struct AUX_WRAPPER_NAME type;

/usr/include/boost/mpl/size_t_fwd.hpp (around line 22)

BOOST_MPL_AUX_ADL_BARRIER_NAMESPACE_OPEN
#if defined(__CUDACC__)
   typedef std::size_t std_size_t;
   template< std_size_t N > struct size_t;
#else
   template< std::size_t N > struct size_t;
#endif

BOOST_MPL_AUX_ADL_BARRIER_NAMESPACE_CLOSE

/usr/include/boost/mpl/size_t.hpp (around line 19)

#if defined(__CUDACC__)
  #define AUX_WRAPPER_VALUE_TYPE std_size_t
  #define AUX_WRAPPER_NAME size_t
  #define AUX_WRAPPER_PARAMS(N) std_size_t N
#else
  #define AUX_WRAPPER_VALUE_TYPE std::size_t
  #define AUX_WRAPPER_NAME size_t
  #define AUX_WRAPPER_PARAMS(N) std::size_t N
#endif

HDF5

mkdir -p ~/tmp
cd ~/tmp
wget http://www.hdfgroup.org/ftp/HDF5/current/src/hdf5-1.8.10-patch1.tar.gz
tar xvf hdf5-1.8.10-patch1.tar.gz
cd hdf5-1.8.10-patch1/
export CC=/usr/bin/gcc-4.6 && export CXX=/usr/bin/g++-4.6
./configure --prefix=/opt/gamess_cuda/hdf5 --with-pthread --enable-cxx --enable-threadsafe --enable-unsupported
make
mkdir /opt/gamess_cuda/hdf5/lib -p
mkdir /opt/gamess_cuda/hdf5/include -p
sudo checkinstall

This package will be built according to these values: 

0 -  Maintainer: [ root@neon ]
1 -  Summary: [ hdf5-cxx]
2 -  Name:    [ hdf5-1.8.10 ]
3 -  Version: [ 1.8.10-1 ]
4 -  Release: [ 1 ]
5 -  License: [ GPL ]
6 -  Group:   [ checkinstall ]
7 -  Architecture: [ amd64 ]
8 -  Source location: [ hdf5-1.8.10-patch1 ]
9 -  Alternate source location: [  ]
10 - Requires: [  ]
11 - Provides: [ hdf5-1.8.10 ]
12 - Conflicts: [  ]
13 - Replaces: [  ]

Make sure to edit the Version field since Patch-1 leads to an error (must start with digit).
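
One possible way to get a version string that starts with a digit is to pull it out of the source directory name. deb_version is a hypothetical helper, and passing the result via checkinstall's --pkgversion option is an assumption about your checkinstall build rather than something I did here.

```shell
#!/bin/sh
# deb_version NAME: print the first dotted number (e.g. 1.8.10) found in NAME.
# A lone digit such as the "5" in "hdf5" is skipped because it has no dot.
deb_version() {
    printf '%s\n' "$1" | grep -oE '[0-9]+(\.[0-9]+)+' | head -n 1
}

deb_version hdf5-1.8.10-patch1    # prints 1.8.10

# Possible use (assumed option name):
#   sudo checkinstall --pkgversion="$(deb_version hdf5-1.8.10-patch1)"
```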

LIBCCHEM
Edit /opt/gamess_cuda/libcchem/src/externals/boost/cuda/device_ptr.hpp and /opt/gamess_cuda/libcchem/rysq/src/externals/boost/cuda/device_ptr.hpp. Insert


#include <stddef.h>

somewhere at the beginning of each file.
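
The same edit can be scripted. The sketch below puts the include on the very first line of each file, which is one valid choice of "somewhere at the beginning"; add_stddef is a made-up name and the in-place insert relies on GNU sed.

```shell
#!/bin/sh
# add_stddef FILE: put '#include <stddef.h>' on the first line of FILE
# unless a line starting with that include is already present (GNU sed).
add_stddef() {
    if ! grep -q '^#include <stddef\.h>' "$1"; then
        sed -i '1i #include <stddef.h>' "$1"
    fi
}

# Patch the two headers named above; missing files are skipped silently.
for f in \
    /opt/gamess_cuda/libcchem/src/externals/boost/cuda/device_ptr.hpp \
    /opt/gamess_cuda/libcchem/rysq/src/externals/boost/cuda/device_ptr.hpp
do
    if [ -f "$f" ]; then
        add_stddef "$f"
    fi
done
```

Running it twice is harmless, since the include is only added when it is missing.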

cd /opt/gamess_cuda/libcchem
./configure --with-gamess --with-hdf5=/opt/gamess_cuda/hdf5 CPPFLAGS="-I/opt/gamess_cuda/hdf5/include" --with-cuda=/usr --disable-openmp --prefix=/opt/gamess_cuda/libcchem --with-gpu=fermi --with-integer8 --with-cublas
make
make install

Configure GAMESS US

cd /opt/gamess_cuda
./config

please enter your target machine name: linux64
GAMESS directory? [/opt/gamess_cuda]
GAMESS build directory? [/opt/gamess_cuda]
Version? [00] 12
Please enter your choice of FORTRAN: gfortran
Please enter only the first decimal place, such as 4.1 or 4.6:  
4.6
Enter your choice of 'mkl' or 'atlas' or 'acml' or 'none': acml
enter this full pathname: /opt/acml/acml5.3.1
communication library ('sockets' or 'mpi')? mpi
Enter MPI library (impi, mvapich2, mpt, sockets): openmpi
Please enter your openmpi's location: /opt/openmpi/1.6

Compile

cd ddi/
./compddi
cd ..

Edit comp (around line 873):

#          see ~/gamess/libcchem/aaa.readme.1st for more information
set GPUCODE=true
if ($GPUCODE == true) then

and (around line 1664):

#           -fno-whole-file suppresses argument's data type checking
      set OPT='-O0'
      if (".$GMS_DEBUG_FLAGS" != .) set OPT="$GMS_DEBUG_FLAGS"

./compall
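
The GPUCODE switch in comp (and the identical one in lked, further down) could also be flipped from the shell. This sketch assumes the stock scripts ship with the line set GPUCODE=false, which I have not double-checked for every GAMESS release; enable_gpucode is a hypothetical helper and needs GNU sed.

```shell
#!/bin/sh
# enable_gpucode FILE: switch 'set GPUCODE=false' to 'set GPUCODE=true'
# in a csh build script, editing in place. Succeeds only if the file
# contains 'set GPUCODE=true' afterwards.
enable_gpucode() {
    sed -i 's/^\([[:space:]]*set GPUCODE=\)false/\1true/' "$1"
    grep -q '^[[:space:]]*set GPUCODE=true' "$1"
}

# Usage, from /opt/gamess_cuda:
#   enable_gpucode comp && enable_gpucode lked
```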

Edit lked (around line 70):

#
set GPUCODE=true
#
#   5. optional MPQC interface

and (around line 959):

            case openmpi:
               set MPILIBS="-L$GMS_MPI_PATH/lib"
               set MPILIBS="$MPILIBS -lmpi -lpthread"
               breaksw

and (around line 1214):

if ($GPUCODE == true) then
   echo "   Using 'libcchem' add-in C++ codes for Nvidia/CUDA GPUs."
   set GPU_LIBS="-L/opt/gamess_cuda/libcchem/lib -lcchem_gamess -lcchem -lrysq"
   set GPU_LIBS="$GPU_LIBS -lcudart -lcublas"
   ### GPU_LIBS="$GPU_LIBS -lcudart -lcublas"
   set GPU_LIBS="$GPU_LIBS /usr/lib/libboost_thread.a"
   set GPU_LIBS="$GPU_LIBS /opt/gamess_cuda/hdf5/lib/libhdf5.a"
   set GPU_LIBS="$GPU_LIBS /opt/gamess_cuda/hdf5/lib/libhdf5_cpp.a"
   set GPU_LIBS="$GPU_LIBS /opt/gamess_cuda/hdf5/lib/libhdf5_hl.a"
   set GPU_LIBS="$GPU_LIBS /opt/gamess_cuda/hdf5/lib/libhdf5.a"
   set GPU_LIBS="$GPU_LIBS /opt/acml/acml5.3.1/gfortran64_int64/lib/libacml.a /opt/netlib/CBLAS/lib/cblas_LINUX.a"
   set GPU_LIBS="$GPU_LIBS -lz"
   set GPU_LIBS="$GPU_LIBS -lstdc++"
   ### GPU_LIBS="$GPU_LIBS -lgomp"
   set GPU_LIBS="$GPU_LIBS -lpthread"
   echo "   libcchem GPU code's libraries are"
   echo "$GPU_LIBS"
else

./lked gamess gpu.12
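
If the link step fails, a quick sanity check is whether every static library named in the GPU_LIBS lines above actually exists. The helper below is a sketch (missing_files is a made-up name); it prints the paths that are missing and prints nothing if all are present.

```shell
#!/bin/sh
# missing_files FILE...: print every argument that is not a regular file.
missing_files() {
    for f in "$@"; do
        if [ ! -f "$f" ]; then
            echo "$f"
        fi
    done
}

# Static libraries referenced in the lked edit.
missing_files \
    /usr/lib/libboost_thread.a \
    /opt/gamess_cuda/hdf5/lib/libhdf5.a \
    /opt/gamess_cuda/hdf5/lib/libhdf5_cpp.a \
    /opt/gamess_cuda/hdf5/lib/libhdf5_hl.a \
    /opt/acml/acml5.3.1/gfortran64_int64/lib/libacml.a \
    /opt/netlib/CBLAS/lib/cblas_LINUX.a
```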

Run script
Create rungpu:


#!/bin/csh -v
set TARGET=mpi
set SCR=$HOME/scratch
set USERSCR=/scratch
set GMSPATH=/opt/gamess_cuda
set JOB=$1
set VERNO=$2
set NCPUS=$3
set PPN=$3
   @ NUMGPU=1
   if ($NUMGPU > 0) then
      @ NUMCPU = $NCPUS - 1
      echo libcchem kernels will use $NUMCPU cores and $NUMGPU GPUs per node...
      set echo
      setenv CCHEM_PROFILE 1
      setenv NUM_THREADS $NCPUS
      setenv GPU_DEVICES 0
      #--if ($NUMGPU == 0) setenv GPU_DEVICES -1
      #--if ($NUMGPU == 2) setenv GPU_DEVICES 0,1
      #--if ($NUMGPU == 4) setenv GPU_DEVICES 0,1,2,3
      #setenv LD_LIBRARY_PATH /share/apps/cuda/lib64:$LD_LIBRARY_PATH
      ###### LD_LIBRARY_PATH /usr/local/cuda/lib64:$LD_LIBRARY_PATH
      unset echo
   else
      echo NO GPU
      setenv GPU_DEVICES -1
   endif


if ( $JOB:r.inp == $JOB ) set JOB=$JOB:r
echo "Copying input file $JOB.inp to your run's scratch directory..."
cp $JOB.inp $SCR/$JOB.F05

setenv TRAJECT $USERSCR/$JOB.trj
setenv RESTART $USERSCR/$JOB.rst
setenv INPUT $SCR/$JOB.F05
setenv PUNCH $USERSCR/$JOB.dat
if ( -e $TRAJECT ) rm $TRAJECT
if ( -e  $PUNCH ) rm $PUNCH
if ( -e  $RESTART ) rm $RESTART
source $GMSPATH/gms-files.csh

setenv LD_LIBRARY_PATH /opt/openmpi/1.6/lib:/opt/netlib/CBLAS/lib:/opt/acml/acml5.3.1/gfortran64_int64/lib
set path= ( /opt/openmpi/1.6/bin $path )
/opt/openmpi/1.6/bin/mpiexec -n $NCPUS $GMSPATH/gamess.gpu.$VERNO.x|tee $JOB.out
cp $PUNCH .

Make it executable:

chmod +x rungpu

Add /opt/gamess_cuda to path:

echo 'export PATH=$PATH:/opt/gamess_cuda' >> ~/.bashrc
source ~/.bashrc

Testing

cd /opt/gamess_cuda/tests/standard
rungpu exam44 12 2
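
The run script tees the output to $JOB.out in the current directory, so a simple check on whether the test finished is to look for the normal-termination line that GAMESS prints at the end of a successful run. gamess_ok is a hypothetical helper name.

```shell
#!/bin/sh
# gamess_ok LOG: succeed if LOG contains GAMESS's normal-termination line.
gamess_ok() {
    grep -q 'EXECUTION OF GAMESS TERMINATED NORMALLY' "$1"
}

# Usage, after running exam44:
#   gamess_ok exam44.out && echo "exam44 finished normally"
```

This only says the job completed; it says nothing about whether the GPU was used.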

8 comments:

Anonymous, 19 May 2013 09:50
How do you know it's not running? I just configured gamess+libcchem, and it works for me. Note that the exam44.inp molecule is too small: the calculation finishes too quickly for any GPU use to be noticeable.

What I did instead was to use exam08.inp, set RUNTYP=energy in $CONTRL, remove TIMLIM from $SYSTEM, increase MEMDDI and set MEMORY= higher, and finally change the molecule to something else.

I opened Avogadro and drew a huge molecule (16 carbons, 1 oxygen; I didn't count the hydrogens) and ran two comparisons, one with CPU only and one with CPU+GPU. I got some speedup (9 min vs 12 min) with the 6-31d basis set. I am still testing with a bigger basis set.

It works, trust me. If you have conky installed and properly configured, you can sometimes see gamess.gpu.00.x as the 'Highest CPU' process.
Anonymous, 21 May 2013 09:37
I'm sorry if I misled you. What I am saying is that I do see GPU acceleration; I can confirm this. Using nvidia-smi I see my Tesla 2050 being utilized. I just don't get a much faster calculation from my setup.

I only have an Opteron at 2 GHz and one Tesla 2050 attached to an x16 PCIe slot. According to aaa.readme.1st, AA did get a lot of acceleration, but from his description his system has many more GPUs and CPUs. I only have one GPU, and the best I got for my big MP2 energy calculation was a saving of 8 minutes on a 1-hour calculation.

That means that for a 10-hour calculation I'd save 80 minutes (assuming linear scaling), so roughly 8 h 40 min instead of 10 h. Not worth much to me right now. I hope that helps.
Anonymous, 28 May 2013 16:50
Ah, I take it back. You really should get GAMESS GPU to work on your system. I just unset the exclusive_thread mode for my GPU (nvidia-smi --id=0 -c 0) and reran the same CCSD(T) calculation that AA did in aaa.readme.1st (the catnip molecule): the timing was cut roughly in half, from 9000+ seconds down to 5000+ seconds. This is 8 CPUs versus 8 CPUs plus 1 GPU. All I can say is WOW. It works.

I'm going to try MP2 on the big molecule I tried earlier. Apparently, setting the compute mode to either exclusive_thread or exclusive_process slows things down. I had to set those earlier to make it work with PBS-Torque (so that no GPU is shared between jobs).

I hope this helps.