
24 July 2015

614. SIESTA with MPI and acml on debian jessie

One of my students might be using SIESTA for some simulations, and a first step towards that is to set it up on my cluster.

This isn't an optimised build -- right now I'm just looking at getting a simple parallel build that runs.

I had a look at http://www.pa.msu.edu/people/tomanek/SIESTA-installation.html and http://pelios.csx.cam.ac.uk/~mc321/siesta.html.
 
NOTE: don't use the int64 acml or openblas BLAS libs, or you'll get a SIGSEGV due to an invalid memory reference at runtime. NWChem is the complete opposite -- it needs the int64 libs -- and for some reason both the int64 and regular acml libs have the same file names. I'm not sure how that's supposed to work out on a system that also runs nwchem.

See here for acml on debian. I've got /opt/acml/acml5.3.1/gfortran64_int64/lib in my /etc/ld.so.conf.d/acml.conf on behalf of nwchem.
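Since the int64 and regular acml libraries share the same file names, it's worth checking which one the runtime linker actually hands to the finished binary. A minimal sketch, run once the build below is done (the path assumes the install location used in this post):

# refresh the linker cache after touching /etc/ld.so.conf.d/
sudo ldconfig

# confirm which libacml.so the binary resolves --
# it should come from gfortran64/lib, not gfortran64_int64/lib
ldd /opt/siesta/siesta-3.2-pl-5/Obj/siesta | grep -i acml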
 Being lazy, I opted for the debian scalapack and libblacs packages:
 
sudo apt-get install libscalapack-mpi-dev libblacs-mpi-dev libopenmpi-dev

To get the link to the SIESTA code, go to http://departments.icmab.es/leem/siesta/CodeAccess/selector.html

Then, if you're an academic, you can do:
sudo mkdir /opt/siesta
sudo chown $USER /opt/siesta
cd /opt/siesta
wget http://departments.icmab.es/leem/siesta/CodeAccess/Code/siesta-3.2-pl-5.tgz
tar xvf siesta-3.2-pl-5.tgz
cd siesta-3.2-pl-5/Obj
sh ../Src/obj_setup.sh
*** Compilation setup done. *** Remember to copy an arch.make file or run configure as: ../Src/configure [configure_options]
../Src/./configure --help
`configure' configures siesta 2.0 to adapt to many kinds of systems.

Usage: ./configure [OPTION]... [VAR=VALUE]...
[..]
Installation directories:
  --prefix=PREFIX         install architecture-independent files in PREFIX
                          [/usr/local]
  --exec-prefix=EPREFIX   install architecture-dependent files in EPREFIX
                          [PREFIX]

By default, `make install' will install all the files in
`/usr/local/bin', `/usr/local/lib' etc.  You can specify
an installation prefix other than `/usr/local' using `--prefix',
for instance `--prefix=$HOME'.
[..]
  --enable-mpi            Compile the parallel version of SIESTA
  --enable-debug          Compile with debugging support
  --enable-fast           Compile with best known optimization flags

Optional Packages:
  --with-PACKAGE[=ARG]    use PACKAGE [ARG=yes]
  --without-PACKAGE       do not use PACKAGE (same as --with-PACKAGE=no)
  --with-netcdf=<lib>     use NetCDF library
  --with-siesta-blas      use BLAS library packaged with SIESTA
  --with-blas=<lib>       use BLAS library
  --with-siesta-lapack    use LAPACK library packaged with SIESTA
  --with-lapack=<lib>     use LAPACK library
  --with-blacs=<lib>      use BLACS library
  --with-scalapack=<lib>  use ScaLAPACK library
[..]
../Src/./configure --enable-mpi
checking build system type... x86_64-unknown-linux-gnu
checking host system type... x86_64-unknown-linux-gnu
[..]
checking for mpifc... no
checking for mpxlf... no
checking for mpif90... mpif90
checking for MPI_Init... no
checking for MPI_Init in -lmpi... yes
[..]
checking for sgemm in /opt/openblas/lib/libopenblas.so... yes
checking LAPACK already linked... yes
checking LAPACK includes divide-and-conquer routines... yes
configure: using DC_LAPACK routines packaged with SIESTA due to bug in library.
  Linker flag might be needed to avoid duplicate symbols
configure: creating ./config.status
config.status: creating arch.make
Edit arch.make:
#
# This file is part of the SIESTA package.
#
# Copyright (c) Fundacion General Universidad Autonoma de Madrid:
# E.Artacho, J.Gale, A.Garcia, J.Junquera, P.Ordejon, D.Sanchez-Portal
# and J.M.Soler, 1996- .
#
# Use of this software constitutes agreement with the full conditions
# given in the SIESTA license, as signed by all legitimate users.
#
.SUFFIXES:
.SUFFIXES: .f .F .o .a .f90 .F90

SIESTA_ARCH=x86_64-unknown-linux-gnu--unknown

FPP=
FPP_OUTPUT=
FC=mpif90
RANLIB=ranlib

SYS=nag

SP_KIND=4
DP_KIND=8
KINDS=$(SP_KIND) $(DP_KIND)

FFLAGS=-g -O2
FPPFLAGS= -DMPI -DFC_HAVE_FLUSH -DFC_HAVE_ABORT
LDFLAGS=

ARFLAGS_EXTRA=

FCFLAGS_fixed_f=
FCFLAGS_free_f90=
FPPFLAGS_fixed_F=
FPPFLAGS_free_F90=

BLAS_LIBS=-L/opt/acml/acml5.3.1/gfortran64/lib -lacml
LAPACK_LIBS=
BLACS_LIBS=-L/usr/lib -lblacs-openmpi -lblacsCinit-openmpi
SCALAPACK_LIBS=-L/usr/lib -lscalapack-openmpi

COMP_LIBS=dc_lapack.a

NETCDF_LIBS=
NETCDF_INTERFACE=

MPI_LIBS= -L/usr/lib/openmpi/lib -lmpi -lmpi_f90

LIBS=$(SCALAPACK_LIBS) $(BLACS_LIBS) $(LAPACK_LIBS) $(BLAS_LIBS) $(NETCDF_LIBS) $(MPI_LIBS) -lpthread

#SIESTA needs an F90 interface to MPI
#This will give you SIESTA's own implementation
#If your compiler vendor offers an alternative, you may change
#to it here.
MPI_INTERFACE=libmpi_f90.a
MPI_INCLUDE=.

#Dependency rules are created by autoconf according to whether
#discrete preprocessing is necessary or not.
.F.o:
	$(FC) -c $(FFLAGS) $(INCFLAGS) $(FPPFLAGS) $(FPPFLAGS_fixed_F) $<
.F90.o:
	$(FC) -c $(FFLAGS) $(INCFLAGS) $(FPPFLAGS) $(FPPFLAGS_free_F90) $<
.f.o:
	$(FC) -c $(FFLAGS) $(INCFLAGS) $(FCFLAGS_fixed_f) $<
.f90.o:
	$(FC) -c $(FFLAGS) $(INCFLAGS) $(FCFLAGS_free_f90) $<
make
cd ../
ln -s Obj/siesta siesta

I added /opt/siesta/siesta-3.2-pl-5 to my $PATH.
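To make that stick between sessions (assuming bash), the usual ~/.bashrc one-liner does it:

echo 'export PATH=$PATH:/opt/siesta/siesta-3.2-pl-5' >> ~/.bashrc
source ~/.bashrc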

To test, edit /opt/siesta/siesta-3.2-pl-5/test.mk:
 6 #SIESTA=../../../siesta
 7 SIESTA=mpirun -n 2 ../../../siesta
Then
cd /opt/siesta/siesta-3.2-pl-5/Tests/h3po4_2
export LD_LIBRARY_PATH=/opt/acml/acml5.3.1/gfortran64/lib:$LD_LIBRARY_PATH
make
>>>> Running h3po4_2 test...
==> Copying pseudopotential file for H...
==> Copying pseudopotential file for O...
==> Copying pseudopotential file for P...
==> Running SIESTA as mpirun -n 2 ../../../siesta
===> SIESTA finished successfully

Also, look at work/h3po4_2.out:
* Running on    2 nodes in parallel
>> Start of run:  24-JUL-2015  21:58:13

                           ***********************       
                           *  WELCOME TO SIESTA  *       
                           ***********************       

reinit: Reading from standard input
[..]
elaps:  optical           1       0.000       0.000     0.00
  
>> End of run:  24-JUL-2015  21:58:20
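For your own calculations rather than the bundled tests, SIESTA reads its input from standard input; a minimal sketch, with myjob.fdf as a hypothetical input file and the PATH and LD_LIBRARY_PATH settings above in place:

cd ~/jobs/myjob
mpirun -n 2 siesta < myjob.fdf | tee myjob.out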

09 January 2013

312. Tau + OpenMPI profiling on Debian Testing/Wheezy

Still searching for a way to easily inspect the execution of parallel jobs, I came across TAU: http://www.cs.uoregon.edu/Research/tau/home.php

You can download it without registering, but please do register, as the number of registered users tends to be important for funding and evaluation of software development in academia: http://www.cs.uoregon.edu/Research/tau/downloads.php

I'm not really sure how to use PDT, and I've used TAU without it before without any problems.

The compilation order below is also important: pdt won't build without libpdb.a, which is generated by tau, but you can't configure tau with -pdt if it doesn't exist yet.


Compiling
sudo mkdir /opt/tau
sudo chown $USER /opt/tau
cd /opt/tau

wget http://tu-dresden.de/die_tu_dresden/zentrale_einrichtungen/zih/forschung/software_werkzeuge_zur_unterstuetzung_von_programmierung_und_optimierung/otf/dateien/OTF-1.12.2salmon.tar.gz
tar xvf OTF-1.12.2salmon.tar.gz
cd OTF-1.12.2salmon/
./configure --prefix=/opt/tau/OTF
make
make install
cd ../

wget http://tau.uoregon.edu/tau.tgz
tar xvf tau.tgz
cd tau-2.22-p1/
./configure -mpilib=/usr/lib/openmpi/lib -prefix=/opt/tau -openmp -TRACE -iowrapper -otf=/opt/tau/OTF -pthread
make install
cd ../

wget http://tau.uoregon.edu/pdt.tar.gz
tar xvf pdt.tar.gz
cd pdtoolkit-3.18.1/
./configure -prefix=/opt/tau/pdt
make
make install


cd ../tau-2.22-p1/
./configure -mpilib=/usr/lib/openmpi/lib -prefix=/opt/tau -openmp -TRACE -iowrapper -pthread -otf=/opt/tau/OTF -pdt=/opt/tau/pdt

make install


Testing
Time to try it out on something parallel.

First set the path

PATH=$PATH:/opt/tau/x86_64/bin

I used nwchem with this input file, co2.nw:
title "co nmr" geometry c 0 0 0 o 0 0 1.13 end basis * library "6-311+G*" end property shielding end dft direct grid fine mult 1 xc HFexch 0.05 slater 0.95 becke88 nonlocal 0.72 vwn_5 1 perdew91 0.81 end task dft property

and ran it using
mpirun -n 3 tau_exec nwchem co2.nw

which ends with
Total times cpu: 4.8s wall: 7.6s
It's obviously a bit too short, but will do for illustration purposes.

That generates a set of files, profile.*.0.0 -- one for each MPI process, i.e. profile.1.0.0, profile.2.0.0 and profile.3.0.0 in this particular case. There are a lot of options for tracing, using hardware counters etc. -- see http://www.cs.uoregon.edu/Research/tau/docs/newguide/
pprof -s
Reading Profile files in profile.*

FUNCTION SUMMARY (total):
---------------------------------------------------------------------------------------
%Time    Exclusive    Inclusive       #Call      #Subrs  Inclusive Name
              msec   total msec                          usec/call
---------------------------------------------------------------------------------------
100.0       15,813       25,931           3       14276    8643959 .TAU application
 18.8        4,870        4,870       10272           0        474 MPI_Barrier()
 12.1        3,138        3,138           3           0    1046279 MPI_Init()
  8.1        2,090        2,090         818           0       2556 MPI_Recv()
  0.0            9            9           3           0       3173 MPI_Finalize()
  0.0            3            3          24           0        128 MPI_Bcast()
  0.0            2            2           6           0        463 MPI_Comm_dup()
  0.0            1            1         790           0          2 MPI_Comm_size()
  0.0        0.872        0.872         818           0          1 MPI_Send()
  0.0        0.294        0.294         841           0          0 MPI_Comm_rank()
  0.0         0.17         0.17         674           0          0 MPI_Get_count()
  0.0        0.111        0.111           3           0         37 MPI_Comm_free()
  0.0        0.026        0.026           3           0          9 MPI_Errhandler_set()
  0.0        0.024        0.024           6           0          4 MPI_Group_rank()
  0.0         0.02         0.02           6           0          3 MPI_Comm_compare()
  0.0        0.015        0.015           4           0          4 MPI_Comm_group()
  0.0        0.008        0.008           4           0          2 MPI_Group_size()
  0.0        0.004        0.004           1           0          4 MPI_Group_translate_ranks()

FUNCTION SUMMARY (mean):
---------------------------------------------------------------------------------------
%Time    Exclusive    Inclusive       #Call      #Subrs  Inclusive Name
              msec   total msec                          usec/call
---------------------------------------------------------------------------------------
100.0        5,271        8,643           1     4758.67    8643959 .TAU application
 18.8        1,623        1,623        3424           0        474 MPI_Barrier()
 12.1        1,046        1,046           1           0    1046279 MPI_Init()
  8.1          696          696     272.667           0       2556 MPI_Recv()
  0.0            3            3           1           0       3173 MPI_Finalize()
  0.0            1            1           8           0        128 MPI_Bcast()
  0.0        0.926        0.926           2           0        463 MPI_Comm_dup()
  0.0        0.436        0.436     263.333           0          2 MPI_Comm_size()
  0.0        0.291        0.291     272.667           0          1 MPI_Send()
  0.0        0.098        0.098     280.333           0          0 MPI_Comm_rank()
  0.0       0.0567       0.0567     224.667           0          0 MPI_Get_count()
  0.0        0.037        0.037           1           0         37 MPI_Comm_free()
  0.0      0.00867      0.00867           1           0          9 MPI_Errhandler_set()
  0.0        0.008        0.008           2           0          4 MPI_Group_rank()
  0.0      0.00667      0.00667           2           0          3 MPI_Comm_compare()
  0.0        0.005        0.005     1.33333           0          4 MPI_Comm_group()
  0.0      0.00267      0.00267     1.33333           0          2 MPI_Group_size()
  0.0      0.00133      0.00133    0.333333           0          4 MPI_Group_translate_ranks()

...which I can't pretend to fully understand. Presumably the first line corresponds to the cpu time and the wall time reported by nwchem (4.8 and 7.6 s vs 5,271 and 8,643 ms in the mean summary).

A visual representation can be had by launching paraprof:
paraprof
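As far as I can tell from the TAU docs, paraprof can also bundle the per-rank profiles into a single packed file, which is handy for archiving a run or opening it on another machine; a sketch with an arbitrary file name:

paraprof --pack co2_run.ppk     # pack profile.*.0.0 from the current directory into one file
paraprof co2_run.ppk            # open the packed file later / elsewhere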


Now it's time to explore...

The one thing that doesn't seem to work is visualisation of the communication matrix...



Failed attempt to build with vampirtrace
sudo mkdir /opt/tau
sudo chown $USER /opt/tau
cd /opt/tau


wget http://tu-dresden.de/die_tu_dresden/zentrale_einrichtungen/zih/forschung/software_werkzeuge_zur_unterstuetzung_von_programmierung_und_optimierung/otf/dateien/OTF-1.12.2salmon.tar.gz
tar xvf OTF-1.12.2salmon.tar.gz
cd OTF-1.12.2salmon/
./configure --prefix=/opt/tau/OTF
make
make install
cd ../


wget http://tu-dresden.de/die_tu_dresden/zentrale_einrichtungen/zih/forschung/software_werkzeuge_zur_unterstuetzung_von_programmierung_und_optimierung/vampirtrace/dateien/VampirTrace-5.14.1.tar.gz
tar xvf VampirTrace-5.14.1.tar.gz
cd VampirTrace-5.14.1/
./configure --prefix=/opt/tau/vampirtrace --with-mpi-dir=/usr/lib/openmpi/lib --with-extern-otf-dir=/opt/tau/OTF
make
make install


wget http://tau.uoregon.edu/tau.tgz
tar xvf tau.tgz
cd tau-2.22-p1/
./configure -mpilib=/usr/lib/openmpi/lib -prefix=/opt/tau -openmp -TRACE -iowrapper -otf=/opt/tau/OTF -vampirtrace=/opt/tau/vampirtrace
make install

It builds fine, but during execution of mpirun -n 2 tau_exec... I get
Error: No matching binding for 'mpi' in directory /opt/tau/x86_64/lib
Available bindings (/opt/tau/x86_64/lib):
Error: No matching binding for 'mpi' in directory /opt/tau/x86_64/lib
Available bindings (/opt/tau/x86_64/lib):
  /opt/tau/x86_64/lib/shared-disable
  /opt/tau/x86_64/lib/shared-disable

28 October 2012

268. Compiling and testing GAMESS US on debian testing (wheezy)

Update 3: 9 May 2013. Fixed a couple of mistakes e.g. related to mpi. I also switched from ATLAS to acml -- when I build with ATLAS a lot of the example inputs do not converge.

Update 2: Pietro (see posts below) identified some odd behaviour when running test exam44 in which the scf failed to converge. The (temporary) fix for that has been included in the instructions below (change line 1664 in the file 'comp') -- most likely it's only a single file which needs to be compiled with -O0, but it will take a while to identify which one that is. Having to use -O0 on a performance critical piece of software is obviously unfortunate.

Update:
I've done this on ROCKS 5.4.3/ Centos 5.6 as well. Be aware that because of the ancient version of gfortran (4.1.2) on ROCKS there will be some limitations:

   Alas, your version of gfortran does not support REAL*16,
   so relativistic integrals cannot use quadruple precision.
   Other than this, everything will work properly.
Other than that, follow the instructions below (including editing lked).

Original post:
Solvation energies using implicit solvation models are a tough nut to crack. I like working with NWChem, but only one solvation model (COSMO) is implemented there; it has a history of giving results wildly different (>20 kcal/mol!) from those of other software packages, partly due to a bug which was fixed in 2011 (it's fixed now -- using b3lyp/6-311++g** with the cosmo parameters in that post I got 63.68 kcal/mol for Cl-); and I'm still not sure how to use the COSMO module properly (is rsolv 0 a reasonable value?). Obviously, my own unfamiliarity with the method is another issue, but that's where the idea of sane defaults comes in. So, time to test and compare with other models. Reading Cramer, C. J.; Truhlar, D. G. Acc. Chem. Res. 2008, 41, 760–768 got me interested in GAMESS US again.

Gaussian is not really an attractive option for me anymore for performance reasons (caveat: as seen by me, on my particular systems, using precompiled binaries). Free -- both as in source code and as in cost -- is obviously also attractive. Being a linux sort of person also plays into it.

So, here's how to get your cluster set up for gamess US:
1. Go to http://www.msg.chem.iastate.edu/GAMESS/download/register/
Select agree, then pick your version -- in my case
GAMESS version May 1, 2012 R1 for 64 bit (x86_64 compatible) under Linux with gnu compilers

Once you've completed your order you're told you may have to wait for up to a week before your registration is approved, but I got approved in less than 24 hours.

[2. Register for GAMESSPLUS at http://comp.chem.umn.edu/license/form-user.html
Again, it may take a little while to get approved -- in my case it was less than 24 hours. Also, it seems that you don't need a separate GAMESSPLUS anymore]

3. Download gamess-current.tar.gz as per the instructions and put it in /opt/gamess (once you've created the folder)

4. If you're using AMD you're in luck -- set up acml on your system. In my case I put everything in /opt/acml/acml5.2.0

I've had bad luck with ATLAS.

5. Compile
sudo apt-get install build-essential gfortran openmpi-bin libopenmpi-dev libboost-all-dev
sudo mkdir /opt/gamess
sudo chown $USER /opt/gamess
cd /opt/gamess
tar xvf gamess-current.tar.gz 
cd gamess/

You're now ready to autoconfigure.

The lengthy autoconfigure.
 Note that
* the location of your openmpi libs may vary -- the debian libs go in /usr/lib/openmpi/lib by default, but I'm using my own compiled version, which I've put in /opt/openmpi (see the quick check after this list)
* gamess is linked against the static libraries by default, so if you compiled atlas as is described elsewhere on this blog, you'll be fine.
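If you're not sure where your MPI installation actually lives, Open MPI can tell you; a quick check, assuming the mpicc and ompi_info on your PATH belong to the install you intend to use:

# show the compile/link flags Open MPI's compiler wrapper would use
mpicc -showme

# show the install prefix of the Open MPI that is first in your PATH
ompi_info | grep Prefix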

./config
This script asks a few questions, depending on your computer system,
to set up compiler names, libraries, message passing libraries,
and so forth.
 
You can quit at any time by pressing control-C, and then <return>.
 
Please open a second window by logging into your target machine,
in case this script asks you to 'type' a command to learn something
about your system software situation.  All such extra questions will
use the word 'type' to indicate it is a command for the other window.
 
After the new window is open, please hit <return> to go on.

   GAMESS can compile on the following 32 bit or 64 bit machines:
axp64    - Alpha chip, native compiler, running Tru64 or Linux
cray-xt  - Cray's massively parallel system, running CNL
hpux32   - HP PA-RISC chips (old models only), running HP-UX
hpux64   - HP Intel or PA-RISC chips, running HP-UX
ibm32    - IBM (old models only), running AIX
ibm64    - IBM, Power3 chip or newer, running AIX or Linux
ibm64-sp - IBM SP parallel system, running AIX
ibm-bg   - IBM Blue Gene (P or L model), these are 32 bit systems
linux32  - Linux (any 32 bit distribution), for x86 (old systems only)
linux64  - Linux (any 64 bit distribution), for x86_64 or ia64 chips
           AMD/Intel chip Linux machines are sold by many companies
mac32    - Apple Mac, any chip, running OS X 10.4 or older
mac64    - Apple Mac, any chip, running OS X 10.5 or newer
sgi32    - Silicon Graphics Inc., MIPS chip only, running Irix
sgi64    - Silicon Graphics Inc., MIPS chip only, running Irix
sun32    - Sun ultraSPARC chips (old models only), running Solaris
sun64    - Sun ultraSPARC or Opteron chips, running Solaris
win32    - Windows 32-bit (Windows XP, Vista, 7, Compute Cluster, HPC Edition)
win64    - Windows 64-bit (Windows XP, Vista, 7, Compute Cluster, HPC Edition)
winazure - Windows Azure Cloud Platform running Windows 64-bit
    type 'uname -a' to partially clarify your computer's flavor.
please enter your target machine name: linux64

Where is the GAMESS software on your system?
A typical response might be /u1/mike/gamess,
most probably the correct answer is /home/me/tmp/gamess
 
GAMESS directory? [/opt/gamess] /opt/gamess

Setting up GAMESS compile and link for GMS_TARGET=linux64
GAMESS software is located at GMS_PATH=/home/me/tmp/gamess
 
Please provide the name of the build location.
This may be the same location as the GAMESS directory.
 
GAMESS build directory? [/opt/gamess] /opt/gamess

Please provide a version number for the GAMESS executable.
This will be used as the middle part of the binary's name,
for example: gamess.00.x

Version? [00] 12r1

Linux offers many choices for FORTRAN compilers, including the GNU
compiler set ('g77' in old versions of Linux, or 'gfortran' in
current versions), which are included for free in Unix distributions.
 
There are also commercial compilers, namely Intel's 'ifort',
Portland Group's 'pgfortran', and Pathscale's 'pathf90'.  The last
two are not common, and aren't as well tested as the others.
 
type 'rpm -aq | grep gcc' to check on all GNU compilers, including gcc
type 'which gfortran'  to look for GNU's gfortran (a very good choice),
type 'which g77'       to look for GNU's g77,
type 'which ifort'     to look for Intel's compiler,
type 'which pgfortran' to look for Portland Group's compiler,
type 'which pathf90'   to look for Pathscale's compiler.
Please enter your choice of FORTRAN: gfortran

gfortran is very robust, so this is a wise choice.

Please type 'gfortran -dumpversion' or else 'gfortran -v' to
detect the version number of your gfortran.
This reply should be a string with at least two decimal points,
such as 4.1.2 or 4.6.1, or maybe even 4.4.2-12.
The reply may be labeled as a 'gcc' version,
but it is really your gfortran version.
Please enter only the first decimal place, such as 4.1 or 4.6:  
4.6

   Good, the newest gfortran can compile REAL*16 data type.
hit <return> to continue to the math library setup.

Linux distributions do not include a standard math library.
 
There are several reasonable add-on library choices,
       MKL from Intel           for 32 or 64 bit Linux (very fast)
      ACML from AMD             for 32 or 64 bit Linux (free)
     ATLAS from www.rpmfind.net for 32 or 64 bit Linux (free)
and one very unreasonable option, namely 'none', which will use
some slow FORTRAN routines supplied with GAMESS.  Choosing 'none'
will run MP2 jobs 2x slower, or CCSD(T) jobs 5x slower.
 
Some typical places (but not the only ones) to find math libraries are
Type 'ls /opt/intel/mkl'                 to look for MKL
Type 'ls /opt/intel/Compiler/mkl'        to look for MKL
Type 'ls /opt/intel/composerxe/mkl'      to look for MKL
Type 'ls -d /opt/acml*'                  to look for ACML
Type 'ls -d /usr/local/acml*'            to look for ACML
Type 'ls /usr/lib64/atlas'               to look for Atlas
 
Enter your choice of 'mkl' or 'atlas' or 'acml' or 'none': acml

Type 'ls -d /opt/acml*' or 'ls -d /usr/local/acml*'
and note the full path, which includes a version number.
enter this full pathname: /opt/acml/acml5.2.0
Math library 'acml' will be taken from /opt/acml/acml5.2.0/gfortran64_int64/lib
please hit <return> to compile the GAMESS source code activator
gfortran -o /home/me/tmp/gamess/build/tools/actvte.x actvte.f
unset echo
Source code activator was successfully compiled.
 
please hit <return> to set up your network for Linux clusters.

If you have a slow network, like Gigabit Ethernet (GE), or
if you have so few nodes you won't run extensively in parallel, or
if you have no MPI library installed, or
if you want a fail-safe compile/link and easy execution,
     choose 'sockets'
to use good old reliable standard TCP/IP networking.
 
If you have an expensive but fast network like Infiniband (IB), and
if you have an MPI library correctly installed,
     choose 'mpi'.
 
communication library ('sockets' or 'mpi')? mpi

The MPI libraries which work well on linux64/Infiniband are
      Intel's MPI (impi)
      MVAPICH2
      SGI's mpt from ProPack, on Altix/ICE systems
Other libraries may work, please see 'readme.ddi' for info.
The choices listed above will compile and link easily,
and are known to run correctly and efficiently.

Enter 'sockets' if you just changed your mind about trying MPI.

Enter MPI library (impi, mvapich2, mpt, sockets): openmpi
MPI can be installed in many places, so let's find openmpi.
The person who installed your MPI can tell you where it really is.
 
impi     is probably located at a directory like
              /opt/intel/impi/3.2
              /opt/intel/impi/4.0.1.007
              /opt/intel/impi/4.0.2.003
         include iMPI's version numbers in your reply below.
mvapich2 could be almost anywhere, perhaps some directory like
              /usr/mpi/gcc/mvapich2-1.6
openmpi  could be almost anywhere, perhaps some directory like
              /usr/mpi/openmpi-1.4.3
mpt      is probably located at a directory like
              /opt/sgi/mpt/mpt-1.26
Please enter your openmpi's location: /opt/openmpi/1.6

Your configuration for GAMESS compilation is now in
     /home/me/tmp/gamess/build/install.info
Now, please follow the directions in
     /opt/gamess/machines/readme.unix



I next did this:

cd /opt/gamess/ddi
./compddi
cd ../

Edit the file 'comp' and change it from

1664       set OPT='-O2'
to
1664       set OPT='-O0'

or test case exam44.inp in tests/standard will fail due to lack of SCF convergence (I've tried -O1 as well, with no luck).
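If you'd rather script the change than edit by hand, sed can do it in place -- just double-check that line 1664 is still the right line in your copy of 'comp':

# flip the optimisation flag on line 1664 of ./comp from -O2 to -O0
sed -i "1664s/-O2/-O0/" comp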

Continue your compilation:
./compall

Running 'compall' reads "install.info", which I include below:
#!/bin/csh
#   compilation configuration for GAMESS
#   generated on beryllium
#   generated at Friday 21 September  08:48:09 EST 2012
setenv GMS_PATH            /opt/gamess
setenv GMS_BUILD_DIR       /opt/gamess
#         machine type
setenv GMS_TARGET          linux64
#         FORTRAN compiler setup
setenv GMS_FORTRAN         gfortran
setenv GMS_GFORTRAN_VERNO  4.6
#         mathematical library setup
setenv GMS_MATHLIB         acml
setenv GMS_MATHLIB_PATH    /opt/acml/acml5.2.0//gfortran64_int64/lib
#         parallel message passing model setup
setenv GMS_DDI_COMM        mpi
setenv GMS_MPI_LIB         openmpi
setenv GMS_MPI_PATH        /opt/openmpi/1.6

Note that you can't change the gfortran version here either -- 4.7 won't be recognised.

Anyway, compilation will take a while -- enough for some coffee and reading.

In the next step you may have problems with openmpi -- lked looks in e.g. /opt/openmpi/1.6/lib64 but you'll probably only have /opt/openmpi/1.6/lib
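A quick way to confirm which of the two directories your Open MPI install actually provides before touching lked:

ls -d /opt/openmpi/1.6/lib /opt/openmpi/1.6/lib64 2>/dev/null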

Edit lked and change
 958             case openmpi:
 959                set MPILIBS="-L$GMS_MPI_PATH/lib64"
 960                set MPILIBS="$MPILIBS -lmpi"
 961                breaksw
to
 958             case openmpi:
 959                set MPILIBS="-L$GMS_MPI_PATH/lib"
 960                set MPILIBS="$MPILIBS -lmpi -lpthread"
 961                breaksw



Generate the runtime file:
./lked gamess 12r1 >&  lked.log

Done!


To compile with openblas:
1. edit install.info
#!/bin/csh
#   compilation configuration for GAMESS
#   generated on tantalum
#   generated at Friday 21 September  14:01:54 EST 2012
setenv GMS_PATH            /opt/gamess
setenv GMS_BUILD_DIR       /opt/gamess
#         machine type
setenv GMS_TARGET          linux64
#         FORTRAN compiler setup
setenv GMS_FORTRAN         gfortran
setenv GMS_GFORTRAN_VERNO  4.6
#         mathematical library setup
setenv GMS_MATHLIB         openblas
setenv GMS_MATHLIB_PATH    /opt/openblas/lib
#         parallel message passing model setup
setenv GMS_DDI_COMM        mpi
setenv GMS_MPI_LIB         openmpi
setenv GMS_MPI_PATH        /opt/openmpi/1.6

2. edit lked
Add lines 462-466, which set up the openblas switch.

 453       endif
 454       set BLAS=' '
 455       breaksw
 456 
 457    case acml:
 458       #     do a static link so that only compile node needs to install ACML
 459       set MATHLIBS="$GMS_MATHLIB_PATH/libacml.a"
 460       set BLAS=' '
 461       breaksw
 462 case openblas:
 463        #     do a static link so that only compile node needs to install openblas
 464        set MATHLIBS="$GMS_MATHLIB_PATH/libopenblas.a"
 465        set BLAS=' '
 466        breaksw
 467 
 468    case none:
 469    default:
 470       echo "Warning.  No math library was found, you should install one."
 471       echo "    MP2 calculations speed up about 2x with a math library."
 472       echo "CCSD(T) calculations speed up about 5x with a math library."
 473       set BLAS='blas.o'
 474       set MATHLIBS=' '
 475       breaksw

3. Link
./lked gamess 12r2 >&  lked.log

You now have gamess.12r1.x, which uses acml, and gamess.12r2.x, which uses openblas.

To run:
The rungms file was a bit too 'clever' for me, so I boiled it down to a file called gmrun, which I made executable (chmod +x gmrun) and put in /opt/gamess:

#!/bin/csh
set TARGET=mpi
set SCR=$HOME/scratch
set USERSCR=/scratch
set GMSPATH=/opt/gamess
set JOB=$1
set VERNO=$2
set NCPUS=$3

if ( $JOB:r.inp == $JOB ) set JOB=$JOB:r
echo "Copying input file $JOB.inp to your run's scratch directory..."
cp $JOB.inp $SCR/$JOB.F05

setenv TRAJECT $USERSCR/$JOB.trj
setenv RESTART $USERSCR/$JOB.rst
setenv INPUT $SCR/$JOB.F05
setenv PUNCH $USERSCR/$JOB.dat
if ( -e $TRAJECT ) rm $TRAJECT
if ( -e  $PUNCH ) rm $PUNCH
if ( -e  $RESTART ) rm $RESTART
source $GMSPATH/gms-files.csh

setenv LD_LIBRARY_PATH /opt/openmpi/1.6/lib:$LD_LIBRARY_PATH
set path= ( /opt/openmpi/1.6/bin $path )
/opt/openmpi/1.6/bin/mpiexec -n $NCPUS $GMSPATH/gamess.$VERNO.x|tee $JOB.out
cp $PUNCH .

Note that I actually do have two scratch directories -- one in ~/scratch and one in /scratch. The SCR directory should be local to the node as well as spacious, while USERSCR can be a smaller, networked directory.
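Both directories need to exist before you call gmrun; a minimal sketch matching the paths used in the script above:

# node-local, spacious scratch for the big temporary files
mkdir -p $HOME/scratch

# (possibly networked) scratch for the .dat/.trj/.rst files
sudo mkdir -p /scratch
sudo chown $USER /scratch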

Finally do
echo 'export PATH=$PATH:/opt/gamess' >> ~/.bashrc

Anyway.
Navigate to your tests/standard folder, where there are a lot of exam*.inp files, and do

gmrun exam12 12r1 4

where exam12 (or exam12.inp) is the name of the input file, 12r1 is the version number (that you set above) and 4 is the number of processors/threads.
 
          ---------------------
          ELECTROSTATIC MOMENTS
          ---------------------

 POINT   1           X           Y           Z (BOHR)    CHARGE
                 0.000000   -0.000000    0.000000       -0.00 (A.U.)
         DX          DY          DZ         /D/  (DEBYE)
    -0.000000    0.000000   -0.000000    0.000000
 ...... END OF PROPERTY EVALUATION ......
 CPU     0: STEP CPU TIME=     0.02 TOTAL CPU TIME=        2.2 (    0.0 MIN)
 TOTAL WALL CLOCK TIME=        2.3 SECONDS, CPU UTILIZATION IS  97.78%
  $VIB   
          IVIB=   0 IATOM=   0 ICOORD=   0 E=      -76.5841347569
 -6.175208802E-40-6.175208802E-40-4.411868660E-07 6.175208802E-40 6.175208802E-40
  4.411868660E-07-1.441225933E-40-1.441225933E-40 1.672333111E-06 1.441225933E-40
  1.441225933E-40-1.672333111E-06
 -4.053383177E-34 4.053383177E-34-2.257541709E-15
 ......END OF GEOMETRY SEARCH......
 CPU     0: STEP CPU TIME=     0.00 TOTAL CPU TIME=        2.2 (    0.0 MIN)
 TOTAL WALL CLOCK TIME=        2.3 SECONDS, CPU UTILIZATION IS  97.35%
               990473  WORDS OF DYNAMIC MEMORY USED
 EXECUTION OF GAMESS TERMINATED NORMALLY Fri Sep 21 14:27:17 2012
 DDI: 263624 bytes (0.3 MB / 0 MWords) used by master data server.

 ----------------------------------------
 CPU timing information for all processes
 ========================================
 0: 2.160 + 0.44 = 2.204
 1: 2.220 + 0.20 = 2.240
 2: 2.212 + 0.32 = 2.244
 3: 4.240 + 0.04 = 4.244
 4: 4.260 + 0.00 = 4.260
 5: 4.256 + 0.08 = 4.264
 ----------------------------------------


Done!
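If you want to run the whole test set in one go, a simple shell loop over the input files does the job; a sketch, assuming gmrun is on your PATH and you're standing in tests/standard:

for f in exam*.inp; do
    gmrun ${f%.inp} 12r1 4     # strip .inp and run each test on 4 cores
done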


Looking at another test case (acetate w/ cosmo) I get the following scaling on a single node as a function of processors:


shmmax issue:
Anyone who has been using nwchem will be familiar with this
 INPUT CARD> $END                                                                           
 DDI Process 0: shmget returned an error.
 Error EINVAL: Attempting to create 160525768 bytes of shared memory.
 Check system limits on the size of SysV shared memory segments.

 The file ~/gamess/ddi/readme.ddi contains information on how to display
 the current SystemV memory settings, and how to increase their sizes.
 Increasing the setting requires the root password, and usually a system reboot.

 DDI Process 0: error code 911

The fix is the same. First do

cat /proc/sys/kernel/shmmax

and look at the value. Then set it to the desired value according to this post: http://verahill.blogspot.com.au/2012/04/solution-to-nwchem-shmmax-too-small.html
e.g.
sudo sysctl -w kernel.shmmax=6269961216
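sysctl -w only holds until the next reboot; to make the setting permanent, add it to /etc/sysctl.conf (or a file under /etc/sysctl.d/), e.g.:

# persist the shared-memory limit across reboots
echo 'kernel.shmmax = 6269961216' | sudo tee -a /etc/sysctl.conf
sudo sysctl -p     # apply it now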

gfortran version issue:
Even though you likely have version 4.7.x of gfortran, pick 4.6 or you will get:

Please type 'gfortran -dumpversion' or else 'gfortran -v' to
detect the version number of your gfortran.
This reply should be a string with at least two decimal points,
such as 4.1.2 or 4.6.1, or maybe even 4.4.2-12.
The reply may be labeled as a 'gcc' version,
but it is really your gfortran version.
Please enter only the first decimal place, such as 4.1 or 4.6:  
4.7

The gfortran version number is not recognized.
It should only have one decimal place, such as 4.x

The reason is this (code from config):
      switch ($GMS_GFORTRAN_VERNO)
         case 4.1:
         case 4.2:
         case 4.3:
         case 4.4:
         case 4.5:
            echo "   Alas, your version of gfortran does not support REAL*16,"
            echo "   so relativistic integrals cannot use quadruple precision."
            echo "   Other than this, everything will work properly."
            breaksw
         case 4.6:
            echo "   Good, the newest gfortran can compile REAL*16 data type."
            breaksw
         default:
            echo "The gfortran version number is not recognized."
            echo "It should only have one decimal place, such as 4.x"
            exit 4
            breaksw
      endsw

11 September 2012

231. Compiling john the ripper: single/serial, parallel/OMP and MPI

Update: updated for v1.7.9-jumbo-7 since hccap2john in 1.7.9-jumbo-6 was broken

For no particular reason at all, here's how to compile John the Ripper on Debian Testing (Wheezy). It's very easy, and this post is probably a bit superfluous. The standard version only supports serial and parallel (OMP). See below for MPI.


The regular version: 

mkdir ~/tmp
cd ~/tmp
wget http://www.openwall.com/john/g/john-1.7.9.tar.gz
tar xvf john-1.7.9.tar.gz
cd john-1.7.9/src

If you don't edit the Makefile, you build a serial/single-threaded binary.
If you want to build a threaded version for a single node with a multicore processor (OMP), edit the Makefile and uncomment row 19 or 20:

 18 # gcc with OpenMP
 19 OMPFLAGS = -fopenmp
 20 OMPFLAGS = -fopenmp -msse2
make clean linux-x86-64
cd ../run

You now have a binary called john in your ../run folder.
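By default the OMP build typically grabs all available cores; the thread count can be capped with the standard OpenMP environment variable:

# limit the OMP build to 3 threads for this benchmark run
OMP_NUM_THREADS=3 ./john --test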


The Jumbo version:
If you want to build a distributed version with MPI (which can split jobs across several nodes), you need the enhanced, community version:

sudo apt-get install openmpi-bin libopenmpi-dev
cd ~/tmp
wget http://www.openwall.com/john/g/john-1.7.9-jumbo-7.tar.gz
tar xvf john-1.7.9-jumbo-7.tar.gz 
cd john-1.7.9-jumbo-7/src

Edit the Makefile
  20 ## Uncomment the TWO lines below for MPI (can be used together with OMP as well)
  21 ## For experimental MPI_Barrier support, add -DJOHN_MPI_BARRIER too.
  22 ## For experimental MPI_Abort support, add -DJOHN_MPI_ABORT too.
  23 CC = mpicc -DHAVE_MPI
  24 MPIOBJ = john-mpi.o

and do
make clean linux-x86-64-native
cd ../run

I had a look at the passwords on one of our lab boxes -- it immediately discovered that someone had used 'password' as the password...
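For reference, the usual workflow is to merge passwd and shadow with the bundled unshadow tool and then point john -- or, for the MPI build, mpirun plus john -- at the result. A sketch, with mypasswd as a placeholder file name:

sudo ./unshadow /etc/passwd /etc/shadow > mypasswd   # reading /etc/shadow needs root
mpirun -n 3 ./john mypasswd                          # MPI build; plain ./john mypasswd for the other builds
./john --show mypasswd                               # list whatever has been cracked so far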


These tests were run on my old AMD Athlon II X3 445. Processes which don't speed up with MP are highlighted in red. LM DES is borderline -- it's faster, but doesn't scale well.

Here's the single thread/serial version:
./john --test
Benchmarking: Traditional DES [128/128 BS SSE2-16]... DONE
Many salts:     2906K c/s real, 2918K c/s virtual
Only one salt:  2796K c/s real, 2807K c/s virtual
Benchmarking: BSDI DES (x725) [128/128 BS SSE2-16]... DONE
Many salts:     95564 c/s real, 95948 c/s virtual
Only one salt:  93593 c/s real, 93781 c/s virtual
Benchmarking: FreeBSD MD5 [32/64 X2]... DONE
Raw:    14094 c/s real, 14122 c/s virtual
Benchmarking: OpenBSD Blowfish (x32) [32/64 X2]... DONE
Raw:    918 c/s real, 919 c/s virtual
Benchmarking: Kerberos AFS DES [48/64 4K]... DONE
Short:  474316 c/s real, 475267 c/s virtual
Long:   1350K c/s real, 1356K c/s virtual
Benchmarking: LM DES [128/128 BS SSE2-16]... DONE
Raw:    39843K c/s real, 39923K c/s virtual
Benchmarking: generic crypt(3) [?/64]... DONE
Many salts:     262867 c/s real, 263393 c/s virtual
Only one salt:  260121 c/s real, 260642 c/s virtual
Benchmarking: Tripcode DES [48/64 4K]... DONE
Raw:    369843 c/s real, 370584 c/s virtual
Benchmarking: dummy [N/A]... DONE
Raw:    99512K c/s real, 99712K c/s virtual
Here's the OMP version:
Benchmarking: Traditional DES [128/128 BS SSE2-16]... DONE
Many salts:     6706K c/s real, 2555K c/s virtual
Only one salt:  5015K c/s real, 2091K c/s virtual
Benchmarking: BSDI DES (x725) [128/128 BS SSE2-16]... DONE
Many salts:     205670 c/s real, 85411 c/s virtual
Only one salt:  238524 c/s real, 86720 c/s virtual
Benchmarking: FreeBSD MD5 [32/64 X2]... DONE
Raw:    38400 c/s real, 13812 c/s virtual
Benchmarking: OpenBSD Blowfish (x32) [32/64 X2]... DONE
Raw:    2306 c/s real, 845 c/s virtual
Benchmarking: Kerberos AFS DES [48/64 4K]... DONE
Short:  474675 c/s real, 476581 c/s virtual
Long:   1332K c/s real, 1335K c/s virtual
Benchmarking: LM DES [128/128 BS SSE2-16]... DONE
Raw:    49046K c/s real, 16785K c/s virtual
Benchmarking: generic crypt(3) [?/64]... DONE
Many salts:     721670 c/s real, 246640 c/s virtual
Only one salt:  699168 c/s real, 239605 c/s virtual
Benchmarking: Tripcode DES [48/64 4K]... DONE
Raw:    367444 c/s real, 369657 c/s virtual
Benchmarking: dummy [N/A]... DONE
Raw:    100351K c/s real, 100552K c/s virtual
And here's the MPI version:
mpirun -n 3 ./john --test
(note that this includes a great many more tests than the default version)
Benchmarking: Traditional DES [128/128 BS SSE2-16]... (3xMPI) DONE
Many salts:     8533K c/s real, 8707K c/s virtual
Only one salt:  7705K c/s real, 8110K c/s virtual
Benchmarking: BSDI DES (x725) [128/128 BS SSE2-16]... (3xMPI) DONE
Many salts:     279808 c/s real, 282634 c/s virtual
Only one salt:  273362 c/s real, 276096 c/s virtual
Benchmarking: FreeBSD MD5 [128/128 SSE2 intrinsics 12x]... (3xMPI) DONE
Raw:    65124 c/s real, 65781 c/s virtual
Benchmarking: OpenBSD Blowfish (x32) [32/64 X2]... (3xMPI) DONE
Raw:    2722 c/s real, 2749 c/s virtual
Benchmarking: Kerberos AFS DES [48/64 4K]... (3xMPI) DONE
Short:  1387K c/s real, 1415K c/s virtual
Long:   3880K c/s real, 3959K c/s virtual

Benchmarking: LM DES [128/128 BS SSE2-16]... (3xMPI) DONE
Raw:    114781K c/s real, 115940K c/s virtual

I don't quite understand the Kerberos results.



Other targets of interest are:

linux-x86-64-avx         Linux, x86-64 with AVX (2011+ Intel CPUs)
linux-x86-64-xop         Linux, x86-64 with AVX and XOP (2011+ AMD CPUs)
linux-x86-64             Linux, x86-64 with SSE2 (most common)
linux-x86-avx            Linux, x86 32-bit with AVX (2011+ Intel CPUs)
linux-x86-xop            Linux, x86 32-bit with AVX and XOP (2011+ AMD CPUs)
linux-x86-sse2           Linux, x86 32-bit with SSE2 (most common, if 32-bit)
linux-x86-mmx            Linux, x86 32-bit with MMX (for old computers)
linux-x86-any            Linux, x86 32-bit (for truly ancient computers)

The FX 8150 does AVX and XOP, while my 1055T doesn't.
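If you're not sure what your own CPU supports, the flags line in /proc/cpuinfo tells you:

# list the relevant instruction-set flags for the first core
grep -m1 flags /proc/cpuinfo | tr ' ' '\n' | grep -E '^(mmx|sse2|avx|xop)$'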

The community version has more options:

linux-x86-64-native      Linux, x86-64 'native' (all CPU features you've got)
linux-x86-64-gpu         Linux, x86-64 'native', CUDA and OpenCL (experimental)
linux-x86-64-opencl      Linux, x86-64 'native', OpenCL (experimental)
linux-x86-64-cuda        Linux, x86-64 'native', CUDA (experimental)
linux-x86-64-avx         Linux, x86-64 with AVX (2011+ Intel CPUs)
linux-x86-64-xop         Linux, x86-64 with AVX and XOP (2011+ AMD CPUs)
linux-x86-64[i]          Linux, x86-64 with SSE2 (most common)
linux-x86-64-icc         Linux, x86-64 compiled with icc
linux-x86-64-clang       Linux, x86-64 compiled with clang
linux-x86-gpu            Linux, x86 32-bit with SSE2, CUDA and OpenCL (experimental)
linux-x86-opencl         Linux, x86 32-bit with SSE2 and OpenCL (experimental)
linux-x86-cuda           Linux, x86 32-bit with SSE2 and CUDA (experimental)
linux-x86-sse2[i]        Linux, x86 32-bit with SSE2 (most common, 32-bit)
linux-x86-native         Linux, x86 32-bit, with all CPU features you've got (not necessarily best)
linux-x86-mmx            Linux, x86 32-bit with MMX (for old computers)
linux-x86-any            Linux, x86 32-bit (for truly ancient computers)
linux-x86-clang          Linux, x86 32-bit with SSE2, compiled with clang
linux-alpha              Linux, Alpha
linux-sparc              Linux, SPARC 32-bit
linux-ppc32-altivec      Linux, PowerPC w/AltiVec (best)
linux-ppc32              Linux, PowerPC 32-bit
linux-ppc64              Linux, PowerPC 64-bit
linux-ia64               Linux, IA-64