13 September 2012

234. CPMD with netlib lapack, blas and your own fftw on debian testing

This is a minor update to my previous post on CPMD. Back then I had issues linking against my Openblas libs (I got a binary that would not run properly), but I've since had success with the netlib lapack and blas libs.

1. Compile the netlib lapack and blas libraries according to this post: http://verahill.blogspot.com.au/2012/09/compiling-netlibs-lapack-and-blas-on.html

2. Compile the fftw libraries according to this post (ignore the sections on Openblas and Gromacs):
http://verahill.blogspot.com.au/2012/05/gromacs-with-external-fftw3-and-blas-on.html

3. Compile CPMD. We'll largely be following this post.
Register with cpmd.org. Once you're approved, download the cpmd source to ~/tmp.

sudo apt-get install libopenmpi-dev openmpi-bin

cd ~/tmp
tar -xvf cpmd-v3_15_3.tar.gz
cd CPMD/CONFIGURE
Create the file LINUX-x86_64-DEBIAN:
   
     IRAT=2
     CFLAGS='-c -O2 -Wall'
     CPP='/lib/cpp -P -C -traditional'
     CPPFLAGS='-D__Linux -D__PGI -D__GNU -DFFT_FFTW3 -DPARALLEL -DPOINTER8'
     FFLAGS='-c -O2 -fcray-pointer -fno-whole-file -fsecond-underscore'
     LFLAGS='-l:/opt/fftw/fftw-3.3.2/double/lib/libfftw3.a -l:/opt/fftw/fftw-3.3.2/double/lib/libfftw3_mpi.a -l:/opt/fftw/fftw-3.3.2/double/lib/libfftw3_threads.a -I/usr/include -l:/opt/netlib/blas/lib/libnetblas.so -l:/opt/netlib/lapack/lib/liblapack.so -lpthread -lmpi'
     FFLAGS_GROMOS='  $(FFLAGS)' 
     FC='mpif77 -fbounds-check'
     CC='mpicc'
     LD='mpif77 -fbounds-check'
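
Since LFLAGS points at libraries by full path using -l:, a typo in any of those paths only shows up at link time. Here's a minimal sketch of a pre-flight check; missing_libs is a hypothetical helper name of my own, not something that ships with CPMD:

```shell
# Print every "-l:/path" entry of a linker flag string whose file is missing.
# missing_libs is a hypothetical helper, not part of CPMD or mkconfig.sh.
missing_libs() {
    for flag in $1; do              # unquoted on purpose: split on whitespace
        case "$flag" in
            -l:*)
                path=${flag#-l:}    # strip the "-l:" prefix
                [ -e "$path" ] || echo "missing: $path"
                ;;
        esac
    done
}
```

Call it as missing_libs "$LFLAGS", with LFLAGS set to the string from the file above. No output means every listed file exists.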

Next edit ~/tmp/CPMD/wfnio.F and change the following lines:
 15       CHARACTER(len=*) TAG
 63         IF(TAG(1:2).EQ.'NI') THEN
201       IF(TAG(1:2).NE.'NI') THEN
271         IF(TAG(1:2).EQ.'NI') THEN
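
To double-check the edits, it can help to print just those four lines of the file. A small sketch follows; show_lines is a hypothetical helper name:

```shell
# Print the requested line numbers of a file, each prefixed with its number.
# show_lines is a hypothetical helper used only for this illustration.
show_lines() {
    file=$1; shift
    for n in "$@"; do
        awk -v n="$n" 'NR == n { printf "%d: %s\n", NR, $0 }' "$file"
    done
}
```

For example: show_lines ~/tmp/CPMD/wfnio.F 15 63 201 271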

Now, in ~/tmp/CPMD, run
./mkconfig.sh LINUX-x86_64-DEBIAN > Makefile
make
sudo mkdir /opt/cpmd
sudo chown $USER /opt/cpmd
cp cpmd.x /opt/cpmd


Then follow everything below 'Done! Almost.' in this post: http://verahill.blogspot.com.au/2012/07/not-solved-compiling-cpmd-on-debian.html


12 September 2012

233. Compiling netlib's lapack and blas on Debian Testing (Wheezy)

In addition to specific BLAS/LAPACK libs such as ACML, MKL, and ATLAS, netlib provides (what I understand to be) reference versions of BLAS and LAPACK. Presumably these are slower than optimised versions of blas/lapack, but it doesn't hurt to be familiar with them.

Here's how to compile those versions.



BLAS
sudo mkdir /opt/netlib
sudo chown $USER /opt/netlib
mkdir /opt/netlib/blas/lib -p
wget http://www.netlib.org/blas/blas.tgz
tar xvf blas.tgz
cd BLAS/

Edit make.inc
OPTS = -O3 -shared -m64 -march=native -fPIC


make all
gfortran -shared -Wl,-soname,libnetblas.so -o libblas.so.1.0.1 *.o -lc
ln -s libblas.so.1.0.1 libnetblas.so
cp lib*blas* /opt/netlib/blas/lib


To see whether everything linked ok:
 ldd libnetblas.so 
        linux-vdso.so.1 =>  (0x00007ffff1bc6000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00002b42ec030000)
        libgfortran.so.3 => /usr/lib/x86_64-linux-gnu/libgfortran.so.3 (0x00002b42ec3b8000)
        libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00002b42ec6ce000)
        libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00002b42ec950000)
        libquadmath.so.0 => /usr/lib/x86_64-linux-gnu/libquadmath.so.0 (0x00002b42ecb67000)
        /lib64/ld-linux-x86-64.so.2 (0x00002b42ebaf3000)
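
When a dependency can't be resolved, ldd prints 'not found' instead of a path. A minimal sketch that filters for exactly that; unresolved_libs is a hypothetical helper name:

```shell
# Read ldd output on stdin and print the names of libraries reported as
# "not found". unresolved_libs is a hypothetical helper name.
unresolved_libs() {
    awk '/=> not found/ { print $1 }'
}
```

Usage: ldd libnetblas.so | unresolved_libs
An empty result means everything resolved.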




LAPACK
(inspired by this and this)
mkdir -p /opt/netlib/lapack
sudo apt-get install cmake-curses-gui
cd ~/tmp
wget http://www.netlib.org/lapack/lapack-3.4.1.tgz
tar xvf lapack-3.4.1.tgz
cd lapack-3.4.1/
mkdir build
cd build
ccmake ../

Hit 'c' to generate a configuration. Navigate with arrow keys and hit enter to change values. Change to the values in red:
 
 BUILD_COMPLEX                   *ON
 BUILD_COMPLEX16                 *ON
 BUILD_DOUBLE                    *ON
 BUILD_SHARED_LIBS               *ON
 BUILD_SINGLE                    *ON
 BUILD_STATIC_LIBS               *ON
 BUILD_TESTING                   *ON
 CMAKE_BUILD_TYPE                *     
 CMAKE_INSTALL_PREFIX            */opt/netlib/lapack
 LAPACKE                         *OFF
 LAPACKE_WITH_TMG                *OFF
 USE_OPTIMIZED_BLAS              *ON
 USE_XBLAS                       *OFF

Then hit 'c', which might give you the following (change the values in red). I got some errors about ACML/eula here, but don't worry about that.

NOTE: this will only work if you already have blas installed in a standard location. If BLAS_FOUND and the related entries don't show up, hit 'c' again and then 'g'. Next, edit your CMakeCache.txt and paste in the variables listed under 'CMakeCache.txt variables' further down (without the line numbers). Then run ccmake ../ again, make sure everything looks ok, and generate using 'g'.

 BLAS_FOUND                       TRUE
 BLAS_GENERIC_FOUND               ON
 BLAS_GENERIC_blas_LIBRARY        /opt/netlib/blas/lib/libnetblas.so
 BLAS_LIBRARIES                   /opt/netlib/blas/lib/libnetblas.so
 BLAS_LINKER_FLAGS
 BUILD_COMPLEX                   *ON
 BUILD_COMPLEX16                 *ON
 BUILD_DOUBLE                    *ON
 BUILD_SHARED_LIBS               *ON
 BUILD_SINGLE                    *ON
 BUILD_STATIC_LIBS               *ON
 BUILD_TESTING                   *ON
 CMAKE_BUILD_TYPE                *     
 CMAKE_INSTALL_PREFIX            */opt/netlib/lapack
 LAPACKE                         *OFF
 LAPACKE_WITH_TMG                *OFF
 USE_OPTIMIZED_BLAS              *ON
 USE_XBLAS                       *OFF
Then hit 'c' again. If there were no issues, hit 'g' which writes the configuration and exits.


make
[100%] Building Fortran object TESTING/EIG/CMakeFiles/xeigtstz.dir/__/__/INSTALL/dsecnd_INT_ETIME.f.o
Linking Fortran executable ../../bin/xeigtstz
[100%] Built target xeigtstz

make install
Install the project...
-- Install configuration: ""
-- Installing: /opt/netlib/lapack/lib/pkgconfig/lapack.pc
-- Installing: /opt/netlib/lapack/lib/cmake/lapack-3.4.1/lapack-config.cmake
-- Installing: /opt/netlib/lapack/lib/cmake/lapack-3.4.1/lapack-config-version.cmake
-- Installing: /opt/netlib/lapack/lib/cmake/lapack-3.4.1/lapack-targets.cmake
-- Installing: /opt/netlib/lapack/lib/cmake/lapack-3.4.1/lapack-targets-noconfig.cmake
-- Installing: /opt/netlib/lapack/lib/liblapack.so
-- Removed runtime path from "/opt/netlib/lapack/lib/liblapack.so"
-- Installing: /opt/netlib/lapack/lib/libtmglib.so
-- Removed runtime path from "/opt/netlib/lapack/lib/libtmglib.so"


tree /opt/netlib/ -d
/opt/netlib/
|-- blas
|   `-- lib
`-- lapack
    `-- lib
        |-- cmake
        |   `-- lapack-3.4.1
        `-- pkgconfig

7 directories

CMakeCache.txt variables:
 16 
 17 BLAS_FOUND:STRING=TRUE
 18 
 19 //Whether not the GENERIC library was found and is usable
 20 BLAS_GENERIC_FOUND:BOOL=TRUE
 21 
 22 //Path to a library.
 23 BLAS_GENERIC_blas_LIBRARY:FILEPATH=/opt/netlib/blas/lib/libnetblas.so
 24 
 25 BLAS_LIBRARIES:PATH=/opt/netlib/blas/lib/libnetblas.so
 26 
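
After pasting, it's easy to confirm what actually ended up in the cache. Entries in CMakeCache.txt have the form KEY:TYPE=value, so a small sketch like this can read them back; cache_value is a hypothetical helper name:

```shell
# Print the value of one KEY:TYPE=value entry from a CMakeCache.txt file.
# cache_value is a hypothetical helper; it prints nothing if the key is absent.
cache_value() {
    awk -v key="$2" '
        index($0, key ":") == 1 {    # line starts with "KEY:"
            sub(/^[^=]*=/, "")       # drop everything up to the first "="
            print
            exit
        }
    ' "$1"
}
```

For example, cache_value CMakeCache.txt BLAS_LIBRARIES should print /opt/netlib/blas/lib/libnetblas.so if the paste went in as intended.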


Testing the libraries:
I built gromacs against the new libs to make sure they 'worked':

sudo mkdir /opt/gromacs
sudo chown ${USER} /opt/gromacs
cd ~/tmp
wget ftp://ftp.gromacs.org/pub/gromacs/gromacs-4.5.5.tar.gz
tar xvf gromacs-4.5.5.tar.gz
cd gromacs-4.5.5/
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/netlib/blas/lib:/opt/netlib/lapack/lib
export LDFLAGS="-l:/opt/netlib/blas/lib/libnetblas.so -l:/opt/netlib/lapack/lib/liblapack.so"
./configure --disable-mpi --enable-float --with-external-blas --with-external-lapack --program-suffix=_netlib --prefix=/opt/gromacs/gromacs-4.5.5
make
make install
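
A side note on the LD_LIBRARY_PATH export above: if the variable is empty to begin with, the result starts with a colon, and as far as I know the loader treats an empty entry as the current directory. Here's a minimal sketch that avoids both that and duplicate entries; path_append is a hypothetical helper name:

```shell
# Print a colon-separated list with DIR appended, unless DIR is already in it.
# path_append is a hypothetical helper; it never emits a leading colon.
path_append() {
    list=$1
    dir=$2
    case ":$list:" in
        *":$dir:"*) echo "$list" ;;          # already present: unchanged
        *) echo "${list:+$list:}$dir" ;;     # add ":" only if list is non-empty
    esac
}
```

Usage: export LD_LIBRARY_PATH=$(path_append "$LD_LIBRARY_PATH" /opt/netlib/blas/lib)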

Check that it linked ok:
ldd /opt/gromacs/gromacs-4.5.5/bin/grompp_netlib
        linux-vdso.so.1 =>  (0x00007fffb83f2000)
        libgmxpreprocess.so.6 => /opt/gromacs/gromacs-4.5.5/lib/libgmxpreprocess.so.6 (0x00002b6411cfa000)
        libmd.so.6 => /opt/gromacs/gromacs-4.5.5/lib/libmd.so.6 (0x00002b6411fcd000)
        libfftw3f.so.3 => /usr/lib/x86_64-linux-gnu/libfftw3f.so.3 (0x00002b64123ad000)
        libxml2.so.2 => /usr/lib/x86_64-linux-gnu/libxml2.so.2 (0x00002b64127b0000)
        libgmx.so.6 => /opt/gromacs/gromacs-4.5.5/lib/libgmx.so.6 (0x00002b6412b10000)
        libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00002b6412fe5000)
        libnetblas.so => /opt/netlib/blas/lib/libnetblas.so (0x00002b64131e9000)
        liblapack.so => /opt/netlib/lapack/lib/liblapack.so (0x00002b64134cc000)
        libnsl.so.1 => /lib/x86_64-linux-gnu/libnsl.so.1 (0x00002b6413ece000)
        libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00002b64140e6000)
        libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00002b6414369000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00002b6414585000)
        libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00002b641490c000)
        liblzma.so.5 => /lib/x86_64-linux-gnu/liblzma.so.5 (0x00002b6414b24000)
        /lib64/ld-linux-x86-64.so.2 (0x00002b6411ad8000)
        libgfortran.so.3 => /usr/lib/x86_64-linux-gnu/libgfortran.so.3 (0x00002b6414d47000)
        libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00002b641505d000)
        libquadmath.so.0 => /usr/lib/x86_64-linux-gnu/libquadmath.so.0 (0x00002b6415274000)
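
What matters in that list is that blas and lapack resolve to /opt/netlib and not to a system copy. A sketch that picks out only the libraries under a given prefix; libs_under is a hypothetical helper name:

```shell
# Read ldd output on stdin; print "name path" for each library whose resolved
# path starts with the given prefix. libs_under is a hypothetical helper.
libs_under() {
    awk -v prefix="$1" '$2 == "=>" && index($3, prefix) == 1 { print $1, $3 }'
}
```

Usage: ldd /opt/gromacs/gromacs-4.5.5/bin/grompp_netlib | libs_under /opt/netlib/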

Here are some input files (it's not a 'real' md run -- I just needed something small and quick to run):
step1.top:
#include "/opt/gromacs/gromacs-4.5.5/share/gromacs/top/ffoplsaa.itp"
#include "/opt/gromacs/gromacs-4.5.5/share/gromacs/top/oplsaa.ff/tip4p.itp"

[system]
test 

[molecules]

step1.mdp:

integrator = md
define      = -DFLEXIBLE
emtol      = 1000.0
emstep     = 0.001
nsteps     = 5000
nstlist    = 1
ns_type    = grid 
rlist      = 0.9
coulombtype= PME  
rcoulomb   = 0.9  
rvdw       = 1.0  
pbc        =  xyz

genbox_netlib -o step1.gro -cs /opt/gromacs/gromacs-4.5.5/share/gromacs/top/tip4p.gro -box 4x4x4 -p step1.top

grompp_netlib -f step1.mdp -po step2.mdp -p step1.top -pp step2.top -c step1.gro -o step2.tpr

mdrun_netlib -v -s step2.tpr -o step3.trr -x step3.xtc -cpo step3.cpt -c step3.gro -e step3.edr -g step3.log

On my old AMD II X3 I got about 7.7 GFLOPS with Openblas and 7.8 GFLOPS with the above libs. Note that the run takes less than a minute, so it's of little use as a benchmark. Still, there's no obvious MAJOR penalty.
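
Because a single short run is noisy, repeating it a few times and averaging gives a slightly better picture. This is only a rough sketch with whole-second resolution; time_runs is a hypothetical helper name:

```shell
# Run a command N times and print total and mean wall time in seconds.
# time_runs is a hypothetical helper; resolution is one second (date +%s).
time_runs() {
    runs=$1; shift
    [ "$runs" -gt 0 ] || return 2
    start=$(date +%s)
    i=0
    while [ "$i" -lt "$runs" ]; do
        "$@" > /dev/null 2>&1 || return 1    # stop if the command fails
        i=$((i + 1))
    done
    end=$(date +%s)
    awk -v r="$runs" -v t="$((end - start))" \
        'BEGIN { printf "runs=%d total=%ds mean=%.2fs\n", r, t, t / r }'
}
```

Usage: time_runs 5 mdrun_netlib -s step2.tpr -o step3.trr -x step3.xtc -cpo step3.cpt -c step3.gro -e step3.edr -g step3.log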


If you don't have cmake:
cp INSTALL/make.inc.gfortran make.inc

Edit make.inc
FORTRAN  = gfortran
OPTS     = -O2 -fPIC -m64
DRVOPTS  = $(OPTS)
NOOPT    = -O0 -fPIC -m64
LOADER   = gfortran
LOADOPTS =
Edit Makefile
#lib: lapacklib tmglib
lib: blaslib variants lapacklib tmglib
Run make
make
-->  Tests passed: 13176


   -->   LAPACK TESTING SUMMARY  <--
  Processing LAPACK Testing output found in the TESTING direcory
SUMMARY             nb test run   numerical error   other error
================    ===========   ===============   ===========
REAL                1077227       0 (0.000%)        0 (0.000%)
DOUBLE PRECISION    1078039       0 (0.000%)        0 (0.000%)
COMPLEX             522814        0 (0.000%)        0 (0.000%)
COMPLEX16           552410        0 (0.000%)        0 (0.000%)

--> ALL PRECISIONS  3230490       0 (0.000%)        0 (0.000%)




Older version:
In the oldest version of this post I did the blas compilation by hand:

gfortran -O2 -fPIC -m64 -march=native -funroll-all-loops -c *.f

To build a static library:
ar rvs libblas.a *.o

To build a shared/dynamic library:
gfortran -shared -Wl,-soname,libnetblas.so -o libblas.so.1.0.1 *.o -lc 


ldd libblas.so.1.0.1
        linux-vdso.so.1 =>  (0x00007fff301af000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00002aeeac390000)
        libgfortran.so.3 => /usr/lib/x86_64-linux-gnu/libgfortran.so.3 (0x00002aeeac718000)
        libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00002aeeaca2e000)
        libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00002aeeaccb0000)
        libquadmath.so.0 => /usr/lib/x86_64-linux-gnu/libquadmath.so.0 (0x00002aeeacec7000)
        /lib64/ld-linux-x86-64.so.2 (0x00002aeeabedd000)

Either way:
cp libblas* /opt/netlib/blas/lib

To test:
wget http://www.netlib.org/blas/sblat1
mv sblat1 sblat1.f

And EITHER
gfortran sblat1.f -l:libblas.a

OR
ln -s libblas.so.1.0.1 libnetblas.so
gfortran sblat1.f -l:libnetblas.so

THEN
./a.out
 
Real BLAS Test Program Results
Test of subprogram number  1             SDOT 
                                    ----- PASS -----

 Test of subprogram number  2            SAXPY 
                                    ----- PASS -----

 Test of subprogram number  3            SROTG 
                                    ----- PASS -----

 Test of subprogram number  4             SROT 
                                    ----- PASS -----

 Test of subprogram number  5            SCOPY 
                                    ----- PASS -----

 Test of subprogram number  6            SSWAP 
                                    ----- PASS -----

 Test of subprogram number  7            SNRM2 
                                    ----- PASS -----

 Test of subprogram number  8            SASUM 
                                    ----- PASS -----

 Test of subprogram number  9            SSCAL 
                                    ----- PASS -----

 Test of subprogram number 10            ISAMAX
                                    ----- PASS -----
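
Reading ten PASS lines by eye is fine, but for scripting the check a tally is handy. This sketch assumes failures are reported with the word FAIL somewhere on the line; blas_test_summary is a hypothetical helper name:

```shell
# Read the test program's output on stdin and tally PASS and FAIL lines.
# Assumes failures contain the word FAIL; exits non-zero if any were seen.
blas_test_summary() {
    awk '/PASS/ { pass++ }
         /FAIL/ { fail++ }
         END { printf "pass=%d fail=%d\n", pass, fail; exit (fail > 0) }'
}
```

Usage: ./a.out | blas_test_summary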

11 September 2012

232. Compile parallel (threaded) povray 3.7-rc6 on Debian Wheezy

Update 13 May 2013: This build won't work with v3.7-rc7 on debian wheezy if you have libjpeg62 installed. See http://verahill.blogspot.com.au/2013/05/413-povray-37-rc7-on-debian-wheezy.html.

Remove libjpeg62 and it works fine though.

Original post
Expanding my little cluster has got me thinking about additional uses for it. The primary purpose is obviously work, i.e. MD simulations using gromacs and ab initio calcs using NWChem and Gaussian. I'm also testing it with John the Ripper to see how well the users of the linux box in the lab are choosing their passwords.

At that point I realised that it'd be sweet to have at least an OMP capable version of povray to speed things up when polishing figures for those elusive journal covers.

Debian testing currently uses v. 3.6.1 of povray but

  POV-Ray 3.6 does not support multithreaded rendering. POV-Ray 3.7 does.

So compile we will. Be aware that v 3.7 is still in beta, though.
sudo mkdir /opt/povray
sudo chown $USER /opt/povray

wget http://povray.org/redirect/www.povray.org/beta/source/povray-3.7.0.RC6.tar.gz
tar xvf povray-3.7.0.RC6.tar.gz
cd povray-3.7.0.RC6/
sudo apt-get install libboost-all-dev libpng-dev libjpeg-dev libtiff-dev build-essential libsdl-dev

Note: libboost-all-dev is big. libboost-thread-dev might be enough.

./configure --prefix=/opt/povray --program-suffix=_3.7 COMPILED_BY="me@here"
===============================================================================
POV-Ray 3.7.0.RC5 has been configured.

Built-in features:
  I/O restrictions:          enabled
  X Window display:          disabled
  Supported image formats:   gif tga iff ppm pgm hdr png jpeg tiff
  Unsupported image formats: openexr

Compilation settings:
  Build architecture:  x86_64-unknown-linux-gnu
  Built/Optimized for: x86_64-unknown-linux-gnu (using -march=native)
  Compiler vendor:     gnu
  Compiler version:    g++ 4.7
  Compiler flags:      -pipe -Wno-multichar -Wno-write-strings -fno-enforce-eh-specs -s -O3 -ffast-math -march=native -pthread

Type 'make check' to build the program and run a test render.
Type 'make install' to install POV-Ray on your system.

The POV-Ray components will be installed in the following directories:
  Program (executable):       /opt/povray/bin
  System configuration files: /opt/povray/etc/povray/3.7
  User configuration files:   $HOME/.povray/3.7
  Standard include files:     /opt/povray/share/povray-3.7/include
  Standard INI files:         /opt/povray/share/povray-3.7/ini
  Standard demo scene files:  /opt/povray/share/povray-3.7/scenes
  Documentation (text, HTML): /opt/povray/share/doc/povray-3.7
  Unix man page:              /opt/povray/share/man
===============================================================================

The way it is configured, we can keep our debian version of povray and install the newer version (povray_3.7) alongside it.

make
make install

It seems that -geometry 1000x1000 doesn't work anymore. Use -H1000 -W1000 instead.

I've played around with it a little bit and it does parallel (threaded) execution nicely.

wget http://www.ms.uky.edu/~lee/visual05/povray/fourcube7.pov
./povray_3.7 -H1000 -W1000 fourcube7.pov +A0.1
This takes 9 seconds on an AMD II X3. The standard, serial Debian version takes 21 seconds.
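
To put a number on that, here is a quick sketch that turns the two timings into a speedup and a parallel efficiency. The core count of 3 is my assumption (going by the X3 in the CPU name), and speedup is a hypothetical helper name:

```shell
# Print speedup (serial / parallel) and efficiency (speedup / cores).
# speedup is a hypothetical helper; arguments: serial_s parallel_s cores.
speedup() {
    awk -v s="$1" -v p="$2" -v c="$3" 'BEGIN {
        printf "speedup %.2f efficiency %.2f\n", s / p, s / p / c
    }'
}

speedup 21 9 3    # prints: speedup 2.33 efficiency 0.78
```

So with the timings above the threaded build is a bit over twice as fast, at roughly 78% efficiency if all three cores were in use.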