24 October 2012

265. shmmax revisited -- and shmall, shmmni

I've upgraded two of my nodes -- my old 4-core node with 8 GB RAM now has 4x4=16 GB, while my old 8-core node with 16 GB RAM now has 4x8=32 GB.

When using NWChem you will eventually run into a shmmax problem:


******************* ARMCI INFO ************************
The application attempted to allocate a shared memory segment of 44498944 bytes in size. This might be in addition to segments that were allocated succesfully previously. The current system configuration does not allow enough shared memory to be allocated to the application.

This is most often caused by:
1) system parameter SHMMAX (largest shared memory segment) being too small or
2) insufficient swap space.
Please ask your system administrator to verify if SHMMAX matches the amount of memory needed by your application and the system has sufficient amount of swap space. Most UNIX systems can be easily reconfigured to allow larger shared memory segments,
see http://www.emsl.pnl.gov/docs/global/support.html
In some cases, the problem might be caused by insufficient swap space.
*******************************************************
0:allocate: failed to create shared region : -1
(rank:0 hostname:boron pid:17222):ARMCI DASSERT fail. shmem.c:armci_allocate():1082 cond:0

I haven't gotten that in a while since I increased shmmax to 6572498432, but running a frequency calculation on a large molecule with unrestricted DFT triggered it again on my 32 GB node. So I hit google. These posts were informative:
http://www.pythian.com/news/245/the-mysterious-world-of-shmmax-and-shmall/
http://padmavyuha.blogspot.com.au/2010/12/configuring-shmmax-and-shmall-for.html
http://yuji.wordpress.com/2011/11/03/what-is-shmmax-shmall-shmmni-shared-memory-max/


me@neon:~$  cat /proc/sys/kernel/shmall
2097152
me@neon:~$ cat /proc/sys/kernel/shmmni
4096
me@neon:~$ cat /proc/sys/kernel/shmmax
6572498432

For shmall that works out to 2097152 pages * 4096 bytes/page = 8589934592 bytes, or 8589934592/(1024*1024*1024) = 8 GB. And the values are the same on all my nodes in spite of the memory available varying.
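
The arithmetic can be checked with a few lines of shell. This is only a sketch: pages_to_gib is a made-up helper name, and the page size is whatever you pass in (4096 bytes is assumed here; getconf PAGE_SIZE reports the real one).

```shell
#!/bin/sh
# pages_to_gib is a hypothetical helper, not part of any package.
# Converts a shmall value (counted in pages) to whole GB (2^30 bytes).
pages_to_gib() {
    pages=$1
    page_size=$2
    echo $(( pages * page_size / 1073741824 ))
}

# Values from this post: shmall=2097152, assumed 4096-byte pages
pages_to_gib 2097152 4096   # prints 8
```

On a live system the arguments could come from cat /proc/sys/kernel/shmall and getconf PAGE_SIZE. Newer kernels may ship a very large default shmall, which can overflow shell arithmetic, so treat the result with care there.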

Another way of looking at it:
ipcs -lm

------ Shared Memory Limits --------
max number of segments = 4096
max seg size (kbytes) = 6418455
max total shared memory (kbytes) = 8388608
min seg size (bytes) = 1


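Your shmall is the total amount of shared memory allowed, counted in pages (normally 4096 bytes each; check with getconf PAGE_SIZE), shmmni is the maximum number of shared memory segments, and shmmax is the maximum size in bytes of a single shared memory segment.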

 So if I get things right, and parroting what's said on the pages above, your shmall should approach but not exceed your total physical memory, your shmmni is better left alone, and your shmmax can be anywhere up to your total RAM.

The links above cite Oracle recommendations which state that (for 32 bit systems) shmmax should be 4 GB - 1 byte OR half your RAM, whichever is smaller. I'll show the half-your-RAM case here, but will be testing using 80% of my RAM for my calcs.

 So for my boxes:

32 GB RAM => shmmax=16GB, shmall=(32-2 GB)/4096, shmmni=4096
sudo sysctl -w kernel.shmmax=17179869184
sudo sysctl -w kernel.shmall=7340032   # 28 GB in 4096-byte pages; (32-2 GB)/4096 would be 7864320
ipcs -lm

------ Shared Memory Limits --------
max number of segments = 4096
max seg size (kbytes) = 16777216
max total shared memory (kbytes) = 29360128
min seg size (bytes) = 1

16 GB RAM => shmmax=8GB, shmall=(16-2 GB)/4096, shmmni=4096
sudo sysctl -w kernel.shmmax=8589934592
sudo sysctl -w kernel.shmall=3670016


If you're happy with those values, make them permanent by editing /etc/sysctl.conf and adding the relevant lines (here for the 32 GB node):
kernel.shmmax=17179869184
kernel.shmall=7340032
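
If you'd rather script that edit, something like the following might do. It's a sketch, not a tested admin tool: set_sysctl_line is a hypothetical helper, it assumes GNU sed (as on Debian), and the demo writes to a scratch file rather than the real /etc/sysctl.conf.

```shell
#!/bin/sh
# Hypothetical helper: set KEY=VALUE in a sysctl-style file. An existing
# line for KEY is rewritten in place; otherwise a new line is appended.
set_sysctl_line() {
    file=$1
    key=$2
    value=$3
    if grep -q "^${key}[[:space:]]*=" "$file" 2>/dev/null; then
        # sed -i is a GNU extension
        sed -i "s|^${key}[[:space:]]*=.*|${key}=${value}|" "$file"
    else
        printf '%s=%s\n' "$key" "$value" >> "$file"
    fi
}

# Demo on a scratch file so nothing system-wide is touched
conf=$(mktemp)
set_sysctl_line "$conf" kernel.shmmax 17179869184
set_sysctl_line "$conf" kernel.shmall 7340032
cat "$conf"
rm -f "$conf"
```

To do it for real, run the helper as root against /etc/sysctl.conf and then load the file with sudo sysctl -p.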


So here are the formulae (assuming that you set shmmax to half your RAM and leave 2 GB out of shmall):
shmmax=RAM (bytes)/2
shmmni=4096
shmall=(RAM (bytes)-2147483648)/page size (4096 bytes)
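
Those formulae are easy to wrap in a small shell function. A minimal sketch, assuming 4096-byte pages and RAM given in whole GB (2^30 bytes); shm_settings is a hypothetical name, not an existing tool.

```shell
#!/bin/sh
# shm_settings is a hypothetical helper. Given RAM in GB (2^30 bytes),
# print sysctl lines for shmmax = RAM/2 and shmall = (RAM - 2 GB) in pages.
# Assumes 4096-byte pages; check with: getconf PAGE_SIZE
shm_settings() {
    ram_gib=$1
    page_size=4096
    gib=1073741824
    ram_bytes=$(( ram_gib * gib ))
    echo "kernel.shmmax=$(( ram_bytes / 2 ))"
    echo "kernel.shmall=$(( (ram_bytes - 2 * gib) / page_size ))"
}

# The 16 GB node from this post
shm_settings 16
```

For 16 GB this gives kernel.shmmax=8589934592 and kernel.shmall=3670016, the same numbers as in the sysctl -w commands for that node.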

23 October 2012

264. Upgrade to ECCE 6.4 on Debian Testing

There's little reason to upgrade from 6.3 to 6.4 since, to quote:
"Other than open source availability and bundling the latest NWChem 6.1.1, the ECCE 6.4 release is otherwise equivalent to the 6.3 release."
But it's always nice with something new and shiny.

For general instructions on how to install ecce from scratch, see e.g. http://verahill.blogspot.com.au/2012/06/ecce-in-virtual-machine-step-by-step.html

Upgrading
Go to the ECCE download page to get the latest release. You will be asked to supply your name, email address and the name of your institution. However, you no longer need to register.

Download the file install_ecce.v6.4.rhel5-gcc4.1.2-m64.csh (full binary + builder). And yes, you can use wget for that.
Stop the ecce server if it's running:
 ~/.ecce/ecce-6.3e/server/ecce-admin/stop_ecce_server 

Make your ecce install file executable and run it:
chmod +x install_ecce.v6.4.rhel5-gcc4.1.2-m64.csh
./install_ecce.v6.4.rhel5-gcc4.1.2-m64.csh

which launches the installation:
Extracting ECCE distribution from ./install_ecce.v6.4.rhel5-gcc4.1.2-m64.csh...

Main ECCE installation menu
===========================
1) Help on main menu options
2) Prerequisite software check
3) Full install
4) Full upgrade
5) Application software install
6) Application software upgrade
7) Server install
8) Server upgrade

IMPORTANT: If you are uncertain about any aspect of installing
or running ECCE at your site, please refer to the detailed
ECCE Installation and Administration Guide at 
http://ecce.pnl.gov/docs/installation/2864B-Installation.pdf

Hit <return> at prompts to accept the default value in brackets.

Selection: [1] 4
Host name: [beryllium] 
New application installation directory: [/home/me/tmp/ecce-v6.4/apps] /home/me/.ecce/ecce-v6.4/apps
Existing application directory to upgrade: /home/me/.ecce/ecce-6.3e/apps

Backup existing server user data (yes/no)? [yes]

ECCE v6.4 will be installed using the settings:

  Installation type: [full upgrade]
  Host name: [beryllium]
  Application installation directory: [/home/me/.ecce/ecce-v6.4/apps]
  Application directory to upgrade: [/home/me/.ecce/ecce-6.3e/apps]
  Server installation directory: [/home/me/.ecce/ecce-v6.4/server]
  Server directory to upgrade: [/home/me/.ecce/ecce-6.3e/server]
  Backup existing server user data: [yes]

Are these choices correct (yes/no/quit)? [yes] 
Installing ECCE application software in /home/me/.ecce/ecce-v6.4/apps...
  Extracting application distribution...
  Extracting NWChem binary distribution...
  Extracting NWChem common distribution...
  Extracting client WebHelp distribution...
  Configuring application software...
  Configuring NWChem...

Installing ECCE server in /home/me/.ecce/ecce-v6.4/server...
  Extracting data server in /home/me/.ecce/ecce-v6.4/server/httpd...
  Extracting data libraries in /home/me/.ecce/ecce-v6.4/server/data...
  Extracting Java Messaging Server in /home/me/.ecce/ecce-v6.4/server/activemq...
  Configuring ECCE server...
  Copying user data from server to be upgraded...
  Copying share data from server to be upgraded...

ECCE installation succeeded.

***************************************************************
!! You MUST perform the following steps in order to use ECCE !!
-- Unless only the user 'me' will be running ECCE,
   start the ECCE server as 'me' with:
     /home/me/.ecce/ecce-v6.4/server/ecce-admin/start_ecce_server

-- To register machines to run computational codes, please see
   the installation and compute resource registration manuals
   at http://ecce.pnl.gov/using/installguide.shtml

-- Before running ECCE each user must source an environment
   setup script.  For csh/tcsh users add this to ~/.cshrc:
     if ( -e /home/me/.ecce/ecce-v6.4/apps/scripts/runtime_setup ) then
       source /home/me/.ecce/ecce-v6.4/apps/scripts/runtime_setup
     endif
   For sh/bash users, add this to ~/.profile or ~/.bashrc:
     if [ -e /home/me/.ecce/ecce-v6.4/apps/scripts/runtime_setup.sh ]; then
       . /home/me/.ecce/ecce-v6.4/apps/scripts/runtime_setup.sh
     fi
***************************************************************

And then start the server:
/home/me/.ecce/ecce-v6.4/server/ecce-admin/start_ecce_server
/home/me/.ecce/ecce-v6.4/server/httpd/bin/apachectl start: httpd started
[1] 25382
INFO  BrokerService         
- ActiveMQ 5.1.0 JMS Message Broker (localhost) is starting
INFO  BrokerService      
- ActiveMQ JMS Message Broker (localhost, ID:beryllium-46481-1350964505499-0:0) started

Put the following in your ~/.bashrc:

export ECCE_HOME=/home/me/.ecce/ecce-v6.4/apps
export PATH=${ECCE_HOME}/scripts:${ECCE_HOME}/scripts/parsers:${PATH}

And run:
source ~/.bashrc
ecce

22 October 2012

263. Cyanogen mod on Nexus One

Note that you need an unlocked and rooted Nexus One for this. I did this in the past and can barely remember how I did it. So don't ask me. Also, I'm using Linux for this, so asking me about OS X or Windows would be doubly unwelcome.

Besides, I'm just following orders: http://wiki.cyanogenmod.com/wiki/Nexus_One:_Full_Update_Guide

0. Back up everything. Root and unlock your Nexus One.
I used Titanium Backup and SMS Backup and Restore to back things up, just in case. Then I connected my phone to my computer and copied everything. I unlocked the boot loader and rooted my phone quite a while ago and so can't remember how it's best done.

Looking at the traces in my system I used bexboot.v2.GRK39F_OTA and I don't remember that it was difficult. Just be aware that everything on your phone WILL BE WIPED. So back stuff up.

cd ~/tmp
wget http://bexboot.googlecode.com/files/bexboot.v2.GRK39F_OTA.zip
unzip bexboot.v2.GRK39F_OTA.zip
cd bexboot.v2.GRK39F_OTA/
chmod +x fastboot-linux

1. Download stuff

1a. Download the cyanogen mod image:
wget http://download.cyanogenmod.com/get/jenkins/2857/cm-7.2.0-passion.zip
md5sum cm-7.2.0-passion.zip 
0d37cc25fd42b0ad00f87c9e009b7a9c cm-7.2.0-passion.zip
1b. Get the Amon Ra recovery image:
wget http://cmw.22aaf3.com/passion/recovery/recovery-RA-passion-v2.2.1-CM.img 
md5sum recovery-RA-passion-v2.2.1-CM.img 
e8262ae23943ce50fd346001812fae79 recovery-RA-passion-v2.2.1-CM.img
1c. Then get the google apps:
wget http://cmw.22aaf3.com/gapps/gapps-gb-20110828-signed.zip
md5sum gapps-gb-20110828-signed.zip
1647897d8ac3efb04723d2ad2c361a3f gapps-gb-20110828-signed.zip
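
Checking the sums by eye gets tedious, and a bad copy is easy to miss. Here's a rough sketch of a scripted check; verify_md5 is a hypothetical helper, the sums are the ones listed in this post, and it assumes the files sit in the current directory.

```shell
#!/bin/sh
# verify_md5 is a hypothetical helper: compare a file's MD5 sum with the
# expected value and report OK / BAD / MISSING.
verify_md5() {
    file=$1
    expected=$2
    if [ ! -f "$file" ]; then
        echo "MISSING $file"
        return 2
    fi
    actual=$(md5sum "$file" | cut -d' ' -f1)
    if [ "$actual" = "$expected" ]; then
        echo "OK $file"
    else
        echo "BAD $file"
        return 1
    fi
}

# Sums as listed in this post
status=0
verify_md5 cm-7.2.0-passion.zip 0d37cc25fd42b0ad00f87c9e009b7a9c || status=1
verify_md5 recovery-RA-passion-v2.2.1-CM.img e8262ae23943ce50fd346001812fae79 || status=1
verify_md5 gapps-gb-20110828-signed.zip 1647897d8ac3efb04723d2ad2c361a3f || status=1
if [ "$status" -ne 0 ]; then
    echo "At least one file is missing or corrupt; re-download before flashing."
fi
```

It's worth running the same check again on the copies that end up on the SD card, since that's where a transfer can go wrong.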

This is a good time to move the gapps-gb-20110828-signed.zip and cm-7.2.0-passion.zip files to the root of your SD card.

2. Edit your /etc/udev/rules.d/51-android.rules
 I changed mine from

SUBSYSTEM=="usb", ATTR{idVendor}=="18d1", MODE="0666", GROUP="plugdev"
to

SUBSYSTEM=="usb", ATTR{idVendor}=="0bb4", MODE="0666", GROUP="plugdev"

since lsusb lists the phone in fastboot mode under HTC's vendor ID, 0bb4:

Bus 001 Device 020: ID 0bb4:0fff HTC (High Tech Computer Corp.) Android Fastboot Bootloader

and did

sudo chmod a+rx /etc/udev/rules.d/51-android.rules
sudo service udev restart
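
The vendor ID for the rule can be pulled out of the lsusb line rather than copied by hand. A small sketch, assuming the usual lsusb layout where the sixth field is vendor:product; both function names are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helpers: extract the vendor ID from one line of lsusb
# output and print a matching udev rule.
vendor_from_lsusb() {
    # Field 6 looks like 0bb4:0fff; keep the part before the colon
    echo "$1" | awk '{ split($6, id, ":"); print id[1] }'
}

udev_rule_for() {
    printf 'SUBSYSTEM=="usb", ATTR{idVendor}=="%s", MODE="0666", GROUP="plugdev"\n' "$1"
}

# The lsusb line from this post
line='Bus 001 Device 020: ID 0bb4:0fff HTC (High Tech Computer Corp.) Android Fastboot Bootloader'
udev_rule_for "$(vendor_from_lsusb "$line")"
```

The printed rule is the one that goes into /etc/udev/rules.d/51-android.rules.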

3. Flash
Turn off your phone and plug in the usb cable. Hold down the trackball and, while holding it down, turn on your phone. Don't select anything. Instead, on your computer:
./fastboot-linux devices
HT015P801117 fastboot
So far so good!

./fastboot-linux flash recovery recovery-RA-passion-v2.2.1-CM.img 
sending 'recovery' (3380 KB)... OKAY
writing 'recovery'... OKAY
On your phone, select "bootloader" then "recovery", which starts the Android System Recovery  -- look at the bottom of the screen though, where it'll say "Build: RA-passion-v2.2.1"

Scroll (painfully -- it's really unresponsive so don't freak out) using the trackball on your phone to "Wipe", then push down the trackball to select it. Then select "Wipe ALL data/factory reset".

You'll then get asked:
Wipe ALL userdata
Press Trackball to confirm.
any other key to abort.
Confirm. You'll get the following messages:
Formatting DATA:...
Formatting SDCARD:.android_secure...
Formatting CACHE:...
Skipping format of /sd-ext.
Userdata wipe complete!
Now press Vol-Down to return to the previous menu, where you select "Flash zip from sdcard". You're now asked whether to choose zip from sdcard or to toggle signature verification. Choose "zip", and select "cm-7.2.0-passion.zip", which will launch the installation.

Once that's done, select "choose zip from sdcard" again and this time pick your gapps-gb-20110828-signed.zip. Once that's installed, hit Vol-Down to go up one level in the menu and select Reboot. You're now done.

In case of trouble:
My system complained here that the gapps file was 'bad', so I went up one menu level and rebooted. Without google apps life is less fun, but I didn't have an SD card reader at hand. Once my (flashy new) system was up I mounted the SD card via USB, and checked the md5sum, which was bad. I put a new copy of the file on the sd card, checked the md5sum (now good), and powered off the phone.
I then powered it on while holding down the trackball, ran "./fastboot-linux flash recovery recovery-RA-passion-v2.2.1-CM.img" on my computer, selected bootloader/recovery, then "flash zip from sdcard" and "choose zip from sdcard", picked "gapps-gb-20110828-signed.zip", and THIS TIME it went fine! Then just hit Vol-Down, select "Reboot system now" and you're done!

All in all, it took a while to prepare everything, but it wasn't as difficult or scary as one would be led to believe.

The verdict:
I actually don't use my phone much these days, so I can't really tell how 'different' the cyanogen mod really is from my previous android install. But it looks a little bit different, and I seem to have a lot more control over the details, which is nice.