I've upgraded two of my nodes: my old 4-core node with 8 GB RAM now has 4x4=16 GB RAM, while my old 8-core node with 16 GB RAM now has 4x8=32 GB RAM.
When using nwchem you will eventually run into a shmmax problem:
******************* ARMCI INFO ************************
The application attempted to allocate a shared memory segment of 44498944 bytes in size. This might be in addition to segments that were allocated succesfully previously. The current system configuration does not allow enough shared memory to be allocated to the application.
This is most often caused by:
1) system parameter SHMMAX (largest shared memory segment) being too small or
2) insufficient swap space.
Please ask your system administrator to verify if SHMMAX matches the amount of memory needed by your application and the system has sufficient amount of swap space. Most UNIX systems can be easily reconfigured to allow larger shared memory segments,
see http://www.emsl.pnl.gov/docs/global/support.html
In some cases, the problem might be caused by insufficient swap space.
*******************************************************
0:allocate: failed to create shared region : -1
(rank:0 hostname:boron pid:17222):ARMCI DASSERT fail. shmem.c:armci_allocate():1082 cond:0
I haven't gotten that in a while, since I increased shmmax to 6572498432, but running a frequency calculation on a large molecule with unrestricted DFT triggered it again on my 32 GB node. So I hit Google. These posts were informative:
http://www.pythian.com/news/245/the-mysterious-world-of-shmmax-and-shmall/
http://padmavyuha.blogspot.com.au/2010/12/configuring-shmmax-and-shmall-for.html
http://yuji.wordpress.com/2011/11/03/what-is-shmmax-shmall-shmmni-shared-memory-max/
me@neon:~$ cat /proc/sys/kernel/shmall
2097152
me@neon:~$ cat /proc/sys/kernel/shmmni
4096
me@neon:~$ cat /proc/sys/kernel/shmmax
6572498432
That works out to 2097152 pages * 4096 bytes/page = 8589934592 bytes of total shared memory, i.e. 8 GiB (about 8.59 GB). And the settings are the same on all my nodes in spite of the memory available varying.
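The page arithmetic is easy to check in the shell. This is only a minimal sketch: the function names are made up for illustration, and the 4096-byte page size is an assumption that you can verify on your own machine with `getconf PAGE_SIZE`.

```shell
#!/bin/sh
# Convert a shmall value (counted in pages) into bytes and whole GiB.
# Helper names are hypothetical; 4096 is the assumed page size in bytes.

shmall_bytes() {
    # $1 = shmall in pages, $2 = page size in bytes
    echo $(( $1 * $2 ))
}

shmall_gib() {
    # Same inputs, result in whole GiB (integer division)
    echo $(( $1 * $2 / 1024 / 1024 / 1024 ))
}

shmall_bytes 2097152 4096   # prints 8589934592
shmall_gib 2097152 4096     # prints 8
```

To use the live values instead, pass `$(cat /proc/sys/kernel/shmall)` and `$(getconf PAGE_SIZE)` as the two arguments.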
Another way of looking at it:
ipcs -lm
------ Shared Memory Limits --------
max number of segments = 4096
max seg size (kbytes) = 6418455
max total shared memory (kbytes) = 8388608
min seg size (bytes) = 1
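Your shmall is the total amount of shared memory allowed, counted in pages; shmmni is the maximum number of shared memory segments (not the page size, which just happens to be 4096 bytes as well; check with getconf PAGE_SIZE); and shmmax is the largest single shared memory segment, in bytes.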
So if I get things right, and parroting what's said on the pages above, your shmall should approach but not exceed your total physical memory, your shmmni is better left alone, and your shmmax can be anywhere up to your total RAM.
The links above cite Oracle recommendations which state that (for 32 bit systems) shmmax should be 4 GB - 1 byte OR half your RAM, whichever is smaller. I'll show the half-your-RAM case here, but will be testing using 80% of my RAM for my calcs.
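As a rough sketch of those two choices, the shell functions below compute a candidate shmmax from the total RAM in bytes. The function names are hypothetical, the first function only encodes the rule as it is quoted above (the smaller of 4 GB - 1 byte and half the RAM), and the second is the 80% figure I mention; neither is an official tool.

```shell
#!/bin/sh
# Candidate shmmax values from total RAM in bytes (hypothetical helpers).

oracle_shmmax() {
    # Rule as quoted above: min(4 GB - 1 byte, RAM / 2)
    half=$(( $1 / 2 ))
    cap=$(( 4 * 1024 * 1024 * 1024 - 1 ))
    if [ $(( half < cap )) -eq 1 ]; then
        echo "$half"
    else
        echo "$cap"
    fi
}

pct80_shmmax() {
    # 80% of RAM, using integer arithmetic
    echo $(( $1 * 8 / 10 ))
}

oracle_shmmax 34359738368   # 32 GiB box: prints 4294967295
pct80_shmmax 34359738368    # 32 GiB box: prints 27487790694
```

On a 64 bit system the 4 GB cap does not have to apply, which is why the values further down simply use half the RAM.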
So for my boxes:
32 GB RAM => shmmax=16 GB, shmall=(32-2 GB)/4096, shmmni=4096
sudo sysctl -w kernel.shmmax=17179869184
sudo sysctl -w kernel.shmall=7340032
ipcs -lm
------ Shared Memory Limits --------
max number of segments = 4096
max seg size (kbytes) = 16777216
max total shared memory (kbytes) = 29360128
min seg size (bytes) = 1
(Note that 7340032 pages of 4096 bytes is 28 GiB, as the ipcs output confirms, so this value leaves 4 GB rather than 2 GB out of shmall; leaving out exactly 2 GB would give 7864320 pages.)

16 GB RAM => shmmax=8 GB, shmall=(16-2 GB)/4096, shmmni=4096
sudo sysctl -w kernel.shmmax=8589934592
sudo sysctl -w kernel.shmall=3670016
If you're happy with those values, make them permanent by adding the relevant lines to /etc/sysctl.conf:
kernel.shmmax=17179869184
kernel.shmall=7340032
So here are the formulae (assuming that you set shmmax to half your RAM and leave 2 GB out of shmall):
shmmax=RAM (bytes)/2
shmmni=4096
shmall=(RAM (bytes)-2147483648)/4096 (the divisor is the page size in bytes, not shmmni)
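The formulae can be wrapped in a small shell function that prints lines ready for sysctl.conf. This is a sketch under the same assumptions as above: shmmax is half the RAM, 2 GB is left out of shmall, and the page size defaults to 4096 bytes. The function name suggest_shm is made up for illustration.

```shell
#!/bin/sh
# Print suggested kernel.shmmax / kernel.shmall lines for a given RAM size.
# suggest_shm is a hypothetical helper, not part of any package.

suggest_shm() {
    ram=$1                  # total RAM in bytes
    page=${2:-4096}         # page size in bytes (assumed 4096 if omitted)
    reserve=2147483648      # 2 GB kept out of shmall
    echo "kernel.shmmax=$(( ram / 2 ))"
    echo "kernel.shmall=$(( (ram - reserve) / page ))"
}

# 16 GB node: reproduces the values used for that box
suggest_shm 17179869184
# kernel.shmmax=8589934592
# kernel.shmall=3670016
```

The RAM figure is passed in by hand here; reading MemTotal from /proc/meminfo would also work, but it reports slightly less than the installed RAM, so the results would differ a little from the round numbers above.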