17 January 2012

49. Gromacs -- hangs on multicore when doing normal mode analysis

Symptom:
when doing
mdrun -s nm.tpr -mtx nm.mtx -v -deffnm nm
on a system with 637 atoms you end up with:
...Finish step 636 out of 637
and it hangs there with all cores running at 100%

Reason:
For some reason the normal mode analysis of at least this particular system won't run on multiple cores.

Solution:
Use an MPI-compiled version of mdrun (see previous posts on compiling _dd, _mpi and _ddmpi versions of gromacs) and force the use of ONE core.

mpd --ncpus=4 &
mpdrun -n 1 mdrun_mpi -s nm.tpr -mtx nm.mtx -v -deffnm nm

works!
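
A minimal sketch of the same workaround wrapped in a shell function, assuming mpd is already running as started above. The function name nm_single_core_cmd is hypothetical (my own), and it only prints the command so it can be checked before anything is run.

```shell
# Hypothetical helper: print the single-core normal mode command for a run name.
# Nothing is executed here; the command line is only written to stdout.
nm_single_core_cmd() {
    name="$1"
    printf 'mpdrun -n 1 mdrun_mpi -s %s.tpr -mtx %s.mtx -v -deffnm %s\n' \
        "$name" "$name" "$name"
}

nm_single_core_cmd nm
```

Once the printed line looks right, it can be pasted into the terminal as is.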

Confirmation
This was confirmed by running it on four computers:
64 bit: a six core AMD 64 using a compiled version of gromacs. Hangs.
64 bit: a four core intel i5 using both the debian version and a compiled version of gromacs. Hangs.
64 bit: an older four core intel using a compiled version of gromacs. Hangs.
32 bit: an old single-core laptop using the debian version of gromacs. Works.

Next, three single-core virtual machines were set up -- a stable 32 bit, a testing 32 bit and a testing 64 bit machine, all with the debian version of gromacs (sudo apt-get install gromacs). They all worked, as they only had a single core.
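
Since the hang seems to depend on the number of cores, it helps to check how many cores a machine reports before testing. A small sketch, assuming a Linux-style /proc/cpuinfo; the function name count_cores is hypothetical.

```shell
# Hypothetical helper: count the logical cores listed in a cpuinfo-style file.
# Defaults to /proc/cpuinfo; another path can be passed as the first argument.
count_cores() {
    grep -c '^processor' "${1:-/proc/cpuinfo}"
}

# Only query the real file if it exists on this machine.
if [ -r /proc/cpuinfo ]; then
    count_cores
fi
```

On the single-core laptop and the virtual machines this should print 1.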





12 January 2012

48. nvidia gt520 issues and solutions on debian testing

EDIT: see here for a Linux Mint Debian Edition take on the GT 520 nouveau issue -- http://community.linuxmint.com/tutorial/view/824

EDIT: Someone made a succinct how-to for nvidia driver installation on debian: http://blog.libremath.org/2012/04/07/debian-nvidia-quick-guide/ NOTE: site seems to be down.

--start here --
I recently bought an nvidia gt520 1 GB graphics card. To my surprise it turned out to be a bit of a pain to actually get it working properly.

Sadly, we don't always document all the steps when trying to get something to work, but here's roughly what I remember.

The problem:
I plugged the nvidia gt520 into the pci express slot, connected the vga cable to the vga socket on the new graphics card and started my computer. My setup autostarts gdm3. Everything went fine -- the boot messages were flashing by as per normal, then gdm3 started. And got stuck. I experienced two different types of hang -- either just a black screen, or a black screen with a single cursor indicator (a single _ in the top left corner).

Logging in remotely (had ssh server running) and looking at top I could see that gdm3 was using up 50+% cpu power. Leaving the system for half an hour didn't allow for any progress.

Also, even when I did ctrl+alt+f1 to bring up tty1 I would be forcibly returned to tty7 over and over again. Trying to fix anything was thus difficult. After doing ctrl+alt+f1 a few times and being thrown around it would stop responding and strange symbols would appear on the screen when trying to use the keyboard.

One last piece of information: my onboard graphics is nvidia as well, but this probably isn't relevant.

Logging in remotely I tried using the excellent smxi / sgfxi scripts (http://smxi.org/) to install the proper graphics drivers. I tried nouveau, debian-nvidia and nvidia-current. I also tried just deleting /etc/X11/xorg.conf and hoping for the best.

Diagnosis:
First I made sure gdm3 wasn't starting anymore so that the computer wouldn't hang and I'd be able to work in peace:
sudo vim /etc/default/grub

GRUB_CMDLINE_LINUX_DEFAULT="quiet splash"
was changed to
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash text"
(there may be other things on the same line -- just add text)

Then to make the changes take effect,
sudo update-grub
and reboot
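
The same edit can be sketched with sed instead of a text editor. This assumes the variable is named GRUB_CMDLINE_LINUX_DEFAULT, as it is in a stock Debian /etc/default/grub, and that its value is in double quotes. The function name add_text_boot_option is hypothetical, and it prints the modified file to stdout rather than changing anything in place.

```shell
# Hypothetical helper: print a copy of a grub defaults file with "text"
# appended to the GRUB_CMDLINE_LINUX_DEFAULT value. The input file is not modified.
add_text_boot_option() {
    sed 's/^\(GRUB_CMDLINE_LINUX_DEFAULT="[^"]*\)"/\1 text"/' "$1"
}

# Example (not run here):
#   add_text_boot_option /etc/default/grub > /tmp/grub.new
#   diff /etc/default/grub /tmp/grub.new
```

After checking the diff, copy the new file into place as root and run sudo update-grub as described above.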

Second I tried unloading any modules

sudo rmmod nouveau
sudo rmmod nvidia

I edited /etc/modules and commented out nvidia, and made sure nouveau was there. I also edited /etc/modprobe.d/nvidia-kernel-common.conf and commented out blacklist nouveau.

I then tried installing the nouveau driver one last time
sudo sgfxi -N nouveau
and rebooted
After the reboot had completed, dmesg | grep nouv gave me the clue I needed -- the drivers had failed to load! I don't remember the exact message, but it was all about failure.
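
Besides dmesg, a quick way to see which of the two drivers actually got loaded is to look in /proc/modules. A small sketch, assuming the usual /proc/modules layout where each line starts with the module name followed by a space; the function name module_loaded is hypothetical.

```shell
# Hypothetical helper: succeed if the named module appears in a
# /proc/modules-style file (second argument, defaults to /proc/modules).
module_loaded() {
    grep -q "^$1 " "${2:-/proc/modules}"
}

# Report the state of both graphics modules on this machine.
for m in nouveau nvidia; do
    if module_loaded "$m" 2>/dev/null; then
        echo "$m: loaded"
    else
        echo "$m: not loaded"
    fi
done
```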


Solution:
(also see first post below)

I removed the xorg.conf
sudo rm /etc/X11/xorg.conf
then
startx
The desktop started! But I found myself in fallback mode -- the graphics acceleration obviously wasn't working -- but that wasn't a surprise since the drivers had failed to load.

I then ran
sudo rmmod nouveau
sudo apt-get install nvidia-kernel-dkms nvidia-settings nvidia-smi nvidia-xconfig
sudo nvidia-xconfig
startx

It worked!

My autogenerated /etc/modprobe.d/nvidia-kernel-common.conf now looks like this again:
alias char-major-195* nvidia

options nvidia NVreg_DeviceFileUID=0 NVreg_DeviceFileGID=44 NVreg_DeviceFileMode=0660
# To enable FastWrites and Sidebus addressing, uncomment these lines
# options nvidia NVreg_EnableAGPSBA=1
# options nvidia NVreg_EnableAGPFW=1

# see #580894
blacklist nouveau

Remember to remove any mention of nouveau in /etc/modules.
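
A sketch of that clean-up with sed, assuming /etc/modules lists one module name per line. The function name comment_out_module is hypothetical, and it prints the result to stdout instead of editing the file in place.

```shell
# Hypothetical helper: print a copy of a modules file with the given module
# commented out. Matches the bare name, optionally followed by whitespace
# and module options. The input file is not modified.
comment_out_module() {
    sed "s/^$1\([[:space:]].*\)*\$/# &/" "$2"
}

# Example (not run here):
#   comment_out_module nouveau /etc/modules
```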

You can change your /etc/default/grub back to the way it was again to make gdm start again every time.

Edit: Reading between the lines it seems that Squeeze may not have the proper drivers available for GT520 -- binary installation using smxi might be a good idea in that case: http://forums.debian.net/viewtopic.php?f=17&t=72876

Lengthy output follows:

Here's dmesg | grep nvidia

###############################
[    7.192358] nvidia: module license 'NVIDIA' taints kernel.
[    7.278115] nvidia 0000:02:00.0: PCI INT A -> Link[LNED] -> GSI 18 (level, low) -> IRQ 18
[    7.278122] nvidia 0000:02:00.0: setting latency timer to 64
###############################


Here's lspci -vvnn



###############################


02:00.0 VGA compatible controller [0300]: nVidia Corporation GF119 [GeForce GT 520] [10de:1040] (rev a1) (prog-if 00 [VGA controller])
Subsystem: Giga-byte Technology Device [1458:3520]
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0
Interrupt: pin A routed to IRQ 18
Region 0: Memory at df000000 (32-bit, non-prefetchable) [size=16M]
Region 1: Memory at d0000000 (64-bit, prefetchable) [size=128M]
Region 3: Memory at dc000000 (64-bit, prefetchable) [size=32M]
Region 5: I/O ports at ec00 [size=128]
[virtual] Expansion ROM at def80000 [disabled] [size=512K]
Capabilities: <access denied>
Kernel driver in use: nvidia

###############################


Here's lshw -C display (run as user)
###############################
WARNING: you should run this program as super-user.

  *-display            
       description: VGA compatible controller
       product: GF119 [GeForce GT 520]
       vendor: nVidia Corporation
       physical id: 0
       bus info: pci@0000:02:00.0
       version: a1
       width: 64 bits
       clock: 33MHz
       capabilities: vga_controller bus_master cap_list rom
       configuration: driver=nvidia latency=0
       resources: irq:18 memory:df000000-dfffffff memory:d0000000-d7ffffff memory:dc000000-ddffffff ioport:ec00(size=128) memory:def80000-deffffff
WARNING: output may be incomplete or inaccurate, you should run this program as super-user.
###############################


And here's the xorg.conf:


###############################


# nvidia-xconfig: X configuration file generated by nvidia-xconfig
# nvidia-xconfig:  version 290.10  (pbuilder@cake)  Wed Nov 23 11:33:47 UTC 2011

Section "ServerLayout"
    Identifier     "Layout0"
    Screen      0  "Screen0"
    InputDevice    "Keyboard0" "CoreKeyboard"
    InputDevice    "Mouse0" "CorePointer"
EndSection

Section "Files"
EndSection

Section "InputDevice"
    # generated from default
    Identifier     "Mouse0"
    Driver         "mouse"
    Option         "Protocol" "auto"
    Option         "Device" "/dev/psaux"
    Option         "Emulate3Buttons" "no"
    Option         "ZAxisMapping" "4 5"
EndSection

Section "InputDevice"
    # generated from default
    Identifier     "Keyboard0"
    Driver         "kbd"
EndSection

Section "Monitor"
    Identifier     "Monitor0"
    VendorName     "Unknown"
    ModelName      "Unknown"
    HorizSync       28.0 - 33.0
    VertRefresh     43.0 - 72.0
    Option         "DPMS"
EndSection

Section "Device"
    Identifier     "Device0"
    Driver         "nvidia"
    VendorName     "NVIDIA Corporation"
EndSection

Section "Screen"
    Identifier     "Screen0"
    Device         "Device0"
    Monitor        "Monitor0"
    DefaultDepth    24
    SubSection     "Display"
        Depth       24
    EndSubS


###############################




11 January 2012

47. A step on the way to compiling the omnibook ACPI drivers on debian testing

First, see here:
http://sourceforge.net/projects/omnibook/
http://home.comcast.net/~rickrich1/toshiba-1115-s103/omnibook.txt

I have an old toshiba satellite a205 which has a fan that turns on at 50 degrees and turns off at 45 degrees. That range is much too narrow, so the fan starts up every two minutes or so -- having it turn off at 40 degrees would probably make more sense. To that end I wanted to see if I could get ACPI fan support.

In the end I don't seem to have succeeded, but here's what I did manage to do, and what happened:

Go:
Install build-essential and the kernel headers for your kernel

git clone git://omnibook.git.sourceforge.net/gitroot/omnibook/omnibook
cd omnibook/

vim polling.c
comment out line 128:
//        cancel_rearming_delayed_workqueue(omnibook_wq, &omnibook_poll_work);

and line 191, using // :
//            cancel_rearming_delayed_workqueue(omnibook_wq, &omnibook_poll_work);

Run:
sudo make load

Read through doc/INSTALL before going any further.


WARNING:
Some say that loading with the wrong ectype can be bad for your computer. My guess is that things will be fine as long as you don't put the computer under heavy load while trying out the method below, so that you don't risk burning anything.

OK, time to try the shotgun approach:
var=1 && sudo rmmod omnibook && sudo modprobe omnibook ectype=$var && ls /proc/omnibook

Do this with values of var from 1-16. See which one gives the 'best' support. For me most things showed up for all ectypes between 1 and 10, but only ectype=1 showed fan_policy.
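
Rather than editing var by hand sixteen times, the commands can be generated in a loop. This is a sketch only: the function name ectype_commands is hypothetical, and it prints the commands instead of running them, given the warning above. Note one deliberate difference from the one-liner: rmmod is followed by ; instead of &&, so that modprobe still runs if the module was not loaded.

```shell
# Hypothetical helper: print (do not run) one reload-and-list command
# for each ectype from 1 to 16.
ectype_commands() {
    n=1
    while [ "$n" -le 16 ]; do
        echo "sudo rmmod omnibook; sudo modprobe omnibook ectype=$n && ls /proc/omnibook"
        n=$((n + 1))
    done
}

ectype_commands
```

Copy the printed lines into the terminal one at a time and compare what shows up under /proc/omnibook for each.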

Next do cat /proc/omnibook/fan , cat /proc/omnibook/display, cat /proc/omnibook/battery etc. to see whether the settings seem to correspond to reality.

ectypes
I can't find the original document which details the different ectypes and the corresponding laptop models. Your best bet is to do what I did above (just trying values at random) or to google for omnibook and ectype and see which model is closest to yours.

Making omnibook load on boot:
vim /etc/modules
add a line saying
omnibook

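and then create a file called omnibook.conf under /etc/modprobe.d
In /etc/modprobe.d/omnibook.conf you put a single line (the options keyword must be followed by the module name, omnibook; substitute the ectype that worked best for you):
options omnibook ectype=12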
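
Lines in /etc/modprobe.d take the form options, then the module name, then parameter=value, so for the omnibook module the line can be generated as below. A small sketch; the function name omnibook_conf_line is hypothetical, and the value 12 is just the one used in the text above.

```shell
# Hypothetical helper: print the modprobe.d options line for a given ectype.
# Rejects anything that is not a plain non-negative integer.
omnibook_conf_line() {
    case "$1" in
        ''|*[!0-9]*)
            echo "ectype must be a number" >&2
            return 1
            ;;
    esac
    echo "options omnibook ectype=$1"
}

omnibook_conf_line 12
```

The output can be redirected into /etc/modprobe.d/omnibook.conf as root.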

cat /proc/omnibook/fan_policy gives

Fan off temperature:         0 C
Fan on temperature:          0 C
Fan level 2 temperature:     0 C
Fan level 3 temperature:     0 C
Fan level 4 temperature:     0 C
Fan level 5 temperature:     0 C
Fan level 6 temperature:    10 C
Fan level 7 temperature:    108 C
Minimal temperature to set: 25 C
Maximal temperature to set: 95 C

Those are the same values as in fan_policy.c (in the source code we downloaded). It seems that the way to change the values is that you should recompile, which is easy enough but also a bit scary. Haven't played with it yet.
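
To compare these values against fan_policy.c more easily, the output can be boiled down to label=value pairs. A sketch using awk, assuming the output format shown above (label, colon, number, C); the function name fan_policy_values is hypothetical.

```shell
# Hypothetical helper: read fan_policy-style text on stdin and print
# "label=value" for every line that mentions a temperature.
fan_policy_values() {
    awk -F: '/temperature/ {
        gsub(/^ +| *C *$/, "", $2)   # strip leading spaces and the trailing " C"
        print $1 "=" $2
    }'
}

# Only read the real file if the omnibook module has created it.
if [ -r /proc/omnibook/fan_policy ]; then
    fan_policy_values < /proc/omnibook/fan_policy
fi
```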


Here's tree /proc/omnibook :

/proc/omnibook
├── ac
├── battery
├── blank
├── display
├── dmi
├── fan
├── fan_policy
├── hotkeys
├── lcd
├── temperature
├── touchpad
└── version

0 directories, 12 files



Here's the dmesg | grep omni output:

[    8.792966] omnibook: Driver version 2.20090707-trunk.
[    8.792969] omnibook: Forced load with EC type 1.
[    8.793055] omnibook: Feature range f86be5c0 - f86beac0
[    8.793058] omnibook: Testing feature ac at address f86be5c0
[    8.793060] omnibook: Begin table match of ac feature.
[    8.793063] omnibook: Attempting backend ec init.
[    8.793066] omnibook: Returning table entry nr 0.
[    8.793068] omnibook: Match succeeded: continuing with ac.
[    8.793072] omnibook: Testing feature battery at address f86be600
[    8.793075] omnibook: Begin table match of battery feature.
[    8.793077] omnibook: Attempting backend ec init.
[    8.793079] omnibook: Returning table entry nr 0.
[    8.793082] omnibook: Match succeeded: continuing with battery.
[    8.793086] omnibook: Testing feature blank at address f86be640
[    8.793088] omnibook: Begin table match of blank feature.
[    8.793090] omnibook: Attempting backend i8042 init.
[    8.793093] omnibook: Returning table entry nr 1.
[    8.793095] omnibook: Match succeeded: continuing with blank.
[    8.793098] omnibook: LCD backlight turn off at console blanking is enabled.
[    8.793102] omnibook: Testing feature bluetooth at address f86be680
[    8.793105] omnibook: Testing feature cooling at address f86be6c0
[    8.793107] omnibook: Testing feature display at address f86be700
[    8.793110] omnibook: Begin table match of display feature.
[    8.793112] omnibook: Attempting backend ec init.
[    8.793114] omnibook: Returning table entry nr 2.
[    8.793116] omnibook: Match succeeded: continuing with display.
[    8.795163] omnibook: Testing feature dock at address f86be740
[    8.795166] omnibook: Testing feature dump at address f86be780
[    8.795168] omnibook: Testing feature fan at address f86be7c0
[    8.795171] omnibook: Begin table match of fan feature.
[    8.795173] omnibook: Attempting backend ec init.
[    8.795176] omnibook: Returning table entry nr 0.
[    8.795178] omnibook: Match succeeded: continuing with fan.
[    8.795182] omnibook: Testing feature fan_policy at address f86be800
[    8.795184] omnibook: Begin table match of fan_policy feature.
[    8.795187] omnibook: Attempting backend ec init.
[    8.795189] omnibook: Returning table entry nr 0.
[    8.795191] omnibook: Match succeeded: continuing with fan_policy.
[    8.795195] omnibook: Testing feature hotkeys at address f86be840
[    8.795197] omnibook: Begin table match of hotkeys feature.
[    8.795200] omnibook: Attempting backend i8042 init.
[    8.795202] omnibook: Returning table entry nr 0.
[    8.795204] omnibook: Match succeeded: continuing with hotkeys.
[    8.795207] omnibook: Enabling all hotkeys.
[    8.799296] omnibook: Testing feature dmi at address f86be880
[    8.799300] omnibook: dmi feature has no backend table, io_op not initialized.
[    8.799304] omnibook: Testing feature version at address f86be8c0
[    8.799307] omnibook: version feature has no backend table, io_op not initialized.
[    8.799311] omnibook: Testing feature lcd at address f86be900
[    8.799314] omnibook: Begin table match of lcd feature.
[    8.799317] omnibook: Attempting backend ec init.
[    8.799319] omnibook: Returning table entry nr 2.
[    8.799322] omnibook: Match succeeded: continuing with lcd.
[    8.799326] omnibook: Testing feature muteled at address f86be940
[    8.799329] omnibook: Testing feature key_polling at address f86be980
[    8.799332] omnibook: Testing feature temperature at address f86be9c0
[    8.799334] omnibook: Begin table match of temperature feature.
[    8.799337] omnibook: Attempting backend ec init.
[    8.799339] omnibook: Returning table entry nr 0.
[    8.799341] omnibook: Match succeeded: continuing with temperature.
[    8.799347] omnibook: Testing feature touchpad at address f86bea00
[    8.799350] omnibook: Begin table match of touchpad feature.
[    8.799352] omnibook: Attempting backend i8042 init.
[    8.799355] omnibook: Returning table entry nr 0.
[    8.799357] omnibook: Match succeeded: continuing with touchpad.
[    8.799361] omnibook: Testing feature wifi at address f86bea40
[    8.799363] omnibook: Testing feature throttling at address f86bea80
[    8.799366] omnibook: Enabled features: ac battery blank display fan fan_policy hotkeys dmi version lcd temperature touchpad.