12 November 2015

625. In Progress: Dead node -- no power, unless 8-pin EDS/ATX12V1 unplugged

One of my nodes died last night. It's got a Corsair GS700, AsRock 990FX Extreme 3 mobo and an AMD FX 8150 cpu. The mobo and PSU are 2 years old, whereas the CPU is over three.

In it's full configuration, it refuses to turn on. There's no indication that there's any power. No lights, no fan movements, no beeps.

I checked the PSU, and it's fine.

I then discovered that if I only plug in the 24-pin motherboard PSU connector, but don't attach the 8-pin EDS CPU power, the computer behaves as if there's power i.e. all fans, including on PIC-E VGA card, spin etc.

There's no output though, and no POST beeps.

Plugging in the EDS/ATX12V1 cable again leads to a dead system.

Removing the CPU or RAM still yields a dead system (not sure what a mobo without a CPU is supposed to do i.e. whether it's supposed to POST. Either way, it's dead).

The CMOS battery tests fine with a DMM.  The EDS/ATX12V1 output yields 12 V on all four yellow pins.

The power button is obviously working given that the system 'turns on' when the EDS cable is unplugged.

So it's down to the MOBO or CPU being faulty.

I've ordered a POST card and a CPU socket tester, and will update when I've done further diagnostics.

UPDATE 18/11: I put an old Phenom II in the socket. System still won't light up. This points towards the motherboard being dead. That's not a bad thing as the mobo might still be under warranty.

15 October 2015

624. Gaussian fails with "traps: l502.exe[12449] general protection ip:d75df7 sp:7f7de40dcce0 error:0 in l502.exe[400000+dc8000]" on i7-5820K

Update: 
* The systems is rock solid with nwchem and ADF.  Only G09 crashes
* Gaussian has now released G09E, and the release notes say: "A Sandybridge/Haswell binary distribution is also available". Remains to be found out if this new version solves the issue. I can't check, as I don't have access to that version.

Update 18 Oct 2015:
TL;DR version: G09D (EM64T and AMD64) crash within the first 30 min to 4 hours. An NWChem job has so far run 6 days without crashing and is still going strong.

Original post:
This isn't much of a post yet. I'm mostly posting this so that people searching online will see that they aren't alone.


I just built a new node:
AU$559 Intel BX80648I75820K 6 Core i7-5820K 3.3Ghz 15MB LGA-2011-V3 (No Heatsink)
AU$407 Gigabyte X99-SLI Intel X99 S2011-3 8xDDR4/4xPCI-E/Intel GBLan/ATX Motherboard
AU$50 DeepCool FrostWin v2.0 CPU cooler
AU$155x2 Patriot 16G Kit (8Gx2) DDR4 2133 Desktop RAM
AU$185 Antec HCG-900 High Current Gamer Gaming PSU
AU$39 Gigabyte N210SL-1GI 1GB GT210 PCI-E VGA Card
AU$68 Seagate 3.5" Barracuda 1TB ST1000DM003 SATA3 7200RPM 64MB HDD (carbon)
AU$76 Antec GX500B-W Dominator Window USB3.0 Gaming Case without PSU

I've got an installation of up-to-date Jessie on it, with the following kernel: Debian 3.16.7-ckt11-1+deb8u5 (2015-10-09) x86_64 GNU/Linux.


When running G09D rev. 01 EM64T I keep getting random errors along these lines (these are collected over a couple of days and between restarts):
[100433.566789] traps: l703.exe[11236] general protection ip:df18ca sp:7fc96f595268 error:0 in l703.exe[400000+a46000] [ 2587.899019] traps: l703.exe[3727] general protection ip:9c9757 sp:7fa6fc436ce0 error:0 in l703.exe[400000+a46000] [26439.755347] traps: l502.exe[3235] general protection ip:ab8a55 sp:7ffe29504c10 error:0 in l502.exe[400000+dc8000] [43030.457126] traps: l502.exe[427] general protection ip:11565a7 sp:7f2ec1fff268 error:0 in l502.exe[400000+dc8000] [ 2587.899019] traps: l703.exe[3727] general protection ip:9c9757 sp:7fa6fc436ce0 error:0 in l703.exe[400000+a46000] [37460.207608] traps: l703.exe[14649] general protection ip:a38ae0 sp:7f1a813cf8c0 error:0 in l703.exe[400000+a46000] [ 8865.403861] traps: l502.exe[12449] general protection ip:d75df7 sp:7f7de40dcce0 error:0 in l502.exe[400000+dc8000]


Sometimes the crashes happen after 30 minutes, sometimes after 3 hours. Most happen within four hours. I seem to remember that one ran up to 12 hours, but nothing's gone beyond that. Some short (1h 30 min) calculations have managed to run to completion.

I've checked each RAM stick with memtest+ -- they are fine -- and they are distributed as recommended in the motherboard manual.

The temperature is running below 40 degrees Celsius.

The harddrive is fine according to SMART.

I log everything every two minutes, and so can go back and look at what happened right before the crash, but there's nothing odd.


My current best three hypotheses are:

* There's an issue with G09D EM64T and the new generation of LGA2011-v3 i7 intel cpus specifically

* There's an issue with any version of G09D and the new generation of LGA2011-v3 i7 intel cpus

* There's an issue with my system which is independent of G09D.

To test, I'll be:
* Do runs with G09D rev. 01 AMD64
                         Done -- this also crashed.

* Do runs with NWChem 6.5 (ifort, mkl)
                         Running -- 6 days so far without a crash!

* Update the BIOS (long shot)

* Remove CPU and check for bent pins (long, arduous shot)

I'll be posting updates...

28 August 2015

623. Comparing frequency calculations in G09 and NWChem -- the importance of grid density

The vibrational entropy term will differ between calculations done in gaussian and nwchem unless you use "grid xfine" in nwchem. That the grid density is important is nothing new, but the magnitude of the effect on the entropy surprised me.

The difference can be large -- 30 cal/molK for [PPh4]+ at pbe0/cc-pvdz -- which becomes quite significant when multiplied by T (e.g. 298.15 K).

Note that the rotational entropy term also may differ, but that this would be due to different uses of symmetry in the calculations: http://molecularmodelingbasics.blogspot.com.au/2012/12/conformational-and-rotational-entropy.html

If you turn off symmetry (noautosym) in nwchem the rotational entropy will not be corrected. I've noticed that Gaussian, on the other hand, will sneakily apply correction if it finds an acceptable symmetry even if you request nosymm, so make sure that you scan through the output carefully.

Either way, vibrational entropy is not symmetry dependent. Instead you will have to worry about the grid fine-ness when comparing outputs.

If your molecule is very small, such as benzene or tetramethylphosphonium, it seems that you don't have to worry about this. However, even fairly small molecules such as [PPh4]+ will be affected.

Conv. Dens. = Convergence Density


CodeSymmGridConv. Dens.DFT EnergyZPEHCorrS(tot)S(trans)S(rot)S(vib)
G09NF1E-8-1266.584241520.3705160.389751147.07643.35835.00968.708
G09NU1E-8-1266.584303740.3704640.389691146.82943.35835.00968.461
NWNX1E-8-1266.584552230.3703480.389549146.69743.33934.99468.365
NWNX1E-5-1266.584552220.3703480.389549146.70443.33934.99468.371
NWYF1E-5-1266.584536840.3700340.385683118.02343.33933.61741.067
NWYF1E-8-1266.584536840.3700340.385683118.02343.33933.61741.067
NWNF1E-5-1266.584549280.3700340.385683119.39443.33934.99441.062
NWNF1E-8-1266.584549290.3700340.385683119.39443.33934.99441.061
NWYX1E-5-1266.584552740.3703480.389549145.33143.33933.61768.376
NWYX1E-8-1266.584552750.3703480.389549145.33743.33933.61768.382

F=fine. X=Extrafine. U=Ultrafine.
S(rot) values in blue are symmetry corrected. That's completely normal.

With "grid fine" NWChem gives a very different result to G09.

You can see the difference in the predicted IR spectra as well.

Fine (NWChem) (blue rings) vs G09 (red circles):
"grid xfine" (NWChem) (blue rings) vs G09 (red circles):