However, a while back I became careless and built a box ('Oxygen') where the motherboard didn't officially support the CPU. Swapping CPUs with another box seemed to solve the issues I had.
In the past couple of weeks I've begun to see some worrying signs that not all is right. In particular, I noticed the following in the dmesg output:
[693166.514897] [Hardware Error]: MC2 Error: VB Data ECC or parity error.
[693166.514926] [Hardware Error]: Error Status: Corrected error, no action required.
[693166.514934] [Hardware Error]: CPU:6 (15:1:2) MC2_STATUS[-|CE|MiscV|-|-|-|-|CECC]: 0x98414000010c0176
[693166.514955] [Hardware Error]: cache level: L2, tx: DATA, mem-tx: EV
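To see where the kernel gets those labels from, here is a minimal sketch that decodes the raw MC2_STATUS word by hand. The bit positions and lookup tables are my own reading of the x86/AMD machine-check status layout, not something taken from the log itself, and `decode_status` is a name made up for illustration, so treat this as an assumption-laden sketch rather than a reference decoder.

```python
# Assumed layout of the machine-check status register (my reading of the
# AMD/x86 MCA format; verify against the CPU documentation before relying on it).
STATUS_FLAGS = {
    63: "Val",    # register holds a valid error
    62: "Over",   # an earlier error was overwritten
    61: "UC",     # uncorrected error (clear means corrected, shown as "CE")
    60: "En",     # error reporting enabled
    59: "MiscV",  # MISC register holds valid data
    58: "AddrV",  # ADDR register holds a valid address
    57: "PCC",    # processor context corrupt
    46: "CECC",   # corrected by ECC
    45: "UECC",   # ECC error that could not be corrected
}
CACHE_LEVEL = ["L0", "L1", "L2", "L3/GEN"]
TX_TYPE = ["INSN", "DATA", "GEN", "RESV"]
MEM_TX = ["GEN", "RD", "WR", "DRD", "DWR", "IRD", "PRF", "EV", "SNP"]


def decode_status(status):
    """Hypothetical helper: split a 64-bit MCA status word into flags and error code."""
    flags = [name for bit, name in STATUS_FLAGS.items() if (status >> bit) & 1]
    code = status & 0xFFFF  # low 16 bits carry the error code
    result = {"flags": flags, "error_code": code}
    # Memory-hierarchy errors are assumed to use the pattern 0000 0001 RRRR TTLL.
    if code >> 8 == 0x01:
        rrrr = (code >> 4) & 0xF
        result["mem_tx"] = MEM_TX[rrrr] if rrrr < len(MEM_TX) else "unknown"
        result["tx"] = TX_TYPE[(code >> 2) & 0x3]
        result["cache_level"] = CACHE_LEVEL[code & 0x3]
    return result


if __name__ == "__main__":
    # The status value from the dmesg line above.
    print(decode_status(0x98414000010C0176))
```

Under those assumptions the value decodes to a corrected (UC clear, CECC set) L2 data eviction, which agrees with what the kernel printed.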
A few days after that, the computer turned itself off without logging any further error messages. It did prompt me to look at the sensor output though (I've been logging it every two minutes for months), and I compared it with that of another computer ('Neon'), which is completely stable. Note that both computers have been running the same types of jobs recently (large-memory frequency jobs).
Oxygen: AMD FX8150, 32 GB RAM, Corsair GS700, ASRock 990FX Extreme3
Neon: AMD FX8350, 32 GB RAM, Corsair GS800, Gigabyte 990FXA
Anyway, this is what I found:
On Neon the power output is very stable, while on Oxygen it jumps up and down between ca. 45 W and 130 W.
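For anyone who wants to put numbers on that kind of comparison instead of eyeballing plots, here is a rough sketch that summarizes the power readings in a sensor log. It assumes the log is just the text output of `sensors` appended every two minutes and that the reading of interest appears on lines starting with `power1:`. Both the label and the helper name `power_stats` are assumptions for illustration, and the sample text is made up to mirror the 45 W and 130 W extremes; it is not my actual log.

```python
import re
import statistics

# Assumed line format, e.g. "power1:       45.00 W  (crit = 125.19 W)".
POWER_RE = re.compile(r"^power1:\s+([0-9.]+)\s*W", re.MULTILINE)


def power_stats(log_text):
    """Hypothetical helper: summarize all power1 readings found in a sensor log."""
    watts = [float(value) for value in POWER_RE.findall(log_text)]
    if not watts:
        raise ValueError("no power1 readings found")
    return {
        "n": len(watts),
        "min": min(watts),
        "max": max(watts),
        "mean": statistics.fmean(watts),
        "stdev": statistics.pstdev(watts),  # population standard deviation
    }


if __name__ == "__main__":
    # Synthetic sample, not real measurements.
    sample = (
        "power1:       45.00 W  (crit = 125.19 W)\n"
        "power1:      130.00 W  (crit = 125.19 W)\n"
        "power1:       45.00 W  (crit = 125.19 W)\n"
    )
    print(power_stats(sample))
```

Running it on the logs from both boxes and comparing the `stdev` values would show how much more the reading swings on one machine than on the other.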
Has it been a crappy UPS that has been causing the issues all along? Or do these plots mean nothing?