See here for the troubleshooting thread:
http://verahill.blogspot.com.au/2013/10/523-random-reboots-troubleshooting-in.html
Also see this thread: http://www.techpowerup.com/forums/showthread.php?t=184061
I'll need to read up on...stuff...but the bottom line seems to be that one would expect issues with this board/cpu combo:
"Still only a 4+1 phase board the FX chips pull a bit more power than that can put out comfortably and stable. [..] Those would be your three best to choose from all are the better 8+2 phase designs..."
and
"my opinion is to stay away from the asus FX ive seen many people asking why their boards are throttling at full load, vrm protection causes voltages to drop at full load when vrms hit a certain temp."
and it seemed that low (CPU) voltages precipitated crashes.
Original post:
So I built a new node at the beginning of October 2013, using the following parts:
- AMD FX 8350 CPU
- 4*8 GB GSkill RAM
- ASRock 990FX Extreme3 motherboard
- 1 TB Seagate Barracuda HDD
- MSI N210 graphics card
- ASUS NX1101 Gigabit NIC
- Corsair GS700 PSU
- Antec GX700 case
See here for my troubleshooting thread: http://verahill.blogspot.com.au/2013/10/523-random-reboots-troubleshooting-in.html
So the main value of this post is the photos, which show how easy it is to build a computer. Just don't...well...build this particular computer -- use a different motherboard.
The other value of the post is purely personal -- I just wanted to write down the steps to take whenever installing a new node in my little cluster.
There will be a post later on troubleshooting (and hopefully fixing) the issue of the spontaneous reboots.
*I've built a number of computers for myself as well as for other people and haven't had any issues (other than bad RAM) before. I got lazy this time and am paying for it.
The first step was assembly:
I like the case -- it's metal and feels robust. Having two fans on top is a definite plus as well, since it works well with my home-built rack.
Note that the case doesn't come with a printed manual -- to get the manual you need to go online. And it still falls short -- there's no guide as to how to use the many, many cables it comes with. However, it's not rocket science either. Turns out that the case has a molex plug for powering the case fans. So plug in the molex plug to the PSU, then plug in the fans to the four weird plug/cables that the case comes with. Note that the mobo has no plug for the internal connector USB 3 cable that the case comes with.
See here for more details re the case: http://www.hardocp.com/article/2013/09/12/antec_gx700_atx_computer_case_review
Case, closed
The glorious innards of The Case.
The GX700 does not come with a mobo panel -- not that they tend to be useful anyway
Luckily all (most?) mobos come with their own panels -- push it in place before doing anything else. It can need a bit of negotiation in order to snap in properly.
The case came with riser nuts, four PSU screws and lots of screws for the mobo.
Put the riser nuts in the case -- they are the golden thingies
The heatsink (left) and CPU (right)
The heatsink comes with thermal paste pre-applied. Don't touch it -- you want it to be as smooth and even as possible.
Get the CPU out
Note the yellow triangle in the bottom right corner in the picture
That should match up with the triangle in the bottom left of this picture. Note the raised lever on the right side of the CPU socket.
The CPU in place. Note the raised lever. There should be no pushing -- the CPU should fit perfectly without any force whatsoever. If you bend a pin...then good luck.
The lever is in the locked position.
Everything is locked down.
All four RAM sticks in place, and the motherboard attached to the case via seven screws that screw into the riser nuts.
The PSU is in place.
Main power and auxiliary power cables attached.
This particular case has a special tray for the hard drives.
Hard drive in place
SATA data and power cables attached
The other end of the SATA data cable attaches to the motherboard (SATA 1)
After a bit of rewiring.
PCI NIC and PCI-E graphics card in place.
It's questionable whether one can really call it a cluster though since I run each job on a single node for performance reasons. It still attracts attention from visitors to my office though.
Software:
I then installed Debian Wheezy on it. During the installation I was notified that I might want to consider enabling non-free to get the r8169 and tg3 firmware.
So after enabling non-free in the sources I did:
sudo apt-get install firmware-realtek firmware firmware-linux-nonfree
Didn't seem to change anything though -- everything was working fine before too.
I also installed amd64-microcode which, if I understand things correctly, should obviate the need for some of the full BIOS updates.
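For reference, "enabling non-free" just means adding the non-free component to the deb lines in /etc/apt/sources.list. Below is a minimal sketch of doing that with a script rather than by hand. The function name enable_nonfree is made up for this example, and it assumes the classic one-line sources.list format. It takes the file as an argument so that you can try it on a copy first.

```shell
#!/bin/sh
# Hypothetical helper (not from the original post): append "non-free" to every
# active deb/deb-src line that does not already list it. Commented-out lines
# and cdrom: entries are left untouched.
enable_nonfree() {
    sources_file="$1"
    # Keep a backup next to the original before changing anything.
    cp "$sources_file" "$sources_file.bak"
    awk '
        /^deb(-src)?[ \t]/ && !/non-free/ && !/cdrom:/ {
            sub(/[ \t]+$/, "")        # strip trailing whitespace
            $0 = $0 " non-free"
        }
        { print }
    ' "$sources_file.bak" > "$sources_file"
}
```

Something like enable_nonfree /etc/apt/sources.list (as root), followed by apt-get update, should then make the firmware packages visible to apt.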
Other little housekeeping things:
I first sorted out
INIT: Id "co" respawning too fast: disabled for 5 minutes
as shown here: http://verahill.blogspot.com.au/2012/01/debian-testing-64-wheezy-small-fixes.html
I then installed a few basic things:
sudo apt-get install vim screen sinfo gawk lm-sensors
and made a ~/.vimrc:
set number
set pastetoggle=<f3>
nnoremap <f4> :set nonumber!<CR>
And set vim as the default editor in lieu of nano:
sudo update-alternatives --config editor
I edited /etc/default/sinfo to make it use the correct network:
OPTS="${OPTS} --quiet --bcastaddress=192.168.1.255"
I set up 'static' dhcp on the WAN router.
On the node, I then sorted out /etc/network/interfaces to use dhcp on eth1 and 192.168.1.180 on eth0, and to route everything properly (i.e. local traffic over eth0, and everything else over eth1):
auto lo
iface lo inet loopback

auto eth1
iface eth1 inet dhcp

auto eth0
iface eth0 inet static
address 192.168.1.180
gateway 192.168.1.1
netmask 255.255.255.0
post-up ip route flush all
post-up route add default eth1
post-up route add -net 192.168.1.0 netmask 255.255.255.0 gw 192.168.1.1 eth0
SGE won't work properly unless you edit /etc/hosts:
127.0.0.1 localhost
#127.0.1.1 oxygen
192.168.1.180 oxygen
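The point of the /etc/hosts edit is that the node's own name should map to its LAN address rather than to 127.0.1.1. A quick way to check this is sketched below. hosts_lookup is a hypothetical helper written for this example. It only reads a hosts-style file and does not query DNS or anything else.

```shell
#!/bin/sh
# Hypothetical helper: print the first address that a hosts-style file maps
# to the given name, ignoring comments. Prints nothing if there is no match.
hosts_lookup() {
    hosts_file="$1"
    name="$2"
    awk -v name="$name" '
        { sub(/#.*/, "") }    # drop comments; awk re-splits the fields
        {
            for (i = 2; i <= NF; i++)
                if ($i == name) { print $1; exit }
        }
    ' "$hosts_file"
}
```

With the file above, hosts_lookup /etc/hosts oxygen should print 192.168.1.180. If it prints 127.0.1.1 instead, the Debian default line is still active.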
The way my cluster works is that every node has its own shared folder.
mkdir ~/oxygen
mkdir ~/scratch
chmod 777 ~/oxygen
Export it as shown here: http://verahill.blogspot.com.au/2012/02/debian-testing-wheezy-64-sharing-folder.html
Set up ssh key login in both directions:
ssh-keygen
vim ~/.ssh/authorized_keys
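Pasting keys into authorized_keys by hand works, but it is easy to add the same key twice or to leave the permissions too open. Here is a rough sketch of the same step as a function. add_authorized_key is a made-up name, and the paths in the usage note are only examples. The stock ssh-copy-id tool does much the same thing over the network.

```shell
#!/bin/sh
# Hypothetical helper: append a public key to an authorized_keys file unless
# that exact line is already present, and tighten the permissions on the
# file and its directory.
add_authorized_key() {
    pubkey_file="$1"
    auth_file="$2"
    auth_dir=$(dirname "$auth_file")
    mkdir -p "$auth_dir"
    chmod 700 "$auth_dir"
    touch "$auth_file"
    chmod 600 "$auth_file"
    key=$(cat "$pubkey_file")
    # -x: match whole lines, -F: fixed string, -q: no output
    grep -qxF -e "$key" "$auth_file" || printf '%s\n' "$key" >> "$auth_file"
}
```

For example, after copying the other node's id_rsa.pub over as /tmp/other.pub, one could run add_authorized_key /tmp/other.pub ~/.ssh/authorized_keys on this node, and then do the reverse on the other node.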
Then add the new node to the cluster: http://verahill.blogspot.com.au/2013/08/501-briefly-adding-new-node-to-sge.html
Build nwchem as shown here: http://verahill.blogspot.com.au/2013/05/424-nwchem-63-on-debian-wheezy.html
Set up gaussian as shown here: http://verahill.blogspot.com.au/2012/05/settiing-up-gaussian-g09-on-debian.html
Fix shmem: http://verahill.blogspot.com.au/2012/10/shmmax-revisited-and-shmall-shmmni.html
Finally, to address an issue with corrupt packets during SSH sessions, I added the following to /etc/rc.local:
/sbin/ethtool -K eth1 rx off tx off
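One thing to watch out for: the stock Debian /etc/rc.local ends with exit 0, so anything appended after that line never runs. The sketch below inserts a command before the first exit 0 line instead. add_to_rc_local is a hypothetical helper for this example, and it assumes exit 0 sits on a line of its own.

```shell
#!/bin/sh
# Hypothetical helper: insert a command into an rc.local-style script just
# before its first "exit 0" line, or at the end if there is no such line.
# Does nothing if the exact command line is already present.
add_to_rc_local() {
    rc_file="$1"
    cmd="$2"
    grep -qxF -e "$cmd" "$rc_file" && return 0
    awk -v cmd="$cmd" '
        /^exit 0[ \t]*$/ && !done { print cmd; done = 1 }
        { print }
        END { if (!done) print cmd }
    ' "$rc_file" > "$rc_file.new" || return 1
    # Write back with cat so the original file keeps its mode (executable bit).
    cat "$rc_file.new" > "$rc_file"
    rm -f "$rc_file.new"
}
```

Usage would be along the lines of add_to_rc_local /etc/rc.local '/sbin/ethtool -K eth1 rx off tx off', run as root.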