My cluster looks like this:
I've got four nodes connected via two networks, 192.168.1.0/24 and 192.168.2.0/24. The 192.168.1.0/24 network runs over a gigabit switch, with Be (see below) acting as the gateway. The 192.168.2.0/24 network runs over a crappy old Netgear 10/100 router (DHCP) and provides access to the outside world (hello MAC spoofing :) ). Each box shares a folder via NFS under a unique name.
_Nodes_
Be: AMD II X3, 8 GB RAM (192.168.1.1), Realtek Semiconductor Co., Ltd. RTL-8169 Gigabit Ethernet (rev 10)
Ta: Intel i5-2400, 8 GB RAM (192.168.1.150), Intel Corporation 82579LM Gigabit Network Connection (rev 04)
B: AMD Phenom II X6, 8 GB RAM (192.168.1.101), Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet controller (rev 03)
Ne: AMD FX 8150 X8, 16 GB RAM (192.168.1.120), Realtek Semiconductor Co., Ltd. RTL-8169 Gigabit Ethernet (rev 10)
So, time to test the network performance:
sudo apt-get install iperf
On all your boxes (e.g. using clusterssh) start iperf in server mode:
iperf -s
Then on each of your nodes run:
iperf -c 192.168.1.1 && iperf -c 192.168.1.101 && iperf -c 192.168.1.150 && iperf -c 192.168.1.120
------------------------------------------------------------
Client connecting to 192.168.1.1, TCP port 5001
TCP window size: 45.7 KByte (default)
------------------------------------------------------------
[  3] local 192.168.1.101 port 37893 connected with 192.168.1.1 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec   564 MBytes   473 Mbits/sec
------------------------------------------------------------
Client connecting to 192.168.1.101, TCP port 5001
TCP window size:  169 KByte (default)
------------------------------------------------------------
[  3] local 192.168.1.101 port 35926 connected with 192.168.1.101 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec  15.5 GBytes  13.3 Gbits/sec
------------------------------------------------------------
Client connecting to 192.168.1.150, TCP port 5001
TCP window size: 22.9 KByte (default)
------------------------------------------------------------
[  3] local 192.168.1.101 port 48257 connected with 192.168.1.150 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec   564 MBytes   473 Mbits/sec
------------------------------------------------------------
Client connecting to 192.168.1.120, TCP port 5001
TCP window size: 22.9 KByte (default)
------------------------------------------------------------
[  3] local 192.168.1.101 port 43236 connected with 192.168.1.120 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec   617 MBytes   517 Mbits/sec
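As a sanity check on the units, the summary lines in that output can be parsed and the bandwidth recomputed from the transfer size. This is only a minimal sketch: the helper names are made up for illustration, and it assumes (consistent with the numbers above) that this iperf version counts KBytes/MBytes/GBytes in powers of 1024 and Kbits/Mbits/Gbits in powers of 1000.

```python
import re

# Matches iperf summary lines such as:
#   [  3]  0.0-10.0 sec   564 MBytes   473 Mbits/sec
SUMMARY = re.compile(
    r"\[\s*\d+\]\s+([\d.]+)-\s*([\d.]+)\s+sec\s+"
    r"([\d.]+)\s+([KMG])Bytes\s+([\d.]+)\s+([KMG])bits/sec"
)

BYTE_UNITS = {"K": 2**10, "M": 2**20, "G": 2**30}  # assumption: binary prefixes for bytes
BIT_UNITS = {"K": 10**3, "M": 10**6, "G": 10**9}   # assumption: decimal prefixes for bits


def parse_summaries(text):
    """Return (seconds, bytes_transferred, reported_bits_per_sec) per summary line."""
    results = []
    for m in SUMMARY.finditer(text):
        start, end, size, size_unit, rate, rate_unit = m.groups()
        seconds = float(end) - float(start)
        nbytes = float(size) * BYTE_UNITS[size_unit]
        reported = float(rate) * BIT_UNITS[rate_unit]
        results.append((seconds, nbytes, reported))
    return results


def recomputed_mbits(seconds, nbytes):
    """Bandwidth in Mbit/s (10**6 bits per second) from bytes and duration."""
    return nbytes * 8 / seconds / 1e6
```

For the first result, 564 MBytes over 10 s recomputes to roughly 473 Mbit/s, which agrees with the figure iperf reports.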
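Overall, this is what I got (rows are the client, columns the server, in Mbit/s):

        Be      B       Ta      Ne
Be      13.7G   310     308     316
B       564     15.5G   564     617
Ta      726     660     19.7G   936
Ne      882     484     917     19.4G

Note that the B row matches the Transfer column of the sample output above (MBytes in 10 s) rather than its Bandwidth column (473, 13.3G, 473 and 517 Mbit/s), so some of these figures may be MBytes per 10 s rather than Mbit/s.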
I wasn't sure whether to expect a metric gigabit (1000 Mbit) or a binary one (1024 Mbit). Network line rates use decimal prefixes, so gigabit Ethernet means 1000 Mbit/s, and protocol overhead takes a few percent off that. Looking at the results, the best is 936 Mbit/s and the worst 308 Mbit/s. Since all four nodes have gigabit network cards, every pair should ideally reach around 936 Mbit/s.
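To put 936 Mbit/s in context, a rough upper bound on TCP throughput over gigabit Ethernet can be estimated from the per-frame overhead. The sketch below assumes standard Ethernet framing (38 bytes of preamble, header, FCS and inter-frame gap per frame), IPv4 and TCP headers of 20 bytes each, and 12 bytes of TCP timestamp options. The function name is illustrative, and real links may differ (VLAN tags, other TCP options).

```python
ETH_OVERHEAD = 8 + 14 + 4 + 12  # preamble+SFD, Ethernet header, FCS, inter-frame gap
IP_HEADER = 20                  # IPv4 without options
TCP_HEADER = 20 + 12            # base header plus timestamp option (assumed enabled)


def max_tcp_goodput_mbits(mtu, line_rate_mbits=1000.0):
    """Estimate the TCP payload rate for full-sized frames at a given MTU."""
    payload = mtu - IP_HEADER - TCP_HEADER  # TCP payload bytes per frame
    on_wire = mtu + ETH_OVERHEAD            # bytes the frame occupies on the wire
    return line_rate_mbits * payload / on_wire


for mtu in (1500, 7100, 9000):
    print(f"MTU {mtu}: about {max_tcp_goodput_mbits(mtu):.0f} Mbit/s")
```

Under these assumptions the bound at MTU 1500 is roughly 941 Mbit/s, so 936 Mbit/s is already close to line rate, and a larger MTU could only add a few percent in raw throughput.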
And now, to try to improve it:
I went through the whole shebang of stepping the MTU down on each node until the interface accepted the value:
sudo ifconfig eth1 mtu 9000
sudo ifconfig eth1 mtu 8000
etc.
Anyway, that way I ended up with the following maximum MTUs:
Be 7100
B 7100
Ne 9000
Ta 9000
I then set the MTUs to 7100 on all the nodes and tried pinging from node to node, e.g.:
ping -s 7072 -M do 192.168.1.101
Well, that maxed out at a payload of 1472 bytes, i.e. MTU 1500 (1472 + 20 bytes IP header + 8 bytes ICMP header), which was the original value. So I'm a bit confused.
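Rather than stepping through ping sizes by hand, a binary search could find the largest payload that gets through unfragmented. A minimal sketch, assuming Linux iputils ping (the same -M do and -s options used above, plus -c and -W). The helper names are hypothetical, and the probe is passed in as a function so the search itself can be tried without touching the network.

```python
import subprocess

ICMP_OVERHEAD = 20 + 8  # IPv4 header + ICMP echo header


def ping_probe(host):
    """Return a function reporting whether a payload of `size` bytes gets through unfragmented."""
    def probe(size):
        cmd = ["ping", "-c", "1", "-W", "1", "-M", "do", "-s", str(size), host]
        result = subprocess.run(cmd, stdout=subprocess.DEVNULL,
                                stderr=subprocess.DEVNULL)
        return result.returncode == 0
    return probe


def largest_payload(probe, low=0, high=9000 - ICMP_OVERHEAD):
    """Binary search for the largest payload in [low, high] accepted by probe.

    Assumes probe is monotonic: if a size passes, every smaller size passes too.
    Returns None if even `low` fails.
    """
    if not probe(low):
        return None
    while low < high:
        mid = (low + high + 1) // 2  # round up so the loop always makes progress
        if probe(mid):
            low = mid
        else:
            high = mid - 1
    return low


def path_mtu(payload):
    """Convert a ping payload size back to the corresponding MTU."""
    return payload + ICMP_OVERHEAD
```

Something like path_mtu(largest_payload(ping_probe("192.168.1.101"))) would then report the effective MTU towards B. A result of 1500 would mean that something on the path (a NIC, a driver, or the switch) is still not passing larger frames.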
Settings:
Be:
eth1      Link encap:Ethernet  HWaddr 00:f0:4d:83:0a:48
          inet addr:192.168.1.1  Bcast:192.168.1.255  Mask:255.255.255.0
          inet6 addr: fe80::2f0:4dff:fe83:a48/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:24124966 errors:0 dropped:27064 overruns:0 frame:0
          TX packets:19569426 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:25859945667 (24.0 GiB)  TX bytes:14200267703 (13.2 GiB)

B:
eth1      Link encap:Ethernet  HWaddr 02:00:8c:50:2f:6b
          inet addr:192.168.1.101  Bcast:192.168.1.255  Mask:255.255.255.0
          inet6 addr: fe80::8cff:fe50:2f6b/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:14540970 errors:0 dropped:36651 overruns:0 frame:0
          TX packets:16801915 errors:0 dropped:2 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:12347398135 (11.4 GiB)  TX bytes:18008416370 (16.7 GiB)

Ta:
eth1      Link encap:Ethernet  HWaddr 78:2b:cb:b3:a4:b7
          inet addr:192.168.1.150  Bcast:192.168.1.255  Mask:255.255.255.0
          inet6 addr: fe80::7a2b:cbff:feb3:a4b7/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:14717233 errors:0 dropped:68232 overruns:0 frame:0
          TX packets:17769966 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:13860096243 (12.9 GiB)  TX bytes:20207270880 (18.8 GiB)
          Interrupt:20 Memory:e1a00000-e1a20000

Ne:
eth1      Link encap:Ethernet  HWaddr 90:2b:34:93:75:e6
          inet addr:192.168.1.120  Bcast:192.168.1.255  Mask:255.255.255.0
          inet6 addr: fe80::922b:34ff:fe93:75e6/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:13567520 errors:0 dropped:0 overruns:0 frame:0
          TX packets:10710054 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:13086635236 (12.1 GiB)  TX bytes:12381041605 (11.5 GiB)
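The output above reports MTU:1500 on all four nodes, along with a fair number of dropped RX packets on three of them, so it may be worth pulling those values out automatically after each change. A minimal sketch, assuming the older net-tools ifconfig format shown here (newer versions print these fields differently). The function name is illustrative only.

```python
import re


def summarize_ifconfig(text):
    """Extract the MTU and RX/TX dropped counters from net-tools style ifconfig output."""
    mtu = re.search(r"MTU:(\d+)", text)
    rx = re.search(r"RX packets:\d+\s+errors:\d+\s+dropped:(\d+)", text)
    tx = re.search(r"TX packets:\d+\s+errors:\d+\s+dropped:(\d+)", text)
    return {
        "mtu": int(mtu.group(1)) if mtu else None,
        "rx_dropped": int(rx.group(1)) if rx else None,
        "tx_dropped": int(tx.group(1)) if tx else None,
    }
```

Feeding it the output of ifconfig eth1 from each node gives a quick way to confirm whether an MTU change actually took effect and whether the drop counters keep growing during an iperf run.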