29 April 2013

401. AMD FX 8150: issues building kernel -- random failures.

Update 4: I found the receipts for one pair of sticks and took them to MSY in Melbourne -- they were replaced on the spot, no questions asked. Very happy.

Update 3: The errors were all due to three bad RAM sticks. With only the good stick installed, everything works fine. That's 24 GB of bad RAM... this won't be cheap if I can't find the receipts...

Update 2:
Running memtest86 I caught lots of errors (51 in 50 minutes) before I killed the test. I'm currently testing each stick one by one. I'm hoping that what seem to be RAM errors are actually caused by inappropriate BIOS settings, because 32 GB of RAM is not cheap to replace...

While I'm swapping RAM sticks I'm also testing a separate set of sticks on a different box. If they are error-free, it will be interesting to see whether they trigger errors on the troublesome node. I'm still hoping the BIOS is the culprit...

So far three out of four tested sticks have shown errors -- they all happen during test #6. The fourth stick has passed all tests seven times.

Update 1: dmesg also shows the same message the OP sees here: https://bugzilla.redhat.com/show_bug.cgi?id=909702

The OP puts it down to a misconfigured bios, so the quest continues.
Searching for 990FX and FX8150 I get a number of hits:

Here's a Newegg review for the 990FX:

 I purchased This MB to run with the AMD FX 8150. I have built computers from high end to low end and know the ones in the middle last the longest and are the most stable.
[..]
At this point the fun of the build is gone, and I have too many hours dealing with problems. 
And that's not the only negative FX8?50 + 990FX review.

The worst part of it is that I've been thinking about building another, identical node (good value for money), as well as recommending my build to a student who is about to start doing calculations.

Mind you, I've only ever had issues when it comes to compiling the kernel -- it's been solid when it comes to running calculations.

Original post:


NOTE: this is NOT a solution. Just observations.

My AMD FX 8150 is a great CPU -- it's the heart of the fastest of my computational nodes, and is eminently affordable. It does, however, cause me grief in one respect -- I can't compile the Linux kernel.

The system
The box that's causing me trouble has
* AMD FX 8150 CPU
* Gigabyte 990FXA-D3 motherboard
* NVIDIA GeForce 210 video card
* Corsair GS 800 PSU
* 4x8 GB Patriot Viper PV316G186C0K RAM
While not top of the range, the components should be of reasonable quality.

In terms of software and OS, it's an up-to-date wheezy install (gcc 4.7), running kernel 3.7.2 (compiled on a different machine).

Compiling the kernel
I'm compiling the kernel as shown here: http://verahill.blogspot.com.au/2013/02/342-compiling-kernel-38-on-debian.html

The errors are shown at the end of the post.

The fact that the errors keep changing might also point towards a hardware fault in my particular CPU, rather than a problem with the FX 8150 in general.

3.8 built fine twice, and crashed the third time. 3.8.10 crashed twice, then built fine the third time.

It all sounds like I'm having hardware issues... but they only seem to be triggered during kernel builds. During 'normal' use (i.e. using 100% CPU for weeks at a time) it is perfectly stable. Compiling e.g. nwchem (another pretty heavy compile) also goes absolutely fine.

Troubleshooting something like this isn't easy either. See the end of the post for a list of the various errors I got while compiling different kernel versions.
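Since the failing file changes from run to run, it can help to pull the failing target out of each build log and compare them side by side. A minimal sketch, assuming each run's output was saved to a text file; `failing_target` and `signal_name` are hypothetical helpers of my own. It relies on two conventions visible in the logs at the end of the post: the deepest `make[N]` level names the object whose recipe actually failed (outer levels just propagate `Error 2`), and a shell exit status above 128 means "killed by signal (status - 128)", so `Error 139` is a segfault (signal 11) and `Error 134` an abort (signal 6).

```python
import re
import signal
import sys

# Matches recursive-make failure lines such as
#   make[3]: *** [fs/gfs2/sys.o] Error 1
#   make: *** [debian/stamp/build/kernel] Error 2
MAKE_ERR = re.compile(r"make(?:\[(\d+)\])?: \*\*\* \[([^\]]+)\] Error (\d+)")


def failing_target(log_text):
    """Return (target, status) for the deepest failing make level, or None."""
    best = None
    for m in MAKE_ERR.finditer(log_text):
        depth = int(m.group(1)) if m.group(1) else 0  # plain 'make:' is depth 0
        if best is None or depth > best[0]:
            best = (depth, m.group(2), int(m.group(3)))
    return None if best is None else (best[1], best[2])


def signal_name(status):
    """Name of the signal behind a shell exit status > 128, else None."""
    if status <= 128:
        return None
    try:
        return signal.Signals(status - 128).name
    except ValueError:
        return None


if __name__ == "__main__":
    # Usage: python failing_target.py run1.log run2.log ...
    for path in sys.argv[1:]:
        with open(path) as f:
            hit = failing_target(f.read())
        if hit:
            target, status = hit
            print(path, target, status, signal_name(status) or "")
```

If the same source tree fails on a different object every time, as it does in the logs below, that points at flaky hardware rather than at a compiler or source bug.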

Anyway, I hit google...



BIOS
That Windows has issues with the FX 8150 might seem unrelated, but it appeared that my errors could be solved by a BIOS update to my 990FXA-D3 mobo:
 http://scalibq.wordpress.com/2011/10/19/amd-bulldozer-can-it-get-even-worse/
"The actual reported error is quite random, it just depends on where the CPU fails first. So you generally get a different error code with every BSOD."
and
"AMD’s KB article focuses solely on some boards with the 990FX chipset."
Well, I do have a 990FXA-D3 gigabyte motherboard.

My bios is shown by lshw as
*-firmware
          description: BIOS
          vendor: Award Software International, Inc.
          physical id: 0
          version: F7
          date: 05/30/2012
          size: 128KiB
          capacity: 4032KiB
So the obvious solution was to flash the bios.

Turns out, flashing the BIOS is a headache on Gigabyte motherboards (I'm not buying anything from them again). Whatever happened to simply burning a CD and booting with it in the drive?

Flashing the bios

I downloaded the bios (version F8): http://download.gigabyte.eu/FileList/BIOS/mb_bios_ga-990fxa-d3_f8.exe.

I unzipped it with 7z, giving me 990FXAD3.F8 -- I then put that file in the root of a USB stick.

I tried a number of USB sticks, including a blank stick formatted as W95 FAT32, and plugged the stick in before rebooting.

In Q-Flash, I always ended up with a prompt saying Floppy A <Drive>, and when I hit enter it said '..    <dir>'. 0 Files found. Yet it also said Total size 7.48G, Free Size: 7.44 G, which matched the size of the USB stick.

Finally I managed to get it to work:
* In fdisk I created a single 1 GB partition on the USB stick, set its type (t) to 6 (FAT16), made it bootable, and wrote the changes to disk.
* I then ran mkdosfs -F 16 /dev/sdb1 (my USB stick was /dev/sdb).
* Finally I mounted the stick, copied the 990FXAD3.F8 file to its root, and THAT worked.
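To double-check what the fdisk steps produced before going back into Q-Flash, the first sector of the stick can be decoded directly. A minimal sketch, assuming a classic MBR partition table (446 bytes of boot code, four 16-byte partition entries, then the 0x55AA signature at bytes 510-511) and 512-byte sectors; `first_partition_info` is a hypothetical helper name. Type 0x06 is the FAT16 type set with fdisk's `t` command, and a status byte of 0x80 is the bootable flag.

```python
import struct
import sys


def first_partition_info(mbr):
    """Decode the first primary partition entry of a classic MBR sector.

    Returns (bootable, partition_type, start_lba, sector_count).
    """
    if len(mbr) < 512 or mbr[510:512] != b"\x55\xaa":
        raise ValueError("no MBR boot signature (0x55AA) found")
    entry = mbr[446:462]                  # first of the four 16-byte entries
    bootable = entry[0] == 0x80           # status byte: 0x80 = active/bootable
    ptype = entry[4]                      # 0x06 = FAT16, 0x0b/0x0c = FAT32
    start_lba, count = struct.unpack_from("<II", entry, 8)  # little-endian
    return bootable, ptype, start_lba, count


if __name__ == "__main__":
    # Usage (as root): python mbr_check.py /dev/sdb
    if len(sys.argv) > 1:
        with open(sys.argv[1], "rb") as dev:
            boot, ptype, start, count = first_partition_info(dev.read(512))
        print(f"bootable={boot} type=0x{ptype:02x} size={count * 512 / 2**30:.2f} GiB")
```

For a stick prepared as described, this should report bootable=True and type=0x06.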

Memtest86
Because RAM has traditionally been a major culprit behind hardware errors (especially the random, difficult-to-diagnose type), it's always a good idea to run a memtest. To do that, install memtest86+ (sudo apt-get install memtest86+) and reboot. There should be a new menu item (scroll down) in GRUB. Memtest takes quite a while, especially if you have a lot of RAM (32 GB...).

Lo and behold, there are errors:
Tst  Pass  Failing Address              Good        Bad        Err-Bits  Count Chan
------------------------------------------------------------------------------------
6     0     0007383b4f4  -  1848.2MB   fffffbff     ffffffff   00000400    1
6     0     00039c1f294  -   924.1MB   fffffbff     ffffffff   00000004    2
6     0     00120203034  -  4610.0MB   00000004     00000000   00000004    3
6     0     001ca16c464  -  7329.4MB   00020004     00000000   00020000    4
[..]
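The 'Failing Address' column is the byte address in hex followed by the same address in megabytes, and comparing Good with Bad shows which bits were read back wrong. A small sketch to decode rows copied off the screen, assuming the column layout shown above; `parse_row` is a hypothetical helper, and `diff_bits` is my own Good XOR Bad calculation. I'm not sure how memtest86+ fills its own Err-Bits column, which doesn't always equal that XOR in the rows above.

```python
import re

# Tst  Pass  Failing-Address  -  MB  Good  Bad  (Err-Bits and Count are ignored)
ROW = re.compile(
    r"^\s*(\d+)\s+(\d+)\s+([0-9a-f]+)\s+-\s+([\d.]+)MB\s+([0-9a-f]{8})\s+([0-9a-f]{8})"
)


def parse_row(line):
    """Decode one memtest86+ error row; return None for non-matching lines."""
    m = ROW.match(line)
    if m is None:
        return None
    test, _pass, addr, _mb, good, bad = m.groups()
    diff = int(good, 16) ^ int(bad, 16)   # bits that differ between expected and read
    return {
        "test": int(test),
        "addr_mb": round(int(addr, 16) / 2**20, 1),  # bytes -> MiB, like the MB column
        "diff_bits": diff,
        "n_flipped": bin(diff).count("1"),
    }


if __name__ == "__main__":
    rows = [
        "6     0     0007383b4f4  -  1848.2MB   fffffbff     ffffffff   00000400    1",
        "6     0     00120203034  -  4610.0MB   00000004     00000000   00000004    3",
    ]
    for r in rows:
        print(parse_row(r))
```

For the first row this reproduces the 1848.2 MB figure and shows a single bit (0x400) read back as 1 where 0 was expected.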

I counted 51 errors before killing the test (time to identify the bad stick). Many of these occurred in a more limited address space than those shown above. Sigh...the RAM was the most expensive part of this build...

According to this there's a slight chance that the RAM might be ok, but it's still not a good sign.

I've tested each stick by itself -- so far 3 out of 4 sticks have yielded errors during test 6. I did seven passes on the fourth stick and no errors.

The outcome
However, even with the new BIOS the kernel compiles still fail -- it takes longer for them to fail, but they fail.

I do see the odd thing in dmesg though:
[ 4260.342268] as[29370]: segfault at 4541b5e ip 0000000000410306 sp 00007fff40ec4420 error 4 in as[400000+51000]
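That dmesg line can be unpacked a bit further. A sketch, assuming the x86 message format shown above; `decode` is a hypothetical helper. On x86 the number after `error` is the page-fault error code, printed in hex (bit 0: protection violation rather than page not present, bit 1: write, bit 2: user mode, bit 4: instruction fetch), and `as[400000+51000]` gives the start and size of the mapping that the instruction pointer fell in.

```python
import re
import sys

SEGV = re.compile(
    r"(?P<comm>[^\s\[]+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<err>[0-9a-f]+) "
    r"in (?P<obj>[^\[\s]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def decode(line):
    """Split an x86 'segfault at' kernel message into its fields."""
    m = SEGV.search(line)
    if m is None:
        return None
    err = int(m["err"], 16)               # page-fault error code (hex in dmesg)
    return {
        "comm": m["comm"],
        "pid": int(m["pid"]),
        "fault_addr": int(m["addr"], 16),
        # instruction pointer relative to the start of the mapping of `obj`
        "offset_in_obj": int(m["ip"], 16) - int(m["base"], 16),
        "protection_fault": bool(err & 0x1),  # False: the page was not mapped
        "write": bool(err & 0x2),             # False: the access was a read
        "user_mode": bool(err & 0x4),
        "instr_fetch": bool(err & 0x10),
    }


if __name__ == "__main__":
    # Usage: dmesg > dmesg.txt; python segv.py dmesg.txt
    if len(sys.argv) > 1:
        with open(sys.argv[1]) as f:
            for line in f:
                info = decode(line)
                if info:
                    print(info)
```

For the line above this gives a user-mode read of an unmapped address (error 4) at offset 0x10306 into `as`, the GNU assembler, which is also the program named in the "(program as)" compiler errors further down. A garbage pointer like that is consistent with, though not proof of, corrupted memory.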

So either the FX 8150 is still not properly supported by the BIOS, or I've bought a lemon.

The question remains: why do I only see failure during kernel compiles and no other conditions?

After bios flash:

Kernel 3.9-rc8
  CC      drivers/base/dd.o
In file included from /home/me/tmp/linux-3.9-rc8/arch/x86/include/asm/processor.h:23:0,
                 from /home/me/tmp/linux-3.9-rc8/arch/x86/include/asm/atomic.h:6,
                 from include/linux/atomic.h:4,
                 from include/linux/sysfs.h:20,
                 from include/linux/kobject.h:21,
                 from include/linux/device.h:17,
                 from drivers/base/dd.c:20:
/home/me/tmp/linux-3.9-rc8/arch/x86/include/asm/special_insns.h: In function 'native_read_cr0':
/home/me/tmp/linux-3.9-rc8/arch/x86/include/asm/special_insns.h:24:2: internal compiler error: in build_int_cst_wide, at tree.c:1238
Please submit a full bug report,
with preprocessed source if appropriate.
See  for instructions.
The bug is not reproducible, so it is likely a hardware or OS problem.
make[3]: *** [drivers/base/dd.o] Error 1
make[2]: *** [drivers/base] Error 2
make[1]: *** [drivers] Error 2
make[1]: Leaving directory `/home/me/tmp/linux-3.9-rc8'
make: *** [debian/stamp/build/kernel] Error 2
Kernel 3.8.10
  UPD     include/generated/compile.h
  CC      init/version.o
  LD      init/built-in.o
ipc/built-in.o:(.debug_info+0x1ed81): undefined reference to `.LASF108'
make[1]: *** [vmlinux] Error 1
make[1]: Leaving directory `/home/me/tmp/linux-3.8.10'
make: *** [debian/stamp/build/kernel] Error 2
Kernel 3.7.6
  CC [M]  fs/gfs2/super.o
  CC [M]  fs/gfs2/sys.o
In file included from /home/me/tmp/linux-3.7.6/arch/x86/include/asm/smp.h:13:0,
                 from include/linux/smp.h:38,
                 from include/linux/sched.h:30,
                 from fs/gfs2/sys.c:10:
/home/me/tmp/linux-3.7.6/arch/x86/include/asm/apic.h:394:1: internal compiler error: Segmentation fault
Please submit a full bug report,
with preprocessed source if appropriate.
See  for instructions.
The bug is not reproducible, so it is likely a hardware or OS problem.
make[3]: *** [fs/gfs2/sys.o] Error 1
make[2]: *** [fs/gfs2] Error 2
make[1]: *** [fs] Error 2
make[1]: Leaving directory `/home/me/tmp/linux-3.7.6'
make: *** [debian/stamp/build/kernel] Error 2
Kernel 3.5
  CC [M]  drivers/scsi/lpfc/lpfc_els.o
  CC [M]  drivers/scsi/lpfc/lpfc_hbadisc.o
  CC [M]  drivers/scsi/lpfc/lpfc_init.o
In file included from /home/me/tmp/linux-3.5/arch/x86/include/asm/msr.h:139:0,
                 from /home/me/tmp/linux-3.5/arch/x86/include/asm/processor.h:20,
                 from /home/me/tmp/linux-3.5/arch/x86/include/asm/thread_info.h:22,
                 from include/linux/thread_info.h:54,
                 from include/linux/preempt.h:9,
                 from include/linux/spinlock.h:50,
                 from include/linux/seqlock.h:29,
                 from include/linux/time.h:8,
                 from include/linux/timex.h:56,
                 from include/linux/sched.h:57,
                 from include/linux/blkdev.h:4,
                 from drivers/scsi/lpfc/lpfc_init.c:22:
/home/me/tmp/linux-3.5/arch/x86/include/asm/paravirt.h: In function 'store_gdt':
/home/me/tmp/linux-3.5/arch/x86/include/asm/paravirt.h:304:2: internal compiler error: Segmentation fault
Please submit a full bug report,
with preprocessed source if appropriate.
See  for instructions.
The bug is not reproducible, so it is likely a hardware or OS problem.
make[4]: *** [drivers/scsi/lpfc/lpfc_init.o] Error 1
make[3]: *** [drivers/scsi/lpfc] Error 2
make[2]: *** [drivers/scsi] Error 2
make[1]: *** [drivers] Error 2
make[1]: Leaving directory `/home/me/tmp/linux-3.5'
make: *** [debian/stamp/build/kernel] Error 2

Kernel 3.4.42
Second crash:
  CC [M]  fs/coda/psdev.o
  CC [M]  fs/coda/cache.o
In file included from include/linux/mm.h:256:0,
                 from fs/coda/coda_linux.h:17,
                 from fs/coda/cache.c:24:
include/linux/page-flags.h:232:1: internal compiler error: Segmentation fault
Please submit a full bug report,
with preprocessed source if appropriate.
See  for instructions.
The bug is not reproducible, so it is likely a hardware or OS problem.
make[3]: *** [fs/coda/cache.o] Error 1
make[2]: *** [fs/coda] Error 2
make[1]: *** [fs] Error 2
make[1]: Leaving directory `/home/me/tmp/linux-3.4.42'
make: *** [debian/stamp/build/kernel] Error 2
First crash:
  CC      kernel/signal.o
gcc: internal compiler error: Segmentation fault (program as)
Please submit a full bug report,
with preprocessed source if appropriate.
See  for instructions.
make[2]: *** [kernel/signal.o] Error 4
make[1]: *** [kernel] Error 2
make[1]: Leaving directory `/home/me/tmp/linux-3.4.42'
make: *** [debian/stamp/build/kernel] Error 2


Kernels that won't build and the errors -- before bios flash:
3.9-rc8
  CC [M]  fs/nfs/nfs4client.o
  CC [M]  fs/nfs/nfs4sysctl.o
  CC [M]  fs/nfs/nfs4session.o
  CC [M]  fs/nfs/pnfs.o
fs/nfs/pnfs.c: In function 'read_seqcount_retry':
fs/nfs/pnfs.c:1951:1: internal compiler error: Segmentation fault
Please submit a full bug report,
with preprocessed source if appropriate.
See  for instructions.
The bug is not reproducible, so it is likely a hardware or OS problem.
make[3]: *** [fs/nfs/pnfs.o] Error 1
make[2]: *** [fs/nfs] Error 2
make[1]: *** [fs] Error 2
make[1]: Leaving directory `/home/me/tmp/linux-3.9-rc8'
make: *** [debian/stamp/build/kernel] Error 2
3.8.10
  CC [M]  fs/nfs/inode.o
In file included from include/net/scm.h:6:0,
                 from include/linux/netlink.h:8,
                 from /home/me/tmp/linux-3.8.10/include/uapi/linux/neighbour.h:5,
                 from include/linux/netdevice.h:51,
                 from include/linux/icmpv6.h:12,
                 from include/linux/ipv6.h:59,
                 from include/net/ipv6.h:16,
                 from include/linux/sunrpc/clnt.h:26,
                 from fs/nfs/inode.c:26:
include/linux/security.h:2581:1: internal compiler error: Segmentation fault
Please submit a full bug report,
3.8.6
  CC [M]  drivers/hid/hid-lg.o
  CC [M]  drivers/hid/hid-lgff.o
  CC [M]  drivers/hid/hid-lg2ff.o
  CC [M]  drivers/hid/hid-lg3ff.o
  CC [M]  drivers/hid/hid-lg4ff.o
  CC [M]  drivers/hid/hid-picolcd_core.o
  CC [M]  drivers/hid/hid-picolcd_fb.o
  CC [M]  drivers/hid/hid-picolcd_backlight.o
drivers/hid/hid-picolcd_backlight.c:120:1: internal compiler error: Segmentation fault
Please submit a full bug report,
with preprocessed source if appropriate.
See  for instructions.
The bug is not reproducible, so it is likely a hardware or OS problem.
make[3]: *** [drivers/hid/hid-picolcd_backlight.o] Error 1
make[2]: *** [drivers/hid] Error 2
make[1]: *** [drivers] Error 2
make[1]: Leaving directory `/home/me/tmp/linux-3.8.6'
make: *** [debian/stamp/build/kernel] Error 2
3.8
  CC      mm/dmapool.o
  CC      mm/hugetlb.o
/bin/sh: line 1: 25153 Done(2) gcc -E -D__GENKSYMS__ -Wp,-MD,mm/.hugetlb.o.d -nostdinc -isystem /usr/lib/gcc/x86_64-linux-gnu/4.7/include -I/home/me/tmp/linux-3.8/arch/x86/include -Iarch/x86/include/generated -Iinclude -I/home/me/tmp/linux-3.8/arch/x86/include/uapi -Iarch/x86/include/generated/uapi -I/home/me/tmp/linux-3.8/include/uapi -Iinclude/generated/uapi -include /home/me/tmp/linux-3.8/include/linux/kconfig.h -D__KERNEL__ -Wall -Wundef -Wstrict-prototypes -Wno-trigraphs -fno-strict-aliasing -fno-common -Werror-implicit-function-declaration -Wno-format-security -fno-delete-null-pointer-checks -Os -m64 -mtune=generic -mno-red-zone -mcmodel=kernel -funit-at-a-time -maccumulate-outgoing-args -fstack-protector -DCONFIG_AS_CFI=1 -DCONFIG_AS_CFI_SIGNAL_FRAME=1 -DCONFIG_AS_CFI_SECTIONS=1 -DCONFIG_AS_FXSAVEQ=1 -DCONFIG_AS_AVX=1 -DCONFIG_AS_AVX2=1 -pipe -Wno-sign-compare -fno-asynchronous-unwind-tables -mno-sse -mno-mmx -mno-sse2 -mno-3dnow -mno-avx -Wframe-larger-than=2048 -Wno-unused-but-set-variable -fomit-frame-pointer -g -Wdeclaration-after-statement -Wno-pointer-sign -fno-strict-overflow -fconserve-stack -DCC_HAVE_ASM_GOTO -D"KBUILD_STR(s)=#s" -D"KBUILD_BASENAME=KBUILD_STR(hugetlb)" -D"KBUILD_MODNAME=KBUILD_STR(hugetlb)" mm/hugetlb.c
     25154 Segmentation fault | scripts/genksyms/genksyms -a x86_64 -r /dev/null > mm/.tmp_hugetlb.ver
make[2]: *** [mm/hugetlb.o] Error 139
make[1]: *** [mm] Error 2
make[1]: Leaving directory `/home/me/tmp/linux-3.8'
make: *** [debian/stamp/build/kernel] Error 2
3.7.6

The errors differ each time:

Second run:
  CC [M]  fs/ext2/namei.o
  CC [M]  fs/ext2/super.o
fs/ext2/super.c: In function 'ext2_fill_super':
fs/ext2/super.c:762:12: internal compiler error: Segmentation fault
Please submit a full bug report,
with preprocessed source if appropriate.
See  for instructions.
The bug is not reproducible, so it is likely a hardware or OS problem.
make[3]: *** [fs/ext2/super.o] Error 1
make[2]: *** [fs/ext2] Error 2
make[1]: *** [fs] Error 2
make[1]: Leaving directory `/home/me/tmp/linux-3.7.6'
make: *** [debian/stamp/build/kernel] Error 2
First run:
  CC      drivers/base/power/main.o
/bin/sh: line 1: 12317 Done                    gcc -E -D__GENKSYMS__ -Wp,-MD,drivers/base/power/.main.o.d -nostdinc -isystem /usr/lib/gcc/x86_64-linux-gnu/4.7/include -I/home/me/tmp/linux-3.7.6/arch/x86/include -Iarch/x86/include/generated -Iinclude -I/home/me/tmp/linux-3.7.6/arch/x86/include/uapi -Iarch/x86/include/generated/uapi -I/home/me/tmp/linux-3.7.6/include/uapi -Iinclude/generated/uapi -include /home/me/tmp/linux-3.7.6/include/linux/kconfig.h -D__KERNEL__ -Wall -Wundef -Wstrict-prototypes -Wno-trigraphs -fno-strict-aliasing -fno-common -Werror-implicit-function-declaration -Wno-format-security -fno-delete-null-pointer-checks -Os -m64 -mtune=generic -mno-red-zone -mcmodel=kernel -funit-at-a-time -maccumulate-outgoing-args -fstack-protector -DCONFIG_AS_CFI=1 -DCONFIG_AS_CFI_SIGNAL_FRAME=1 -DCONFIG_AS_CFI_SECTIONS=1 -DCONFIG_AS_FXSAVEQ=1 -DCONFIG_AS_AVX=1 -pipe -Wno-sign-compare -fno-asynchronous-unwind-tables -mno-sse -mno-mmx -mno-sse2 -mno-3dnow -mno-avx -Wframe-larger-than=2048 -Wno-unused-but-set-variable -fomit-frame-pointer -g -Wdeclaration-after-statement -Wno-pointer-sign -fno-strict-overflow -fconserve-stack -DCC_HAVE_ASM_GOTO -D"KBUILD_STR(s)=#s" -D"KBUILD_BASENAME=KBUILD_STR(main)" -D"KBUILD_MODNAME=KBUILD_STR(main)" drivers/base/power/main.c
     12318 Segmentation fault      | scripts/genksyms/genksyms -a x86_64 -r /dev/null > drivers/base/power/.tmp_main.ver
make[4]: *** [drivers/base/power/main.o] Error 139
make[3]: *** [drivers/base/power] Error 2
make[2]: *** [drivers/base] Error 2
make[1]: *** [drivers] Error 2
3.7.2
The errors differ every time:

Second run:
  CC [M]  drivers/hwmon/tmp102.o
======= Backtrace: =========
/lib/x86_64-linux-gnu/libc.so.6(+0x76d76)[0x2b2bb07f6d76]
/lib/x86_64-linux-gnu/libc.so.6(+0x7a658)[0x2b2bb07fa658]
/lib/x86_64-linux-gnu/libc.so.6(__libc_malloc+0x70)[0x2b2bb07fbb90]
scripts/genksyms/genksyms[0x4075fa]
scripts/genksyms/genksyms[0x4037c0]
scripts/genksyms/genksyms[0x402de6]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xfd)[0x2b2bb079eead]
scripts/genksyms/genksyms[0x400f59]
======= Memory map: ========
00400000-0040e000 r-xp 00000000 08:05 36440720 /home/me/tmp/linux-3.7.2/scripts/genksyms/genksyms
0060d000-0060e000 rw-p 0000d000 08:05 36440720 /home/me/tmp/linux-3.7.2/scripts/genksyms/genksyms
0060e000-00616000 rw-p 00000000 00:00 0
0089b000-00c58000 rw-p 00000000 00:00 0 [heap]
2b2bb0330000-2b2bb0350000 r-xp 00000000 08:01 11802457 /lib/x86_64-linux-gnu/ld-2.13.so
2b2bb0350000-2b2bb0352000 rw-p 00000000 00:00 0
2b2bb054f000-2b2bb0550000 r--p 0001f000 08:01 11802457 /lib/x86_64-linux-gnu/ld-2.13.so
2b2bb0550000-2b2bb0551000 rw-p 00020000 08:01 11802457 /lib/x86_64-linux-gnu/ld-2.13.so
2b2bb0551000-2b2bb0552000 rw-p 00000000 00:00 0
2b2bb0558000-2b2bb0561000 r-xp 00000000 08:01 2233201 /usr/lib/x86_64-linux-gnu/libfakeroot/libfakeroot-sysv.so
2b2bb0561000-2b2bb0761000 ---p 00009000 08:01 2233201 /usr/lib/x86_64-linux-gnu/libfakeroot/libfakeroot-sysv.so
2b2bb0761000-2b2bb0762000 rw-p 00009000 08:01 2233201 /usr/lib/x86_64-linux-gnu/libfakeroot/libfakeroot-sysv.so
2b2bb0762000-2b2bb0763000 rw-p 00000000 00:00 0
2b2bb077c000-2b2bb077d000 rw-p 00000000 00:00 0
2b2bb0780000-2b2bb0900000 r-xp 00000000 08:01 11802454 /lib/x86_64-linux-gnu/libc-2.13.so
2b2bb0900000-2b2bb0b00000 ---p 00180000 08:01 11802454 /lib/x86_64-linux-gnu/libc-2.13.so
2b2bb0b00000-2b2bb0b04000 r--p 00180000 08:01 11802454 /lib/x86_64-linux-gnu/libc-2.13.so
2b2bb0b04000-2b2bb0b05000 rw-p 00184000 08:01 11802454 /lib/x86_64-linux-gnu/libc-2.13.so
2b2bb0b05000-2b2bb0b0a000 rw-p 00000000 00:00 0
2b2bb0b10000-2b2bb0b12000 r-xp 00000000 08:01 11802447 /lib/x86_64-linux-gnu/libdl-2.13.so
2b2bb0b12000-2b2bb0d12000 ---p 00002000 08:01 11802447 /lib/x86_64-linux-gnu/libdl-2.13.so
2b2bb0d12000-2b2bb0d13000 r--p 00002000 08:01 11802447 /lib/x86_64-linux-gnu/libdl-2.13.so
2b2bb0d13000-2b2bb0d14000 rw-p 00003000 08:01 11802447 /lib/x86_64-linux-gnu/libdl-2.13.so
2b2bb0d14000-2b2bb0d16000 rw-p 00000000 00:00 0
2b2bb0d30000-2b2bb0d45000 r-xp 00000000 08:01 11796731 /lib/x86_64-linux-gnu/libgcc_s.so.1
2b2bb0d45000-2b2bb0f45000 ---p 00015000 08:01 11796731 /lib/x86_64-linux-gnu/libgcc_s.so.1
2b2bb0f45000-2b2bb0f46000 rw-p 00015000 08:01 11796731 /lib/x86_64-linux-gnu/libgcc_s.so.1
2b2bb4000000-2b2bb4021000 rw-p 00000000 00:00 0
2b2bb4021000-2b2bb8000000 ---p 00000000 00:00 0
7fff5cd00000-7fff5cd23000 rw-p 00000000 00:00 0 [stack]
7fff5cdd8000-7fff5cdd9000 r-xp 00000000 00:00 0 [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall]
/bin/sh: line 1: 18863 Done(2) gcc -E -D__GENKSYMS__ -Wp,-MD,drivers/acpi/.video.o.d -nostdinc -isystem /usr/lib/gcc/x86_64-linux-gnu/4.7/include -I/home/me/tmp/linux-3.7.2/arch/x86/include -Iarch/x86/include/generated -Iinclude -I/home/me/tmp/linux-3.7.2/arch/x86/include/uapi -Iarch/x86/include/generated/uapi -I/home/me/tmp/linux-3.7.2/include/uapi -Iinclude/generated/uapi -include /home/me/tmp/linux-3.7.2/include/linux/kconfig.h -D__KERNEL__ -Wall -Wundef -Wstrict-prototypes -Wno-trigraphs -fno-strict-aliasing -fno-common -Werror-implicit-function-declaration -Wno-format-security -fno-delete-null-pointer-checks -Os -m64 -mtune=generic -mno-red-zone -mcmodel=kernel -funit-at-a-time -maccumulate-outgoing-args -fstack-protector -DCONFIG_AS_CFI=1 -DCONFIG_AS_CFI_SIGNAL_FRAME=1 -DCONFIG_AS_CFI_SECTIONS=1 -DCONFIG_AS_FXSAVEQ=1 -DCONFIG_AS_AVX=1 -pipe -Wno-sign-compare -fno-asynchronous-unwind-tables -mno-sse -mno-mmx -mno-sse2 -mno-3dnow -mno-avx -Wframe-larger-than=2048 -Wno-unused-but-set-variable -fomit-frame-pointer -g -Wdeclaration-after-statement -Wno-pointer-sign -fno-strict-overflow -fconserve-stack -DCC_HAVE_ASM_GOTO -Os -DMODULE -D"KBUILD_STR(s)=#s" -D"KBUILD_BASENAME=KBUILD_STR(video)" -D"KBUILD_MODNAME=KBUILD_STR(video)" drivers/acpi/video.c
     18864 Aborted | scripts/genksyms/genksyms -a x86_64 -r /dev/null > drivers/acpi/.tmp_video.ver
make[3]: *** [drivers/acpi/video.o] Error 134
make[2]: *** [drivers/acpi] Error 2
make[1]: *** [drivers] Error 2
make[1]: Leaving directory `/home/me/tmp/linux-3.7.2'
make: *** [debian/stamp/build/kernel] Error 2

First run:
  CHK     include/generated/uapi/linux/version.h
  CHK     include/generated/utsrelease.h
  CALL    scripts/checksyscalls.sh
  Building modules, stage 2.
  MODPOST 2369 modules
ERROR: "ieee80211_get_hdrlen" [drivers/staging/rtl8192u/r8192u_usb.ko] undefined!
ERROR: "ieee80211_is_empty_essid" [drivers/staging/rtl8192u/r8192u_usb.ko] undefined!
make[2]: *** [__modpost] Error 1
make[1]: *** [modules] Error 2
make[1]: Leaving directory `/home/me/tmp/linux-3.7.2'
make: *** [debian/stamp/build/kernel] Error 2
3.6.3
  LD [M]  drivers/input/misc/pcf50633-input.ko
  CC      drivers/input/misc/pcspkr.mod.o
In file included from drivers/input/misc/pcspkr.mod.c:1:0:
include/linux/module.h:299:9: internal compiler error: Segmentation fault
Please submit a full bug report,
with preprocessed source if appropriate.
See  for instructions.
The bug is not reproducible, so it is likely a hardware or OS problem.
make[2]: *** [drivers/input/misc/pcspkr.mod.o] Error 1
make[1]: *** [modules] Error 2
make[1]: Leaving directory `/home/me/tmp/linux-3.6.3'
make: *** [debian/stamp/build/kernel] Error 2
3.5.0
The errors keep changing.

Second run:
  CC [M]  drivers/gpu/drm/via/via_map.o
  CC [M]  drivers/gpu/drm/via/via_mm.o
  CC [M]  drivers/gpu/drm/via/via_dma.o
drivers/gpu/drm/via/via_dma.c:741:21: internal compiler error: Segmentation fault
Please submit a full bug report,
with preprocessed source if appropriate.
See  for instructions.
The bug is not reproducible, so it is likely a hardware or OS problem.
make[5]: *** [drivers/gpu/drm/via/via_dma.o] Error 1
make[4]: *** [drivers/gpu/drm/via] Error 2
make[3]: *** [drivers/gpu/drm] Error 2
make[2]: *** [drivers/gpu] Error 2
make[1]: *** [drivers] Error 2
make[1]: Leaving directory `/home/me/tmp/linux-3.5'
make: *** [debian/stamp/build/kernel] Error 2
First run:
  CC      drivers/hid/hid-sony.mod.o
drivers/hid/hid-sony.mod.c:46:1: internal compiler error: Segmentation fault
Please submit a full bug report,
with preprocessed source if appropriate.
See  for instructions.
The bug is not reproducible, so it is likely a hardware or OS problem.
make[2]: *** [drivers/hid/hid-sony.mod.o] Error 1
make[1]: *** [modules] Error 2
make[1]: Leaving directory `/home/me/tmp/linux-3.5'
make: *** [debian/stamp/build/kernel] Error 2
3.4.42
  CC [M]  fs/quota/quota_tree.o
  CC [M]  fs/reiserfs/bitmap.o
fs/reiserfs/bitmap.c: In function 'scan_bitmap_block.constprop.9':
fs/reiserfs/bitmap.c:236:9: warning: 'next' may be used uninitialized in this function [-Wmaybe-uninitialized]
  CC [M]  fs/reiserfs/do_balan.o
  CC [M]  fs/reiserfs/namei.o
gcc: internal compiler error: Segmentation fault (program as)
Please submit a full bug report,
with preprocessed source if appropriate.
See  for instructions.
make[3]: *** [fs/reiserfs/namei.o] Error 4
make[2]: *** [fs/reiserfs] Error 2
make[1]: *** [fs] Error 2
make[1]: Leaving directory `/home/me/tmp/linux-3.4.42'
make: *** [debian/stamp/build/kernel] Error 2


Another dmesg error:

400. XpressConnect on Debian, Arch: step by step

Here's a step-by-step write-up of this post: http://verahill.blogspot.com.au/2013/04/393-not-fix-xpressconnect-on-ubuntu-vs.html

The 'problem' with running XpressConnect on non-Ubuntu Linux distributions is entirely artificial -- XpressConnect checks whether you are using Ubuntu, and if you're not, it refuses to run.

So the solution is simply to pretend that you are using Ubuntu, however annoying that is. I wish universities would take this into account and either end their association with Cloudpath or force them to support other distributions.

Note: XpressConnect is completely superfluous -- it doesn't do anything other than set up your wireless connection, which is something you could easily do by hand. See e.g. here for eduroam: http://verahill.blogspot.com.au/2013/04/394-eduroam-using-wicd-and-network.html


How to get XpressConnect running
1. Create the file /etc/lsb-release and put the following in it:
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=10.04
DISTRIB_CODENAME=lucid
DISTRIB_DESCRIPTION="Ubuntu 10.04.4 LTS"
If you are completely new to linux, one way of creating the file is to run
gksu gedit /etc/lsb-release

Alternatively, if you're not using gnome, try
sudo nano /etc/lsb-release 
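I don't know exactly how XpressConnect performs its check, but a check of this kind typically reads DISTRIB_ID from /etc/lsb-release, which is why each KEY=value pair needs its own line. A minimal sketch of such a parser (`parse_lsb_release` is a hypothetical helper, not XpressConnect's code) can be used to confirm that the file reads back as Ubuntu:

```python
import shlex
import sys


def parse_lsb_release(text):
    """Parse /etc/lsb-release style KEY=value lines into a dict.

    Values may be shell-quoted, as DISTRIB_DESCRIPTION is above.
    """
    info = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        info[key.strip()] = " ".join(shlex.split(value))  # strips the quotes
    return info


if __name__ == "__main__":
    # Usage: python check_lsb.py /etc/lsb-release
    if len(sys.argv) > 1:
        with open(sys.argv[1]) as f:
            print(parse_lsb_release(f.read()).get("DISTRIB_ID"))
```

If this prints Ubuntu for your file, the fake release information is at least well-formed.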

2. Install lshw and iwlist

On Debian (and derivatives like Mint, Ubuntu etc.):
sudo apt-get install lshw wireless-tools

On Arch Linux:
sudo pacman -S lshw wireless_tools

3. Run XpressConnect
This is the vanilla version -- replace http://hosted.cloudpath.net/Xavier/Production/tools/XpressConnect-Linux.tar with the link to your university's version.

cd ~/Downloads
wget http://hosted.cloudpath.net/Xavier/Production/tools/XpressConnect-Linux.tar
tar xvf XpressConnect-Linux.tar
./XpressConnect-DoubleClickToRun

That's it. Simple as that.

27 April 2013

399. Looking at speeding up (re)boot on debian wheezy.

I'd be interested in getting my Beowulf cluster nodes to boot a little bit faster -- (re)boots of the nodes are very infrequent, but the front node doubles as my work desktop and is normally rebooted at least once per month (kernel upgrades etc.) -- rebooting the front node makes me nervous, however, so the faster it boots, the better.

I should probably build a low-powered front node specifically for my cluster though...but that takes money, and money takes time.

Anyway, boot. In spite of the impetus for this post I'm testing this on my laptop which has wheezy, gnome 3.4 and an SSD -- it's not that representative of the target system and I'll have to repeat this on a normal desktop with a spinning hdd at a later stage.

I'm more or less following http://wiki.debian.org/BootProcessSpeedup. Note that insserv seems to be set up and enabled by default in Wheezy.


Timing it -- Setting up bootchart2
I first tried to define boot time arbitrarily as the time from hitting enter in GRUB to the visual appearance of the log-in prompt in GDM3, but it was too imprecise (up to ±2 s) relative to the duration of a boot (ca 9-10 s).

I ended up installing bootchart2 instead:
sudo apt-get install bootchart2

Then edit /etc/default/grub as shown here:
GRUB_CMDLINE_LINUX_DEFAULT="quiet initcall_debug printk.time=y init=/sbin/bootchartd"
and run
sudo update-grub
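If you toggle bootchart on and off, the edit can be scripted. A minimal sketch with a hypothetical `add_kernel_params` helper that appends the bootchart parameters to the existing GRUB_CMDLINE_LINUX_DEFAULT line without duplicating ones already present. Assumptions: the value is a single double-quoted string, as in the stock Debian file, and duplicates are matched exactly, so an existing `init=...` with a different value would not be replaced. It only rewrites the string; you still have to write it back to /etc/default/grub and run sudo update-grub.

```python
import shlex


def add_kernel_params(line, params):
    """Return a GRUB_CMDLINE_LINUX_DEFAULT="..." line with params appended.

    Parameters already present (exact string match) are not added twice.
    """
    key, _, value = line.strip().partition("=")
    args = " ".join(shlex.split(value)).split()  # drop quotes, split on spaces
    for p in params:
        if p not in args:
            args.append(p)
    return '%s="%s"' % (key, " ".join(args))


if __name__ == "__main__":
    print(add_kernel_params('GRUB_CMDLINE_LINUX_DEFAULT="quiet"',
                            ["initcall_debug", "printk.time=y", "init=/sbin/bootchartd"]))
```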

After a boot, run
pybootchartgui
eog bootchart.png

You'll get something like this:
Look at the top, right above the first chart -- it says 'time: 6.61s'. I'll use that as the metric.

Most of the time bootchart2 worked fine, but for the odd boot the /var/log/bootchart.tgz wasn't accepted by pybootchartgui.

Normal boot, pre-optimisation: 
'Cold' reboots: 6.61, 5.77 seconds
Warm* reboots: 6.46, 5.79, 5.97 seconds

*using shutdown -r now

The variability is very high -- there's almost a second between the fastest and slowest boots. Keep that in mind when looking at the numbers later on.
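To put a number on that variability, here is a quick summary of the five pre-optimisation boots. It's just arithmetic on the times listed above; `summarize` is a hypothetical helper name.

```python
from statistics import mean, stdev

# Boot times in seconds, read off bootchart's 'time:' field
cold = [6.61, 5.77]
warm = [6.46, 5.79, 5.97]   # shutdown -r now


def summarize(times):
    """Mean, max-min spread and sample standard deviation, rounded to 0.01 s."""
    return {
        "n": len(times),
        "mean": round(mean(times), 2),
        "spread": round(max(times) - min(times), 2),
        "stdev": round(stdev(times), 2),
    }


if __name__ == "__main__":
    for label, times in (("cold", cold), ("warm", warm), ("all", cold + warm)):
        print(label, summarize(times))
```

Pooling all five boots gives a mean of about 6.12 s, a spread of 0.84 s and a sample standard deviation of about 0.39 s, so with this few boots a difference of a few tenths of a second between configurations is within the noise.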


Using readahead-fedora to pre-load files
sudo apt-get install readahead-fedora

After install, readahead-early, -late and -stop were enabled in rcconf.

The first boot took over 7 seconds, but later boots were typically around 6 seconds or faster. Note that readahead solves a problem which isn't really present with high-bandwidth SSDs, and it may even slow things down when you use an SSD or a fast spinning disk (e.g. >7200 rpm).

First run

'normal' run

Not exactly an improvement. Looking at /etc/readahead.d/custom.early shows that the wrong kernel files are loaded -- I'm using a custom kernel (3.8.5-ck1) but the stock kernel files are loaded (3.2.0-4). I edited custom.early to point towards my current kernel, and then did a warm reboot.
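The edit to custom.early can be done by hand, but here is a sketch of it as a script. Assumptions: the list holds one absolute path per line (check yours first), the stock kernel's release string is `3.2.0-4-amd64` (what `uname -r` reports for wheezy's stock amd64 kernel), and `retarget_kernel` is a hypothetical helper. It swaps the old release for a new one and drops paths that don't exist afterwards, since there is nothing to preload for them.

```python
import os
import sys


def retarget_kernel(lines, old_release, new_release, exists=os.path.exists):
    """Rewrite kernel-version components of a readahead file list.

    Assumes one absolute path per line; blank lines are skipped and
    paths that don't exist after the rewrite are dropped.
    """
    out = []
    for line in lines:
        path = line.strip()
        if not path:
            continue
        path = path.replace(old_release, new_release)
        if exists(path):
            out.append(path)
    return out


if __name__ == "__main__":
    # Usage: python retarget.py /etc/readahead.d/custom.early 3.2.0-4-amd64
    # Prints the new list for the running kernel; review it before copying
    # it over the original (as root, after making a backup).
    if len(sys.argv) > 2:
        with open(sys.argv[1]) as f:
            print("\n".join(retarget_kernel(f, sys.argv[2], os.uname().release)))
```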


Speeding up reboots -- kexec
sudo apt-get install kexec-tools

Shut down your computer once, then boot it up. After that first time you can do warm reboots (sudo shutdown -r now) without going through the BIOS and GRUB stages. The only visible downside is that your screen will go crazy for a few seconds as the running kernel is replaced by the new one (I presume). It doesn't look pretty, but the reboot is fast.
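Roughly what happens on such a reboot: instead of handing control back to the BIOS, the running kernel loads the new kernel and initrd into memory with kexec and jumps straight into it. A sketch that only builds the load command for a given kernel, assuming Debian's /boot/vmlinuz-<release> and /boot/initrd.img-<release> naming; `kexec_load_cmd` is a hypothetical helper. On Debian the kexec-tools scripts do this for you during reboot (configured in /etc/default/kexec), so this is for illustration rather than something you need to run.

```python
import os


def kexec_load_cmd(release, boot="/boot"):
    """argv for staging kernel `release` with kexec-tools.

    -l loads the kernel image, --initrd adds the matching initramfs and
    --reuse-cmdline keeps the current kernel command line.
    """
    return [
        "kexec", "-l", f"{boot}/vmlinuz-{release}",
        f"--initrd={boot}/initrd.img-{release}",
        "--reuse-cmdline",
    ]


if __name__ == "__main__":
    # Print (don't run) the command for the currently running kernel.
    # Running it as root followed by 'kexec -e' would jump into the new
    # kernel immediately, skipping a clean shutdown.
    print(" ".join(kexec_load_cmd(os.uname().release)))
```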

I couldn't get bootchart to time the hot reboots, but they look 'fast'.


I'll be repeating this on a system with a spinning disk at a later stage.