27 February 2012

81. nvidia 295.20 bug causing gnome-shell to crash on Debian Testing


UPDATE: Here's how to downgrade your drivers:
http://verahill.blogspot.com.au/2012/03/debian-testing-downgrading-nvidia.html

Update: 
A bigger issue is what this bug does to evolution:
http://verahill.blogspot.com.au/2012/02/debian-testing-wheezy-64-no-real.html


The symptoms:
I've tried to use the correct gnome-shell terminology.

1. Go to the top left corner (Hot Corner) of the desktop to get the Overview and Search Entry field
2. Start typing in the name of an application
3. The window will flicker as if gnome-shell is being restarted (similar to Alt+F2, then r)
4. Do it again and you get a full-on crash with an unhappy looking computer

After repeatedly crashing gnome-shell, dmesg shows:

[ 7011.967820] gnome-shell[32742]: segfault at 10 ip 00007fa1b6d98c0f sp 00007fa1914a1638 error 6 in libnvidia-tls.so.295.20[7fa1b6d98000+3000]
[ 7111.276979] gnome-shell[748]: segfault at 10 ip 00007ff7eb598c0f sp 00007ff7beffc638 error 6 in libnvidia-tls.so.295.20[7ff7eb598000+3000]
[ 7620.952276] gnome-shell[2933]: segfault at 10 ip 00007f0a9fdd9c0f sp 00007f0a710fe638 error 6 in libnvidia-tls.so.295.20[7f0a9fdd9000+3000]
[ 7628.106656] gnome-shell[2986]: segfault at 10 ip 00007f26423f3c0f sp 00007f2612ffd638 error 6 in libnvidia-tls.so.295.20[7f26423f3000+3000]
[ 7658.755466] gnome-shell[3818]: segfault at 10 ip 00007f76bbf2cc0f sp 00007f7691a77638 error 6 in libnvidia-tls.so.295.20[7f76bbf2c000+3000]
[ 7666.310714] gnome-shell[3905]: segfault at 10 ip 00007f3279e64c0f sp 00007f325469d638 error 6 in libnvidia-tls.so.295.20[7f3279e64000+3000]
[ 7717.061483] gnome-shell[4829]: segfault at 10 ip 00007f245ad26c0f sp 00007f243469c638 error 6 in libnvidia-tls.so.295.20[7f245ad26000+3000]


The libnvidia-tls files are found here:
/usr/lib/x86_64-linux-gnu/libnvidia-tls.so.295.20
/usr/lib/x86_64-linux-gnu/tls/libnvidia-tls.so.295.20

and
 dpkg --search libnvidia-tls.so.295.20 
gives
libgl1-nvidia-glx: /usr/lib/x86_64-linux-gnu/tls/libnvidia-tls.so.295.20
libgl1-nvidia-glx: /usr/lib/x86_64-linux-gnu/libnvidia-tls.so.295.20
Ergo, that's where the bug is.
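
If the kernel log is long, something like the following could pull out the faulting library without reading every line. It's only a sketch: faulting_libs is a name I made up, and it assumes the segfault lines look like the ones above ("... segfault at ... in LIBRARY[address+size]").

```shell
# Hypothetical helper: read kernel-log text on stdin and print how many
# segfaults were reported per faulting library.
# Assumes lines of the form "... segfault at ... in LIBRARY[addr+size]".
faulting_libs() {
    sed -n 's/.*segfault at .* in \([^[]*\)\[.*/\1/p' | sort | uniq -c
}

# Usage:
#   dmesg | faulting_libs
```

Each output line is a count followed by a library name, so a single library with a high count is a strong hint about where the crash comes from.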


Cause:
Bad nvidia drivers -- in package libgl1-nvidia-glx
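
To check whether a given machine has the affected driver, you could compare the installed package version against 295.20. This is a rough sketch: is_affected_version is a made-up helper, and I'm assuming the Debian version string starts with the upstream driver version (something like "295.20-1").

```shell
# Hypothetical helper: succeed if a package version string belongs to the
# 295.20 driver release (e.g. "295.20-1"; the exact Debian revision suffix
# is an assumption).
is_affected_version() {
    case "$1" in
        295.20|295.20-*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage (dpkg-query prints the installed version of the package):
#   ver=$(dpkg-query -W -f='${Version}' libgl1-nvidia-glx)
#   is_affected_version "$ver" && echo "affected driver installed: $ver"
```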

This is not unique to Debian.
"Confirmed, I'm seeing the same on Gentoo ~amd64. gnome-shell 3.2.2.1 crashes while doing a search with nvidia-drivers 295.20 installed (backtrace is in libnvidia-tls.so). Downgrading to nvidia-drivers 290.10 resolves the issue, so it is a problem with the drivers."

http://www.nvnews.net/vbulletin/showthread.php?t=174049 (14 Feb 2012)


There are no bugs listed for libgl1-nvidia-glx:
http://bugs.debian.org/cgi-bin/pkgreport.cgi?pkg=libgl1-nvidia-glx

But nvidia-glx has its fair share of bugs filed against it:
http://bugs.debian.org/cgi-bin/pkgreport.cgi?package=nvidia-glx

From what I can tell this is the relevant bug report (17 February 2012):
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=660189
which points to
http://www.nvnews.net/vbulletin/showthread.php?t=174049&page=3

Solution:

1. The 'proper way':
Downgrade your drivers.

UPDATE: Here's how to downgrade your drivers:
http://verahill.blogspot.com.au/2012/03/debian-testing-downgrading-nvidia.html


2. The easy, interesting way:
"For me deleting recently-used.xbel and recreating it with no content solved the problem just temporary. But instead of creating a equally named directory one could also sudo chattr +i recently-used.xbel to keep the file empty.  Keeping the file empty also significantly speeds up the application launcher for me. so it would be nice to have a way to configure this instead of fixing it that rude way, for folks that dont want or need recently used files."


In practical terms, this means:

echo ""> ~/.local/share/recently-used.xbel
sudo chattr +i ~/.local/share/recently-used.xbel

And you are done!
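
If you'd rather keep a copy of your recent-files history before wiping it, a slightly more careful variant might look like this. It's a sketch only: empty_with_backup and the ".bak" suffix are my own inventions, and it truncates the file to zero bytes, whereas echo "" leaves a single newline in it.

```shell
# Hypothetical helper: back up a file (if it exists), then truncate it to
# zero bytes. The ".bak" suffix is an arbitrary choice.
empty_with_backup() {
    file=$1
    if [ -e "$file" ]; then
        cp -p "$file" "$file.bak" || return 1
    fi
    : > "$file"
}

# Usage:
#   empty_with_backup ~/.local/share/recently-used.xbel
#   sudo chattr +i ~/.local/share/recently-used.xbel
#   lsattr ~/.local/share/recently-used.xbel   # should list the 'i' attribute
```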

Once the bug has been fixed, you can do
 sudo chattr -i ~/.local/share/recently-used.xbel
to restore normal functionality. Clearing the immutable flag needs root, just like setting it.

This solution worked for me on an up-to-date Debian Testing.

Oh well. At least the folks at nvidia are aware of the bug:


Thoughts:
The nvidia binaries only entered the Debian Testing repos around 25-26 February, from what I can tell. By then the bug had been known for ten days, so why did the binaries get promoted to testing?

Here's what I've got installed:

i A glx-alternative-nvidia          - allows the selection of NVIDIA as GLX prov
i A libgl1-nvidia-alternatives      - transition libGL.so* diversions to glx-alt
i A libgl1-nvidia-glx               - NVIDIA binary OpenGL libraries          
i A libglx-nvidia-alternatives      - transition libgl.so diversions to glx-alte
pi  libnvidia-compiler-ia32         - NVIDIA runtime compiler library (32-bit)
i A libnvidia-ml1                   - NVIDIA management library (NVML) runtime l
i A nvidia-alternative              - allows the selection of NVIDIA as GLX prov
i A nvidia-compute-profiler         - NVIDIA Compute Visual Profiler          
i   nvidia-cuda-dev                 - NVIDIA CUDA development files          
i A nvidia-cuda-doc                 - NVIDIA CUDA and OpenCL documentation    
i A nvidia-cuda-gdb                 - NVIDIA CUDA GDB                        
i A nvidia-cuda-toolkit             - NVIDIA CUDA toolkit                    
i   nvidia-glx                      - NVIDIA metapackage                      
i A nvidia-installer-cleanup        - Cleanup after driver installation with the
i   nvidia-kernel-3.1.0-1-amd64     - NVIDIA binary kernel module for Linux 3.1.
i   nvidia-kernel-common            - NVIDIA binary kernel module support files
i A nvidia-kernel-dkms              - NVIDIA binary kernel module DKMS source
i   nvidia-kernel-source            - NVIDIA binary kernel module source      
i A nvidia-libopencl1               - NVIDIA OpenCL library                  
i   nvidia-libopencl1-ia32          - NVIDIA OpenCL 32-bit library            
pi  nvidia-opencl-common            - NVIDIA OpenCL driver                    
i   nvidia-opencl-dev               - NVIDIA OpenCL development files        
pi  nvidia-opencl-icd-ia32          - NVIDIA OpenCL ICD (32-bit)              
i   nvidia-settings                 - Tool for configuring the NVIDIA graphics d
i A nvidia-smi                      - NVIDIA System Management Interface      
i A nvidia-support                  - NVIDIA binary graphics driver support file
i A nvidia-vdpau-driver             - NVIDIA vdpau driver                    
pi  nvidia-xconfig                  - X configuration tool for non-free NVIDIA d
i A xserver-xorg-video-nvidia       - NVIDIA binary Xorg driver






3 comments:

  1. Thanks a lot, it worked :D

  2. Thanks for the information. Though, I did not downgrade as it looks like a lot of work plus I will probably forget to install the correct versions in the future. My solution is an upgrade from the unstable repository.

    1) create file /etc/apt/preferences.d/unstable with:
    Package: *
    Pin: release a=unstable
    Pin-Priority: 50

    2) add unstable repository in /etc/apt/sources.list

    3) upgrade nvidia-glx (and its dependencies)

    After the packages are accepted to Testing this exception will go away automatically. Inspiration: http://www.imped.net/2007/07/20/apt-pinning-installing-unstable-packages-on-stable-debian/

  3. I had the same problem.
    I solved it with hardware-driver-proprietary, changing the Nvidia driver from
    current-updates to current version
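
For reference, the pinning approach from the second comment could be scripted roughly as follows. I haven't tested this on the affected setup, so treat it as a sketch: write_unstable_pin is a made-up helper, and the mirror URL and component list in the usage notes are assumptions that you should replace with whatever your sources.list already uses.

```shell
# Hypothetical helper: write the low-priority pin for unstable into the
# given preferences directory (defaults to /etc/apt/preferences.d, which
# needs root).
write_unstable_pin() {
    dir=${1:-/etc/apt/preferences.d}
    cat > "$dir/unstable" <<'EOF'
Package: *
Pin: release a=unstable
Pin-Priority: 50
EOF
}

# Usage, as root (the mirror URL and component list are assumptions):
#   write_unstable_pin
#   echo 'deb http://ftp.debian.org/debian/ unstable main contrib non-free' >> /etc/apt/sources.list
#   apt-get update
#   apt-get install -t unstable nvidia-glx
```

With the priority set to 50, packages from unstable are only installed when you ask for them explicitly, which is what the -t unstable option does for nvidia-glx and its dependencies.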