Nvidia NVML Driver/library version mismatch


Cuda Problem Overview


When I run nvidia-smi, I get the following message:

> Failed to initialize NVML: Driver/library version mismatch

An hour ago I received the same message, so I uninstalled my CUDA library; after that I was able to run nvidia-smi and got the following result:

(screenshot of the nvidia-smi output)

After this I downloaded cuda-repo-ubuntu1604-8-0-local-ga2_8.0.61-1_amd64.deb from the official NVIDIA page and then simply:

sudo dpkg -i cuda-repo-ubuntu1604-8-0-local-ga2_8.0.61-1_amd64.deb
sudo apt-get update
sudo apt-get install cuda
export PATH=/usr/local/cuda-8.0/bin${PATH:+:${PATH}}

Now I have CUDA installed, but I get the mentioned mismatch error.


Some potentially useful information:

Running cat /proc/driver/nvidia/version I get:

NVRM version: NVIDIA UNIX x86_64 Kernel Module  378.13  Tue Feb  7 20:10:06 PST 2017
GCC version:  gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.4)

I'm running Ubuntu 16.04.2 LTS (Xenial Xerus).

The kernel release is 4.4.0-66-generic.
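
For diagnosis, the kernel-side version above can be compared against the user-space NVML library that nvidia-smi loads (a sketch; the library path is an assumption for a stock Ubuntu x86_64 install and may differ):

# Version of the loaded kernel module
cat /proc/driver/nvidia/version

# Version of the user-space NVML library; the driver version is part of the file name
ls -l /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.*

If the two versions differ, that is exactly the mismatch nvidia-smi complains about.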

Cuda Solutions


Solution 1 - Cuda

Surprise surprise, rebooting solved the issue (I thought I had already tried that).

The solution Robert Crovella mentioned in the comments may also be useful to someone else, since it's pretty similar to what I did to solve the issue the first time I had it.

Solution 2 - Cuda

As etal said, rebooting can solve this problem, but I think a procedure that doesn't require rebooting is also worth knowing.

For Chinese readers, a Chinese version is available on my blog: 中文版

The error message

> NVML: Driver/library version mismatch

tells us that the Nvidia driver kernel module (kmod) has the wrong version, so we should unload this module and then load the correct version of the kmod.

How can we do that?

First, we should know which drivers are loaded.

lsmod | grep nvidia

You may get

nvidia_uvm            634880  8
nvidia_drm             53248  0
nvidia_modeset        790528  1 nvidia_drm
nvidia              12312576  86 nvidia_modeset,nvidia_uvm

Our final goal is to unload the nvidia module, so we should first unload the modules that depend on it:

sudo rmmod nvidia_drm
sudo rmmod nvidia_modeset
sudo rmmod nvidia_uvm

Then unload nvidia itself:

sudo rmmod nvidia

Troubleshooting

If you get an error like rmmod: ERROR: Module nvidia is in use, the kernel module is still in use, and you should kill the processes that are using it:

sudo lsof /dev/nvidia*

and then kill those processes, then continue unloading the kmods.
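
One way to do that in a single line (a sketch; note this may also terminate Xorg or your display manager if the GPU is driving a display):

# Print the PIDs of all processes holding /dev/nvidia* open and terminate them
sudo lsof -t /dev/nvidia* | sort -u | xargs -r sudo kill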

Test

Confirm that you have successfully unloaded those kmods:

lsmod | grep nvidia

You should get nothing. Then confirm you can load the correct driver:

nvidia-smi

You should get the correct output.
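
If you need to repeat this procedure, the steps above can be collected into a small script (a minimal sketch, assuming the module names shown above and that nothing is still holding /dev/nvidia* open):

#!/bin/bash
# Unload the Nvidia kernel modules in dependency order, then let nvidia-smi
# reload the freshly installed driver version.
set -e
for mod in nvidia_drm nvidia_modeset nvidia_uvm nvidia; do
    if lsmod | grep -q "^${mod} "; then
        sudo rmmod "$mod"
    fi
done
lsmod | grep nvidia || echo "All Nvidia modules unloaded."
nvidia-smi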

Solution 3 - Cuda

I was having this problem, and none of the other remedies worked. The error message was opaque, but checking the output of dmesg was the key:

[   10.118255] NVRM: API mismatch: the client has the version 410.79, but
           NVRM: this kernel module has the version 384.130.  Please
           NVRM: make sure that this kernel module and all NVIDIA driver
           NVRM: components have the same version.

However, I had completely removed the 384 version and removed any remaining kernel drivers nvidia-384*. But even after a reboot, I was still getting this. Seeing this meant that the initramfs still contained the 384 kernel module, while user space only had 410. So I rebuilt the initramfs for my running kernel:

uname -a # Find the kernel it's using

Linux blah 4.13.0-43-generic #48~16.04.1-Ubuntu SMP Thu May 17 12:56:46 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux


update-initramfs -c -k 4.13.0-43-generic # Rebuild the initramfs for this kernel
reboot

And then it worked.

After removing 384, I still had 384 files in:

/var/lib/dkms/nvidia-XXX/XXX.YY/4.13.0-43-generic/x86_64/module
/lib/modules/4.13.0-43-generic/kernel/drivers

I recommend using the locate command (not installed by default) rather than searching the filesystem every time.
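
For example (a sketch; the package providing locate is mlocate on older releases and plocate on newer ones):

# Install locate, refresh its database, and search for leftovers of the old driver
sudo apt-get install mlocate
sudo updatedb
locate nvidia-384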

Solution 4 - Cuda

Why does the version mismatch happen and how can we prevent it from happening again?

You may find that the versions of nvidia-* are different in these locations:

  1. dpkg -l | grep nvidia (look at nvidia-utils-xxx package version), and
  2. cat /proc/driver/nvidia/version (look at the version of the kernel module, e.g. 460.56)

A restart should fix the mismatch, but you may want to prevent automatic updates of this package by modifying the files in /etc/apt/sources.list.d/, or simply hold the package by executing apt-mark hold nvidia-utils-version_number.
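
For example (a sketch; replace 460 with the driver branch that is actually installed on your system):

# See which Nvidia packages and versions are installed
dpkg -l | grep nvidia

# Stop apt from upgrading the user-space driver packages behind your back
sudo apt-mark hold nvidia-utils-460 nvidia-driver-460

# Undo the hold when you are ready to upgrade driver and kernel module together
sudo apt-mark unhold nvidia-utils-460 nvidia-driver-460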

P.S.: Some content was inspired by this post (the original instructions were in Chinese, so I referenced a translated version instead).

Solution 5 - Cuda

The top two answers didn't solve my problem, but I found a solution on the official Nvidia forum that did.

The error below can be caused by installing two different versions of the driver through different approaches, for example installing the Nvidia driver both via APT and via the official installer.

> Failed to initialize NVML: Driver/library version mismatch

To solve this problem, you only need to execute one of the following two commands.

sudo apt-get --purge remove "*nvidia*"
sudo /usr/bin/nvidia-uninstall

Solution 6 - Cuda

I had the issue too (I'm running Ubuntu 18.04 (Bionic Beaver)).

What I did:

dpkg -l | grep -i nvidia

Then sudo apt-get remove --purge nvidia-381 (and every other installed version; in my case I had 381, 384 and 387).

Then sudo ubuntu-drivers devices to list what's available.

And I chose sudo apt install nvidia-driver-430.

After that, nvidia-smi gave the correct output (no need to reboot). But I suppose you can reboot when in doubt.
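
Collected into a single sequence (a sketch; the driver branches 381, 384, 387 and 430 are the ones from this answer and will differ on other systems):

dpkg -l | grep -i nvidia                                       # list installed Nvidia packages
sudo apt-get remove --purge nvidia-381 nvidia-384 nvidia-387   # purge every installed branch
sudo ubuntu-drivers devices                                    # list drivers available for this GPU
sudo apt install nvidia-driver-430                             # install the chosen one
nvidia-smi                                                     # verify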

I also followed this installation guide to reinstall CUDA + cuDNN.

Solution 7 - Cuda

Reboot.

If the problem still exists:

sudo rmmod nvidia_drm
sudo rmmod nvidia_modeset
sudo rmmod nvidia
nvidia-smi

For CentOS and Red Hat Enterprise Linux (RHEL):

cd /boot
mv initramfs-$(uname -r).img /boot/initramfs-$(uname -r).img.bak
dracut -vf initramfs-$(uname -r).img $(uname -r)

Then

reboot

For Debian/Ubuntu:

update-initramfs -u

If the problem persists:

apt install -y dkms && dkms install -m nvidia -v 440.82

Change 440.82 to your actual version.

Tip: get the Nvidia driver version:

ls /usr/src

You will find the Nvidia driver directory, such as nvidia-440.82.
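
To avoid typing the version by hand, it can be derived from that directory name (a sketch; assumes a single nvidia-<version> directory under /usr/src):

# Read the driver version from /usr/src and rebuild the module with DKMS
NVVER=$(ls -d /usr/src/nvidia-* | head -n 1 | sed 's|.*/nvidia-||')
echo "Detected driver version: $NVVER"
sudo dkms install -m nvidia -v "$NVVER"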


Also, you can remove all Nvidia packages and reinstall the driver again:

apt purge "nvidia*"
apt purge "*cuda*"

# Check
apt list --installed | grep nvidia
apt list --installed | grep cuda

Solution 8 - Cuda

This also happened to me on Ubuntu 16.04 using the nvidia-348 package (latest Nvidia version on Ubuntu 16.04).

However I could resolve the problem by installing nvidia-390 through the Proprietary GPU Drivers PPA.

So a solution to the described problem on Ubuntu 16.04 is doing this:

  • sudo add-apt-repository ppa:graphics-drivers/ppa
  • sudo apt-get update
  • sudo apt-get install nvidia-390

Note: This guide assumes a clean Ubuntu install. If you have previous drivers installed a reboot might be needed to reload all the kernel modules.

Solution 9 - Cuda

These answers did not work for me:

dmesg

NVRM: API mismatch: the client has the version 418.67, but
NVRM: this kernel module has the version 430.26.  Please
NVRM: make sure that this kernel module and all NVIDIA driver
NVRM: components have the same version.

Uninstall old driver 418.67 and install new driver 430.26 (download NVIDIA-Linux-x86_64-430.26.run):

sudo apt-get --purge remove "*nvidia*"
sudo /usr/bin/nvidia-uninstall
chmod +x NVIDIA-Linux-x86_64-430.26.run
sudo ./NVIDIA-Linux-x86_64-430.26.run
[ignore abort]

cat /proc/driver/nvidia/version

NVRM version: NVIDIA UNIX x86_64 Kernel Module  430.26  Tue Jun  4 17:40:52 CDT 2019
GCC version:  gcc version 7.4.0 (Ubuntu 7.4.0-1ubuntu1~18.04.1)

Solution 10 - Cuda

Mostly, a reboot will fix the issue on Ubuntu 18.04 (Bionic Beaver).

The "Failed to initialize NVML: Driver/library version mismatch" error generally means the CUDA driver is still running an older release that is incompatible with the CUDA toolkit version currently in use. Rebooting the compute nodes will generally resolve this issue.

Solution 11 - Cuda

I experienced this problem after a routine kernel update on a CentOS machine. Since all CUDA and Nvidia drivers and libraries had been installed via YUM repositories, I managed to solve the issue with the following steps:

sudo yum remove nvidia-driver-*
sudo reboot
sudo yum install nvidia-driver-cuda nvidia-modprobe
sudo modprobe nvidia # Or just reboot

It made sure my kernel and my Nvidia driver were consistent. I reckon that just rebooting may result in the wrong version of the kernel module being loaded.

Solution 12 - Cuda

Neither rebooting nor unloading the driver worked for me. I solved the problem by updating my Nvidia driver from 440.33.01 to 450.80.2.

sudo apt-get install nvidia-driver-450

sudo reboot

I'm running Ubuntu 20.04 LTS (Focal Fossa), which is a remote server.

Solution 13 - Cuda

I reinstalled the Nvidia driver. Run these commands in root mode:

  1. systemctl isolate multi-user.target

  2. modprobe -r nvidia-drm

  3. Reinstall the Nvidia driver: chmod +x NVIDIA-Linux-x86_64-410.57.run, then run ./NVIDIA-Linux-x86_64-410.57.run

  4. systemctl start graphical.target

And finally check nvidia-smi


Solution 14 - Cuda

I committed the container to a Docker image, then created a new container from that image, and the problem was gone.

Solution 15 - Cuda

I had to restart my machine and remove all the packages that I had installed previously (during the first installation). Please make sure to delete all of the packages; even after removing packages with the command below:

sudo apt-get --purge remove "*nvidia*"

some packages, like libtinfo6:i386, don't get removed.

I'm using Ubuntu 20.04 (Focal Fossa) and nvidia-driver-440. For that, you have to remove all the packages shown in the image below.

List of all the packages that need to be removed:

(screenshot: list of Nvidia packages to remove)

As shown in the image, make sure that the package you're installing is the correct size, which is 207 MB for nvidia-driver-440. If it's less, it means you haven't removed all the packages.
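
To double-check that nothing was left behind, you can list everything that mentions nvidia or cuda, including :i386 packages (a sketch):

dpkg -l | grep -Ei 'nvidia|cuda'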

Solution 16 - Cuda

For completeness, I ran into this issue as well. In my case it turned out that because I had set Clang as my default compiler (using update-alternatives), nvidia-driver-440 failed to compile (check /var/crash/) even though apt didn't post any warnings. For me, the solution was to apt purge nvidia-*, set cc back to use gcc, reboot, and reinstall nvidia-driver-440.
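
To check and restore the default compiler (a sketch, assuming cc is managed by update-alternatives as on stock Ubuntu):

# Show what cc currently points to
update-alternatives --display cc

# Point cc back at gcc before reinstalling the driver package
sudo update-alternatives --set cc /usr/bin/gcc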

Solution 17 - Cuda

First I installed the Nvidia driver.

Next I installed CUDA.

After that, I got the "Driver/library version mismatch" error, but I could see the CUDA version, so I purged the Nvidia driver and reinstalled it.

Then it worked correctly.

Solution 18 - Cuda

There is an easier solution that worked for me. On Fedora 33, try the following:

rpm -qa | grep -i nvidia | grep f32

You should have two packages listed from the previous version of Fedora for OpenGL. Remove those and reboot.
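
For example (a sketch; inspect the query output before removing anything):

# Remove the leftover f32 Nvidia OpenGL packages reported by the query above
sudo dnf remove $(rpm -qa | grep -i nvidia | grep f32)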

Deleting and reinstalling the entire Nvidia package set is overkill.

Solution 19 - Cuda

I was facing the same problem and I'm posting my solution here.

In my case, the NVRM version was 440.100 and the driver version was 460.32.03. My driver had been updated by sudo apt install caffe-cuda and I didn't notice it at the time, but I found it later in /var/log/apt/history.log.

Following my NVRM version, I just ran sudo apt install nvidia-driver-440, but it installed 450.102. I don't know why it installed a different version, and nvidia-smi now shows 450.102.04.

Anyhow, after rebooting my PC, everything is working fine now. Even after reinstalling the driver, CUDA still works fine.

I didn't remove/purge anything related to the Nvidia driver. Version 460.32.03 was uninstalled automatically by running sudo apt install nvidia-driver-440.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type          Original Author      Original Content on Stackoverflow
Question              etal                 View Question on Stackoverflow
Solution 1 - Cuda     etal                 View Answer on Stackoverflow
Solution 2 - Cuda     Comzyh               View Answer on Stackoverflow
Solution 3 - Cuda     user9170             View Answer on Stackoverflow
Solution 4 - Cuda     Long                 View Answer on Stackoverflow
Solution 5 - Cuda     zhenlingcn           View Answer on Stackoverflow
Solution 6 - Cuda     Benjamin Crouzier    View Answer on Stackoverflow
Solution 7 - Cuda     levinit              View Answer on Stackoverflow
Solution 8 - Cuda     Stefan Horning       View Answer on Stackoverflow
Solution 9 - Cuda     mrgloom              View Answer on Stackoverflow
Solution 10 - Cuda    Sushena              View Answer on Stackoverflow
Solution 11 - Cuda    scrutari             View Answer on Stackoverflow
Solution 12 - Cuda    ququuy               View Answer on Stackoverflow
Solution 13 - Cuda    BarzanHayati         View Answer on Stackoverflow
Solution 14 - Cuda    Berat                View Answer on Stackoverflow
Solution 15 - Cuda    Gaurav Ghati         View Answer on Stackoverflow
Solution 16 - Cuda    Tom                  View Answer on Stackoverflow
Solution 17 - Cuda    Ali Ebrahimi         View Answer on Stackoverflow
Solution 18 - Cuda    Alan Olsen           View Answer on Stackoverflow
Solution 19 - Cuda    Erric                View Answer on Stackoverflow