TensorFlow on GPU: no known devices, despite CUDA's deviceQuery returning a "PASS" result


Tensorflow Problem Overview


> Note: this question was initially asked on GitHub, but I was asked to post it here instead.

I'm having trouble running TensorFlow on the GPU, and it does not seem to be the usual CUDA configuration problem, because everything indicates that CUDA is properly set up.

The main symptom: when running TensorFlow, my GPU is not detected (the code being run, and its output).

What differs from the usual issues is that CUDA seems properly installed, and running ./deviceQuery from the CUDA samples succeeds (output).

I have two graphics cards:

  • an old GTX 650 used for my monitors (I don't want to use that one with tensorflow)
  • a GTX 1060 that I want to dedicate to tensorflow

I use:

I've tried:

  • adding /usr/local/cuda/bin/ to $PATH
  • forcing GPU placement in the TensorFlow script using with tf.device('/gpu:1'): (and with tf.device('/gpu:0'): when that failed, for good measure)
  • whitelisting the GPU I wanted to use with CUDA_VISIBLE_DEVICES, in case the presence of my old unsupported card was causing problems
  • running the script with sudo (because why not)
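The CUDA_VISIBLE_DEVICES approach from the list above can be sketched in Python as follows. Two details are easy to miss: the variables only take effect if they are set before TensorFlow initializes CUDA, and CUDA's default device order (fastest first) can differ from the PCI bus order that nvidia-smi shows unless CUDA_DEVICE_ORDER=PCI_BUS_ID is set. The helper select_gpu is hypothetical, and index 1 for the GTX 1060 is an assumption; check the nvidia-smi output for the real index.

```python
import os


def select_gpu(pci_index: int) -> None:
    """Hypothetical helper: expose a single GPU to CUDA by its nvidia-smi index."""
    # Number devices by PCI bus ID, as nvidia-smi does, instead of CUDA's
    # default "fastest first" order.
    os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
    # Only this GPU stays visible; CUDA renumbers it as device 0.
    os.environ["CUDA_VISIBLE_DEVICES"] = str(pci_index)


# Must run before TensorFlow (or any other CUDA library) is imported.
select_gpu(1)  # assumption: the GTX 1060 is index 1 in nvidia-smi

try:
    import tensorflow as tf

    # tf.config.list_physical_devices requires TensorFlow 2.1 or later.
    print(tf.config.list_physical_devices("GPU"))
except ImportError:
    print("TensorFlow is not installed in this environment")
```

With only one GPU visible, TensorFlow addresses it as /gpu:0, so with tf.device('/gpu:1'): would no longer match any device.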

Here are the outputs of nvidia-smi and nvidia-debugdump -l, in case they're useful.

At this point, I feel like I have followed all the breadcrumbs and have no idea what else to try. I'm not even sure whether I'm looking at a bug or a configuration problem. Any advice on how to debug this would be greatly appreciated. Thanks!

Update: with the help of Yaroslav on GitHub, I gathered more debugging info by raising the log level, but it doesn't seem to say much about device selection: https://gist.github.com/oelmekki/760a37ca50bf58d4f03f46d104b798bb

Update 2: Theano detects the GPU correctly, but interestingly it complains about cuDNN being too recent, then falls back to the CPU (code ran, output). Could that be the problem with TensorFlow as well?

Tensorflow Solutions


Solution 1 - Tensorflow

From the log output, it looks like you are running the CPU version of TensorFlow (PyPI: tensorflow), and not the GPU version (PyPI: tensorflow-gpu). Running the GPU version would either log information about the CUDA libraries, or an error if it failed to load them or open the driver.

If you run the following commands, you should be able to use the GPU in subsequent runs:

$ pip uninstall tensorflow
$ pip install tensorflow-gpu
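Before and after reinstalling, a short Python check can show which TensorFlow distributions pip has installed and whether the importable build was compiled with CUDA. This is a sketch: importlib.metadata needs Python 3.8+, and tf.test.is_built_with_cuda() is the TensorFlow call used for the second check. Note that from TensorFlow 2.1 onward the plain tensorflow package on Linux and Windows already includes GPU support, so the tensorflow versus tensorflow-gpu split mainly matters for older releases.

```python
from importlib import metadata

# Distribution names that all install the same top-level "tensorflow" package,
# so having more than one of them installed can shadow the GPU build.
TF_DISTRIBUTIONS = ("tensorflow", "tensorflow-gpu", "tensorflow-cpu")


def installed_tf_distributions() -> dict:
    """Map each installed TensorFlow distribution name to its version."""
    found = {}
    for name in TF_DISTRIBUTIONS:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            pass  # this distribution is not installed
    return found


installed = installed_tf_distributions()
print("installed:", installed or "none")
if len(installed) > 1:
    print("warning: several TensorFlow distributions are installed at once")

try:
    import tensorflow as tf

    print("built with CUDA:", tf.test.is_built_with_cuda())
except ImportError:
    print("tensorflow cannot be imported")
```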

Solution 2 - Tensorflow

None of the other answers here worked for me. After a bit of tinkering, I found that this fixed my issues with TensorFlow installed from prebuilt binaries:


Step 0: Uninstall protobuf

pip uninstall protobuf

Step 1: Uninstall tensorflow

pip uninstall tensorflow
pip uninstall tensorflow-gpu

Step 2: Force reinstall Tensorflow with GPU support

pip install --upgrade --force-reinstall tensorflow-gpu

Step 3: If you haven't already, set CUDA_VISIBLE_DEVICES

For me, with 2 GPUs, that is:

export CUDA_VISIBLE_DEVICES=0,1
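To see what a given CUDA_VISIBLE_DEVICES value actually exposes, here is a small pure-Python approximation of the behavior described in the CUDA documentation: listed devices are renumbered from 0 in the order given, and parsing stops at the first invalid index. It only handles numeric indices (not GPU UUIDs) and assumes no duplicates; the function name visible_gpus is hypothetical.

```python
from typing import Dict, Optional


def visible_gpus(env_value: Optional[str], physical_count: int) -> Dict[str, int]:
    """Map TensorFlow device names to physical GPU indices (approximation).

    env_value: the CUDA_VISIBLE_DEVICES string, or None if the variable is unset.
    physical_count: number of GPUs installed (e.g. as listed by nvidia-smi).
    """
    if env_value is None:
        # Unset: every GPU is visible under its own index.
        physical = list(range(physical_count))
    else:
        physical = []
        for token in env_value.split(","):
            token = token.strip()
            # Stop at the first entry that is not a valid device index.
            if not token.isdigit() or int(token) >= physical_count:
                break
            physical.append(int(token))
    # Visible devices are renumbered from 0 in the listed order.
    return {f"/gpu:{i}": p for i, p in enumerate(physical)}


print(visible_gpus("0,1", 2))  # both GPUs, original numbering
print(visible_gpus("1", 2))    # only physical GPU 1, seen as /gpu:0
print(visible_gpus("-1", 2))   # no GPUs visible
```

Setting the variable to an empty string or to -1 hides every GPU, which is a common way to force CPU execution; with two GPUs, 0,1 behaves the same as leaving it unset.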

Solution 3 - Tensorflow

In my case:

pip3 uninstall tensorflow

is not enough, because when reinstalling with:

pip3 install tensorflow-gpu

it still reinstalls the CPU version of TensorFlow, not the GPU one. So, before installing tensorflow-gpu, I removed all the related tensor* folders in site-packages and uninstalled protobuf, and it worked!

In summary:

pip3 uninstall tensorflow

Remove all the tensor* folders in ~\Python35\Lib\site-packages

pip3 uninstall protobuf
pip3 install tensorflow-gpu
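The manual clean-up step can be sketched in Python. This version only lists candidate folders whose names start with "tensor" (which also catches, e.g., tensorboard) in the running interpreter's site-packages, located via sysconfig; it deliberately deletes nothing, so you can review the list before removing anything by hand. The function name leftover_tensor_dirs is hypothetical.

```python
import sysconfig
from pathlib import Path
from typing import List, Optional


def leftover_tensor_dirs(site_dir: Optional[str] = None) -> List[Path]:
    """List folders in site-packages whose names start with 'tensor'."""
    # Default to the pure-Python library directory of the running interpreter.
    root = Path(site_dir or sysconfig.get_paths()["purelib"])
    if not root.is_dir():
        return []
    return sorted(
        p for p in root.iterdir()
        if p.is_dir() and p.name.lower().startswith("tensor")
    )


# Run after "pip3 uninstall tensorflow"; anything still listed is a leftover.
for path in leftover_tensor_dirs():
    print(path)
```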

Solution 4 - Tensorflow

It might seem dumb, but a sudo reboot fixed the exact same problem for me and a couple of others.

Solution 5 - Tensorflow

The answer that saved my day came from Mark Sonn. If you are on Linux, simply add this line to ~/.bashrc and run source ~/.bashrc:

export CUDA_VISIBLE_DEVICES=0,1

Previously I had to use this workaround to get TensorFlow to recognize my GPU:

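import tensorflow as tf

# Must run before TensorFlow initializes the GPUs, otherwise it raises RuntimeError
gpus = tf.config.experimental.list_physical_devices(device_type="GPU")
if gpus:  # avoid an IndexError when no GPU is detected
    # Expose only the first GPU and allocate its memory on demand
    tf.config.experimental.set_visible_devices(devices=gpus[0], device_type="GPU")
    tf.config.experimental.set_memory_growth(device=gpus[0], enable=True)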

Even though the code worked, adding these lines every time is clearly not something I want. My TensorFlow v2.3 was built from source, following the documentation, so that it supports CUDA 10.2 and cuDNN 7.6.5.

If anyone is having trouble with that, I suggest a quick skim of the docs. The build took 1.5 hours with Bazel; make sure you have GCC 7 and Bazel installed.
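Since the point of building from source here was to match specific CUDA and cuDNN versions, it can help to confirm what an installed TensorFlow was actually built against. A sketch assuming TensorFlow 2.3 or later, where tf.sysconfig.get_build_info() returns a dict; the key names used below (is_cuda_build, cuda_version, cudnn_version) are assumptions based on Linux builds, and describe_build is a hypothetical helper.

```python
def describe_build(info: dict) -> str:
    """Summarize the CUDA/cuDNN versions recorded in a TensorFlow build-info dict."""
    # Assumed keys; missing values are shown as "?".
    if not info.get("is_cuda_build"):
        return "not a CUDA build"
    return "CUDA {}, cuDNN {}".format(
        info.get("cuda_version", "?"), info.get("cudnn_version", "?"))


try:
    import tensorflow as tf

    # tf.sysconfig.get_build_info() is available from TensorFlow 2.3 on.
    print(describe_build(dict(tf.sysconfig.get_build_info())))
except (ImportError, AttributeError):
    print("build info unavailable (TensorFlow missing or older than 2.3)")
```

If the reported versions differ from the locally installed CUDA and cuDNN (10.2 and 7.6.5 in this answer), TensorFlow may fail to load the GPU libraries.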

Solution 6 - Tensorflow

This error may be caused by your GPU's compute capability being lower than the minimum that the prebuilt TensorFlow binaries support. You can check your GPU's compute capability here: https://en.wikipedia.org/wiki/CUDA

In my case, the error was:

> Ignoring visible gpu device (device: 0, name: GeForce GT 640M, pci bus id: 0000:01:00.0, compute capability: 3.0) with Cuda compute capability 3.0. The minimum required Cuda capability is 3.5.

For now, the only way around this minimum is to compile TensorFlow from source on Linux (or macOS).
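To check quickly whether this is your problem, the relevant values can be pulled out of a warning like the one quoted in this answer. A minimal sketch that assumes the log line has exactly that format (other TensorFlow versions may word it differently); parse_ignored_gpu is a hypothetical helper.

```python
import re
from typing import Optional, Tuple

# Patterns written against the TensorFlow warning quoted in this answer.
_NAME = re.compile(r"name: ([^,]+),")
_HAVE = re.compile(r"compute capability: (\d+)\.(\d+)")
_NEED = re.compile(r"minimum required Cuda capability is (\d+)\.(\d+)")

Capability = Tuple[int, int]


def parse_ignored_gpu(line: str) -> Optional[Tuple[str, Capability, Capability]]:
    """Return (device name, its capability, required capability), or None."""
    name, have, need = _NAME.search(line), _HAVE.search(line), _NEED.search(line)
    if not (name and have and need):
        return None
    return (
        name.group(1),
        (int(have.group(1)), int(have.group(2))),
        (int(need.group(1)), int(need.group(2))),
    )


LOG = ("Ignoring visible gpu device (device: 0, name: GeForce GT 640M, "
       "pci bus id: 0000:01:00.0, compute capability: 3.0) with Cuda compute "
       "capability 3.0. The minimum required Cuda capability is 3.5.")

parsed = parse_ignored_gpu(LOG)
if parsed:
    name, have, need = parsed
    # Tuples compare element-wise, so (3, 0) < (3, 5) means "too old".
    print(name, "is supported" if have >= need else "is below the minimum")
```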

Solution 7 - Tensorflow

There are various possible system incompatibilities.

The required libraries vary with the TensorFlow version.

When Python imports TensorFlow, a lot of useful diagnostic information is printed to stderr. For TensorFlow 2.1 or later, I suggest running:

> python3.8 -c "import tensorflow as tf; print('tf version:', tf.__version__); print(tf.config.list_physical_devices())"

The output of this command shows which libraries needed for GPU support are missing (or present in the wrong version), in addition to the requirements:

P.S. CUDA_VISIBLE_DEVICES is not specific to TensorFlow; it is a general CUDA mechanism for choosing which GPUs are available to the processes you launch.
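The diagnostic one-liner from this answer can also be run from a script that captures stderr and extracts the libraries TensorFlow failed to load. A sketch, assuming the missing-library messages contain the phrase "Could not load dynamic library '<name>'", as TensorFlow 2.x builds typically log; missing_libraries and probe_tensorflow are hypothetical helpers.

```python
import re
import subprocess
import sys
from typing import List

# Assumed wording of TensorFlow's log line for a library it could not open.
_MISSING = re.compile(r"Could not load dynamic library '([^']+)'")


def missing_libraries(stderr_text: str) -> List[str]:
    """Extract library names from TensorFlow's 'Could not load' messages."""
    return _MISSING.findall(stderr_text)


def probe_tensorflow() -> str:
    """Import TensorFlow in a fresh interpreter and return what it wrote to stderr."""
    code = ("import tensorflow as tf; "
            "print(tf.__version__); "
            "print(tf.config.list_physical_devices('GPU'))")
    result = subprocess.run([sys.executable, "-c", code],
                            capture_output=True, text=True)
    return result.stderr
```

For example, print(missing_libraries(probe_tensorflow())) prints the libraries that failed to load. An empty list only means no such message was found, not that the GPU is usable.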

Solution 8 - Tensorflow

For Anaconda users: I installed tensorflow-gpu through the Anaconda Navigator GUI and configured the NVIDIA GPU as described in the TensorFlow guide, but TensorFlow still couldn't find the GPU. I then uninstalled TensorFlow, again via the GUI (see here), and reinstalled it from the command line in an Anaconda prompt with:

conda install -c anaconda tensorflow-gpu

After that, TensorFlow found the GPU correctly.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | kik | View Question on Stackoverflow
Solution 1 - Tensorflow | mrry | View Answer on Stackoverflow
Solution 2 - Tensorflow | Mark Sonn | View Answer on Stackoverflow
Solution 3 - Tensorflow | nguyenhoai890 | View Answer on Stackoverflow
Solution 4 - Tensorflow | Keerthana Gopalakrishnan | View Answer on Stackoverflow
Solution 5 - Tensorflow | Huy | View Answer on Stackoverflow
Solution 6 - Tensorflow | Simurgh | View Answer on Stackoverflow
Solution 7 - Tensorflow | Konstantin Burlachenko | View Answer on Stackoverflow
Solution 8 - Tensorflow | Aelius | View Answer on Stackoverflow