tensorflow on GPU: no known devices, despite cuda's deviceQuery returning a "PASS" result
Tensorflow Problem Overview
> Note: this question was initially asked on GitHub, but I was asked to post it here instead.
I'm having trouble running tensorflow on gpu, and it does not seem to be the usual cuda configuration problem, because everything indicates that cuda is properly set up.
The main symptom: when running tensorflow, my gpu is not detected (the code being run, and its output).
What differs from the usual issues is that cuda seems properly installed, and running ./deviceQuery from the cuda samples is successful (output).
I have two graphics cards:
- an old GTX 650 used for my monitors (I don't want to use that one with tensorflow)
- a GTX 1060 that I want to dedicate to tensorflow
I use:
- tensorflow-1.0.0
- cuda-8.0 (ls -l /usr/local/cuda/lib64/libcud*)
- cudnn-5.1.10
- python-2.7.12
- nvidia-drivers-375.26 (this was installed by cuda and replaced my distro driver package)
I've tried:
- adding /usr/local/cuda/bin/ to $PATH
- forcing gpu placement in the tensorflow script using with tf.device('/gpu:1'): (and with tf.device('/gpu:0'): when it failed, for good measure)
- whitelisting the gpu I wanted to use with CUDA_VISIBLE_DEVICES, in case the presence of my old unsupported card did cause problems
- running the script with sudo (because why not)
Here are the outputs of nvidia-smi and nvidia-debugdump -l, in case it's useful.
At this point, I feel like I have followed all the breadcrumbs and have no idea what else I could try. I'm not even sure if I'm looking at a bug or a configuration problem. Any advice about how to debug this would be greatly appreciated. Thanks!
Update: with the help of Yaroslav on GitHub, I gathered more debugging info by raising the log level, but it doesn't seem to say much about the device selection: https://gist.github.com/oelmekki/760a37ca50bf58d4f03f46d104b798bb
Update 2: theano detects the gpu correctly, but interestingly it complains about cuDNN being too recent, then falls back to cpu (code ran, output). Maybe that could be the problem with tensorflow as well?
Tensorflow Solutions
Solution 1 - Tensorflow
From the log output, it looks like you are running the CPU version of TensorFlow (PyPI: tensorflow), and not the GPU version (PyPI: tensorflow-gpu). Running the GPU version would either log information about the CUDA libraries, or an error if it failed to load them or open the driver.
If you run the following commands, you should be able to use the GPU in subsequent runs:
$ pip uninstall tensorflow
$ pip install tensorflow-gpu
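To see which of the two packages is actually installed before uninstalling anything, a minimal sketch along these lines may help. It assumes Python 3.8+ (for importlib.metadata, so it would not run on the Python 2.7 from the question), and the helper names installed_tensorflow_packages and describe are made up for illustration.

```python
from importlib import metadata  # standard library, Python 3.8+


def installed_tensorflow_packages():
    """Return the sorted names of installed distributions starting with 'tensorflow'."""
    names = set()
    for dist in metadata.distributions():
        # Normalize case and underscores so 'TensorFlow_GPU' and 'tensorflow-gpu' match.
        name = (dist.metadata.get("Name") or "").lower().replace("_", "-")
        if name.startswith("tensorflow"):
            names.add(name)
    return sorted(names)


def describe(names):
    """Summarize which TensorFlow pip packages were found (hypothetical helper)."""
    if "tensorflow" in names and "tensorflow-gpu" in names:
        return "both tensorflow and tensorflow-gpu are installed"
    if "tensorflow-gpu" in names:
        return "tensorflow-gpu is installed"
    if "tensorflow" in names:
        return "only the tensorflow package is installed"
    return "no tensorflow package found"


if __name__ == "__main__":
    found = installed_tensorflow_packages()
    print(found, "->", describe(found))
```

If it reports only the tensorflow package, that matches the situation this answer describes.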
Solution 2 - Tensorflow
None of the other answers here worked for me. After a bit of tinkering I found that this fixed my issues with Tensorflow installed from a binary package:
Step 0: Uninstall protobuf
pip uninstall protobuf
Step 1: Uninstall tensorflow
pip uninstall tensorflow
pip uninstall tensorflow-gpu
Step 2: Force reinstall Tensorflow with GPU support
pip install --upgrade --force-reinstall tensorflow-gpu
Step 3: If you haven't already, set CUDA_VISIBLE_DEVICES
So for me with 2 GPUs it would be
export CUDA_VISIBLE_DEVICES=0,1
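The same variable can also be set from inside a script instead of the shell. This is only a sketch: the helper name restrict_visible_gpus is hypothetical, and it relies on the variable being set before TensorFlow (or anything else that initializes CUDA) is imported in that process.

```python
import os


def restrict_visible_gpus(indices):
    """Set CUDA_VISIBLE_DEVICES to the given GPU indices and return the value."""
    value = ",".join(str(i) for i in indices)
    os.environ["CUDA_VISIBLE_DEVICES"] = value
    return value


# Run this before importing TensorFlow, because the variable is read
# when CUDA initializes. For this process it is equivalent to:
#   export CUDA_VISIBLE_DEVICES=0,1
restrict_visible_gpus([0, 1])
# import tensorflow as tf  # import only after the variable is set
```

Unlike the export, this affects only the current Python process and its children.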
Solution 3 - Tensorflow
In my case, running
pip3 uninstall tensorflow
was not enough, because when I reinstalled with
pip3 install tensorflow-gpu
I still ended up with the cpu version of tensorflow, not the gpu one. So, before installing tensorflow-gpu, I removed all the related tensor folders in site-packages and uninstalled protobuf, and it worked!
In summary:
pip3 uninstall tensorflow
Remove all leftover tensor* folders (tensorflow and related) in ~\Python35\Lib\site-packages
pip3 uninstall protobuf
pip3 install tensorflow-gpu
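To find the leftover folders mentioned above without deleting anything, a small sketch such as the following could be used. The function name find_leftover_dirs and the "tensor" prefix are assumptions for illustration, and site.getsitepackages() may be unavailable in some older virtualenv setups.

```python
import site
from pathlib import Path


def find_leftover_dirs(site_dir, prefix="tensor"):
    """Return sorted names of directories in site_dir whose name starts with prefix."""
    root = Path(site_dir)
    if not root.is_dir():
        return []
    return sorted(
        entry.name
        for entry in root.iterdir()
        if entry.is_dir() and entry.name.lower().startswith(prefix)
    )


if __name__ == "__main__":
    # List candidates in each global site-packages directory; nothing is removed.
    for directory in site.getsitepackages():
        print(directory, find_leftover_dirs(directory))
```

Review the printed names before removing anything by hand, since the prefix match is deliberately broad.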
Solution 4 - Tensorflow
Might seem dumb, but a sudo reboot has fixed the exact same problem for me and a couple of others.
Solution 5 - Tensorflow
The answer that saved my day came from Mark Sonn. If you are on Linux, simply add this to .bashrc and then run source ~/.bashrc:
export CUDA_VISIBLE_DEVICES=0,1
Previously I had to use this workaround to get tensorflow to recognize my GPU:
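import tensorflow as tf
# List the GPUs TensorFlow can see, then restrict it to the first one.
gpus = tf.config.experimental.list_physical_devices(device_type="GPU")
if gpus:  # gpus[0] would raise IndexError when no GPU is detected
    tf.config.experimental.set_visible_devices(devices=gpus[0], device_type="GPU")
    # Allocate GPU memory on demand instead of reserving it all up front.
    tf.config.experimental.set_memory_growth(device=gpus[0], enable=True)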
Even though the code still worked, adding these lines every time is clearly not something I would want.
My version of tensorflow was built from source according to the documentation, to get v2.3 to support CUDA 10.2 and cuDNN 7.6.5.
If anyone is having trouble with that, I suggest a quick skim over the docs. It took 1.5 hours to build with bazel. Make sure you have gcc7 and bazel installed.
Solution 6 - Tensorflow
This error may be caused by your GPU's compute capability. The prebuilt TensorFlow binaries only support GPUs at or above a minimum compute capability (3.5 in my case). You can look up your card's compute capability here: https://en.wikipedia.org/wiki/CUDA
In my case, the error was like this:
> Ignoring visible gpu device (device: 0, name: GeForce GT 640M, pci bus id: 0000:01:00.0, compute capability: 3.0) with Cuda compute capability 3.0. The minimum required Cuda capability is 3.5.
For now, the only way around the 3.5 minimum is to compile TensorFlow from source on Linux (or macOS).
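If you want to spot this condition in a log automatically, a rough sketch is to parse the message quoted in this answer. The pattern below is written against that one line only; other TensorFlow versions may word it differently, and the name parse_capability_warning is made up for illustration.

```python
import re

# Pattern written against the log line quoted in this answer; the wording
# may differ in other TensorFlow versions.
_PATTERN = re.compile(
    r"compute capability: (\d+)\.(\d+)\).*"
    r"minimum required Cuda capability is (\d+)\.(\d+)"
)


def parse_capability_warning(line):
    """Return ((major, minor) of the GPU, (major, minor) required), or None."""
    match = _PATTERN.search(line)
    if match is None:
        return None
    gpu_major, gpu_minor, min_major, min_minor = (int(g) for g in match.groups())
    return (gpu_major, gpu_minor), (min_major, min_minor)


LOG_LINE = (
    "Ignoring visible gpu device (device: 0, name: GeForce GT 640M, "
    "pci bus id: 0000:01:00.0, compute capability: 3.0) with Cuda compute "
    "capability 3.0. The minimum required Cuda capability is 3.5."
)

gpu, required = parse_capability_warning(LOG_LINE)
# Tuples compare major first, then minor, so (3, 0) >= (3, 5) is False.
print(gpu, required, gpu >= required)  # prints: (3, 0) (3, 5) False
```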
Solution 7 - Tensorflow
There are various kinds of system incompatibility problems, and the required library versions vary with the version of TensorFlow.
When using Python in interactive mode, a lot of useful information is printed to stderr. For TensorFlow 2.0 or later, I suggest running:
> python3.8 -c "import tensorflow as tf; print('tf version:', tf.__version__); print(tf.config.list_physical_devices())"
The output of this command shows which libraries (or library versions) needed for GPU support are missing. Also check the official requirements:
- https://www.tensorflow.org/install/gpu#software_requirements
- https://www.tensorflow.org/install/gpu#hardware_requirements
P.S. CUDA_VISIBLE_DEVICES is not specific to TensorFlow. It is a more general mechanism for restricting which GPUs are visible to every launched process.
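As a rough complement to reading stderr for missing libraries, the standard library can report whether the system's library lookup finds a few CUDA libraries at all. This is only a hint, not a reproduction of how TensorFlow loads them: the short names in CANDIDATE_LIBRARIES are illustrative assumptions, find_library behaves differently across platforms, and it says nothing about whether the version found is the one your TensorFlow release expects.

```python
from ctypes.util import find_library

# Illustrative short names; the exact libraries and versions TensorFlow
# needs depend on its release.
CANDIDATE_LIBRARIES = ("cudart", "cublas", "cudnn")


def locate_cuda_libraries(names=CANDIDATE_LIBRARIES):
    """Map each library name to what find_library reports, or None if not found."""
    return {name: find_library(name) for name in names}


if __name__ == "__main__":
    for name, location in locate_cuda_libraries().items():
        print(f"{name}: {location or 'not found'}")
```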
Solution 8 - Tensorflow
For anaconda users: I installed tensorflow-gpu via the GUI using Anaconda Navigator and configured the NVIDIA GPU as described in the tensorflow guide, but tensorflow couldn't find the GPU anyway. Then I uninstalled tensorflow, again via the GUI (see here), and reinstalled it from the command line in an anaconda prompt by issuing:
conda install -c anaconda tensorflow-gpu
and then tensorflow could find the GPU correctly.