Tensorflow crashes with CUBLAS_STATUS_ALLOC_FAILED

Tensorflow Problem Overview

I'm running tensorflow-gpu on Windows 10 using a simple MINST neural network program. When it tries to run, it encounters a CUBLAS_STATUS_ALLOC_FAILED error. A google search doesn't turn up anything.

I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:885] Found device 0 with properties:
name: GeForce GTX 970
major: 5 minor: 2 memoryClockRate (GHz) 1.253
pciBusID 0000:0f:00.0
Total memory: 4.00GiB
Free memory: 3.31GiB
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:906] DMA: 0
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:916] 0:   Y
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 970, pci bus id: 0000:0f:00.0)
E c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\cuda\cuda_blas.cc:372] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
W c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\stream.cc:1390] attempting to perform BLAS operation using StreamExecutor without BLAS support
Traceback (most recent call last):
  File "C:\Users\Anonymous\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\python\client\session.py", line 1021, in _do_call
    return fn(*args)
  File "C:\Users\Anonymous\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\python\client\session.py", line 1003, in _run_fn
    status, run_metadata)
  File "C:\Users\Anonymous\AppData\Local\Programs\Python\Python35\lib\contextlib.py", line 66, in __exit__
    next(self.gen)
  File "C:\Users\Anonymous\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\python\framework\errors_impl.py", line 469, in raise_exception_on_not_ok_status
    pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors_impl.InternalError: Blas SGEMM launch failed : a.shape=(100, 784), b.shape=(784, 256), m=100, n=256, k=784
         [[Node: MatMul = MatMul[T=DT_FLOAT, transpose_a=false, transpose_b=false, _device="/job:localhost/replica:0/task:0/gpu:0"](_recv_Placeholder_0/_7, Variable/read)]]
         [[Node: Mean/_15 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_35_Mean", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]

Tensorflow Solutions

Solution 1 - Tensorflow

For TensorFlow 2.2 none of the other answers worked when the CUBLAS_STATUS_ALLOC_FAILED problem was encountered. Found a solution on https://www.tensorflow.org/guide/gpu:

import tensorflow as tf
gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
    try:
        # Currently, memory growth needs to be the same across GPUs
        for gpu in gpus:
            tf.config.experimental.set_memory_growth(gpu, True)
        logical_gpus = tf.config.experimental.list_logical_devices('GPU')
        print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPUs")
    except RuntimeError as e:
        # Memory growth must be set before GPUs have been initialized
        print(e)

I ran this code before any further calculations are made and found that the same code that produced CUBLAS error before now worked in same session. The sample code above is a specific example that sets the memory growth across a number of physical GPUs but it also solves the memory expansion problem.

Solution 2 - Tensorflow

The location of the "allow_growth" property of the session config seems to be different now. It's explained here: https://www.tensorflow.org/tutorials/using_gpu

So currently you'd have to set it like this:

import tensorflow as tf
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
session = tf.Session(config=config, ...)

Solution 3 - Tensorflow

tensorflow>=2.0

import tensorflow as tf
config = tf.compat.v1.ConfigProto(gpu_options = 
                         tf.compat.v1.GPUOptions(per_process_gpu_memory_fraction=0.8)
# device_count = {'GPU': 1}
)
config.gpu_options.allow_growth = True
session = tf.compat.v1.Session(config=config)
tf.compat.v1.keras.backend.set_session(session)

Solution 4 - Tensorflow

I found this solution works

import tensorflow as tf
from keras.backend.tensorflow_backend import set_session

config = tf.ConfigProto(
    gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.8)
    # device_count = {'GPU': 1}
)
config.gpu_options.allow_growth = True
session = tf.Session(config=config)
set_session(session)

Solution 5 - Tensorflow

On windows, currently tensorflow does not allocate all available memory like it says in the documentation, instead you can work around this error by allowing dynamic memory growth as follows:

tf.Session(config=tf.ConfigProto(allow_growth=True))

Solution 6 - Tensorflow

None of these fixes worked for me, as it seems that the structure of the tensorflow libraries have changed. For Tensorflow 2.0, the only fix that worked for me was as under Limiting GPU memory growth on this page https://www.tensorflow.org/guide/gpu

For completeness and future-proofing, here's the solution from the docs - I imagine changing memory_limit may be necessary for some people - 1 GB was fine for my case.

gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
  # Restrict TensorFlow to only allocate 1GB of memory on the first GPU
  try:
    tf.config.experimental.set_virtual_device_configuration(
        gpus[0],
        [tf.config.experimental.VirtualDeviceConfiguration(memory_limit=1024)])
    logical_gpus = tf.config.experimental.list_logical_devices('GPU')
    print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPUs")
  except RuntimeError as e:
    # Virtual devices must be set before GPUs have been initialized
    print(e)

Solution 7 - Tensorflow

Tensorflow 2.0 alpha

Allowing GPU memory growth may fix this issue. For Tensorflow 2.0 alpha / nightly there are two methods you can try, to archive this.

1.)

import tensorflow as tf
tf.config.gpu.set_per_process_memory_growth()

2.)

import tensorflow as tf
tf.config.gpu.set_per_process_memory_fraction(0.4) # adjust this to the % of VRAM you 
                                                   # want to give to tensorflow.

I suggest you try both, and see if it helps. Source: https://www.tensorflow.org/alpha/guide/using_gpu

Solution 8 - Tensorflow

for keras:

from keras.backend.tensorflow_backend import set_session
import tensorflow as tf

config = tf.ConfigProto()
config.gpu_options.allow_growth = True
session = tf.Session(config=config)
set_session(session)

Solution 9 - Tensorflow

In my case, a stale python process was consuming memory. I killed it through task manager, and things are back to normal.

Solution 10 - Tensorflow

A bit late to the party but this resolves my issue with tensorflow 2.4.0 and a gtx 980ti. Before limiting the memory I got an error like:

CUBLAS_STATUS_ALLOC_FAILED

My solution was this piece of code:

import tensorflow as tf
gpus = tf.config.list_physical_devices('GPU')
tf.config.experimental.set_virtual_device_configuration(
        gpus[0],
        [tf.config.experimental.VirtualDeviceConfiguration(memory_limit=4096)])

I found the solution here: https://www.tensorflow.org/guide/gpu

Content Type	Original Author	Original Content on Stackoverflow
Question	Axiverse	View Question on Stackoverflow
Solution 1 - Tensorflow	Snympi	View Answer on Stackoverflow
Solution 2 - Tensorflow	Rafal Zajac	View Answer on Stackoverflow
Solution 3 - Tensorflow	Welcome_back	View Answer on Stackoverflow
Solution 4 - Tensorflow	Space Bear	View Answer on Stackoverflow
Solution 5 - Tensorflow	Axiverse	View Answer on Stackoverflow
Solution 6 - Tensorflow	carthurs	View Answer on Stackoverflow
Solution 7 - Tensorflow	kett	View Answer on Stackoverflow
Solution 8 - Tensorflow	Maverick Meerkat	View Answer on Stackoverflow
Solution 9 - Tensorflow	winterlight	View Answer on Stackoverflow
Solution 10 - Tensorflow	Anxifer	View Answer on Stackoverflow