Clearing Tensorflow GPU memory after model execution

PythonTensorflowGpu

Python Problem Overview


I've trained 3 models and am now running code that loads each of the 3 checkpoints in sequence and runs predictions using them. I'm using the GPU.

When the first model is loaded it pre-allocates the entire GPU memory (which I want for working through the first batch of data). But it doesn't unload memory when it's finished. When the second model is loaded, using both tf.reset_default_graph() and with tf.Graph().as_default() the GPU memory still is fully consumed from the first model, and the second model is then starved of memory.

Is there a way to resolve this, other than using Python subprocesses or multiprocessing to work around the problem (the only solution I've found on via google searches)?

Python Solutions


Solution 1 - Python

A git issue from June 2016 (https://github.com/tensorflow/tensorflow/issues/1727) indicates that there is the following problem:

> currently the Allocator in the GPUDevice belongs to the ProcessState, > which is essentially a global singleton. The first session using GPU > initializes it, and frees itself when the process shuts down.

Thus the only workaround would be to use processes and shut them down after the computation.

Example Code:

import tensorflow as tf
import multiprocessing
import numpy as np

def run_tensorflow():

    n_input = 10000
    n_classes = 1000

    # Create model
    def multilayer_perceptron(x, weight):
        # Hidden layer with RELU activation
        layer_1 = tf.matmul(x, weight)
        return layer_1

    # Store layers weight & bias
    weights = tf.Variable(tf.random_normal([n_input, n_classes]))


    x = tf.placeholder("float", [None, n_input])
    y = tf.placeholder("float", [None, n_classes])
    pred = multilayer_perceptron(x, weights)

    cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=pred, labels=y))
    optimizer = tf.train.AdamOptimizer(learning_rate=0.001).minimize(cost)

    init = tf.global_variables_initializer()

    with tf.Session() as sess:
        sess.run(init)

        for i in range(100):
            batch_x = np.random.rand(10, 10000)
            batch_y = np.random.rand(10, 1000)
            sess.run([optimizer, cost], feed_dict={x: batch_x, y: batch_y})
        
    print "finished doing stuff with tensorflow!"
    
    
if __name__ == "__main__":

    # option 1: execute code with extra process
    p = multiprocessing.Process(target=run_tensorflow)
    p.start()
    p.join()

    # wait until user presses enter key
    raw_input()

    # option 2: just execute the function
    run_tensorflow()

    # wait until user presses enter key
    raw_input()

So if you would call the function run_tensorflow() within a process you created and shut the process down (option 1), the memory is freed. If you just run run_tensorflow() (option 2) the memory is not freed after the function call.

Solution 2 - Python

You can use numba library to release all the gpu memory

pip install numba 
from numba import cuda 
device = cuda.get_current_device()
device.reset()

This will release all the memory

Solution 3 - Python

I use numba to release GPU. With TensorFlow, I cannot find an effective method.

import tensorflow as tf
from numba import cuda

a = tf.constant([1.0,2.0,3.0],shape=[3],name='a')
b = tf.constant([1.0,2.0,3.0],shape=[3],name='b')
with tf.device('/gpu:1'):
    c = a+b

TF_CONFIG = tf.ConfigProto(
gpu_options=tf.GPUOptions(per_process_gpu_memory_fraction=0.1),
  allow_soft_placement=True)

sess = tf.Session(config=TF_CONFIG)
sess.run(tf.global_variables_initializer())
i=1
while(i<1000):
        i=i+1
        print(sess.run(c))

sess.close() # if don't use numba,the gpu can't be released
cuda.select_device(1)
cuda.close()
with tf.device('/gpu:1'):
    c = a+b

TF_CONFIG = tf.ConfigProto(
gpu_options=tf.GPUOptions(per_process_gpu_memory_fraction=0.5),
  allow_soft_placement=True)

sess = tf.Session(config=TF_CONFIG)

sess.run(tf.global_variables_initializer())
while(1):
        print(sess.run(c))

Solution 4 - Python

Now there seem to be two ways to resolve the iterative training model or if you use future multipleprocess pool to serve the model training, where the process in the pool will not be killed if the future finished. You can apply two methods in the training process to release GPU memory meanwhile you wish to preserve the main process.

  1. call a subprocess to run the model training. when one phase training completed, the subprocess will exit and free memory. It's easy to get the return value.
  2. call the multiprocessing.Process(p) to run the model training(p.start), and p.join will indicate the process exit and free memory.

Here is a helper function using multiprocess.Process which can open a new process to run your python written function and reture value instead of using Subprocess,

# open a new process to run function
def process_run(func, *args):
    def wrapper_func(queue, *args):
        try:
            logger.info('run with process id: {}'.format(os.getpid()))
            result = func(*args)
            error = None
        except Exception:
            result = None
            ex_type, ex_value, tb = sys.exc_info()
            error = ex_type, ex_value,''.join(traceback.format_tb(tb))
        queue.put((result, error))

    def process(*args):
        queue = Queue()
        p = Process(target = wrapper_func, args = [queue] + list(args))
        p.start()
        result, error = queue.get()
        p.join()
        return result, error  

    result, error = process(*args)
    return result, error

Solution 5 - Python

I am figuring out which option is better in the Jupyter Notebook. Jupyter Notebook occupies the GPU memory permanently even a deep learning application is completed. It usually incurs the GPU Fan ERROR that is a big headache. In this condition, I have to reset nvidia_uvm and reboot the linux system regularly. I conclude the following two options can remove the headache of GPU Fan Error but want to know which is better.

Environment:

  • CUDA 11.0
  • cuDNN 8.0.1
  • TensorFlow 2.2
  • Keras 2.4.3
  • Jupyter Notebook 6.0.3
  • Miniconda 4.8.3
  • Ubuntu 18.04 LTS

First Option

Put the following code at the end of the cell. The kernel immediately ended upon the application runtime is completed. But it is not much elegant. Juputer will pop up a message for the died ended kernel.

import os
 
pid = os.getpid()
!kill -9 $pid

Section Option

The following code can also end the kernel with Jupyter Notebook. I do not know whether numba is secure. Nvidia prefers the "0" GPU that is the most used GPU by personal developer (not server guys). However, both Neil G and mradul dubey have had the response: This leaves the GPU in a bad state.

from numba import cuda

cuda.select_device(0)
cuda.close()

It seems that the second option is more elegant. Can some one confirm which is the best choice?

Notes:

It is not such the problem to automatically release the GPU memory in the environment of Anaconda by direct executing "$ python abc.py". However, I sometimes need to use Jyputer Notebook to handle .ipynb application.

Solution 6 - Python

GPU memory allocated by tensors is released (back into TensorFlow memory pool) as soon as the tensor is not needed anymore (before the .run call terminates). GPU memory allocated for variables is released when variable containers are destroyed. In case of DirectSession (ie, sess=tf.Session("")) it is when session is closed or explicitly reset (added in 62c159ff)

Solution 7 - Python

I have trained my models in a for loop for different parameters when I got this error after 120 models trained. Afterwards I could not even train a simple model if I did not kill the kernel. I was able to solve my issue by adding the following line before building the model:

tf.keras.backend.clear_session()

(see https://www.tensorflow.org/api_docs/python/tf/keras/backend/clear_session)

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionDavid ParksView Question on Stackoverflow
Solution 1 - PythonOliver WilkenView Answer on Stackoverflow
Solution 2 - Pythonhitesh kumarView Answer on Stackoverflow
Solution 3 - PythonTanLingxiaoView Answer on Stackoverflow
Solution 4 - PythonliviaerxinView Answer on Stackoverflow
Solution 5 - PythonMike ChenView Answer on Stackoverflow
Solution 6 - PythonYaroslav BulatovView Answer on Stackoverflow
Solution 7 - PythonLingView Answer on Stackoverflow