Tensorflow Allocation Memory: Allocation of 38535168 exceeds 10% of system memory

PythonTensorflowMemoryKeras LayerResnet

Python Problem Overview


Using ResNet50 pre-trained Weights I am trying to build a classifier. The code base is fully implemented in Keras high-level Tensorflow API. The complete code is posted in the below GitHub Link.

Source Code: Classification Using RestNet50 Architecture

The file size of the pre-trained model is 94.7mb.

I loaded the pre-trained file

new_model = Sequential()

new_model.add(ResNet50(include_top=False,
                pooling='avg',
                weights=resnet_weight_paths))

and fit the model

train_generator = data_generator.flow_from_directory(
    'path_to_the_training_set',
    target_size = (IMG_SIZE,IMG_SIZE),
    batch_size = 12,
    class_mode = 'categorical'
    )

validation_generator = data_generator.flow_from_directory(
    'path_to_the_validation_set',
    target_size = (IMG_SIZE,IMG_SIZE),
    class_mode = 'categorical'
    )

#compile the model

new_model.fit_generator(
    train_generator,
    steps_per_epoch = 3,
    validation_data = validation_generator,
    validation_steps = 1
)

and in the Training dataset, I have two folders dog and cat, each holder almost 10,000 images. When I compiled the script, I get the following error

> Epoch 1/1 2018-05-12 13:04:45.847298: W > tensorflow/core/framework/allocator.cc:101] Allocation of 38535168 > exceeds 10% of system memory. 2018-05-12 13:04:46.845021: W > tensorflow/core/framework/allocator.cc:101] Allocation of 37171200 > exceeds 10% of system memory. 2018-05-12 13:04:47.552176: W > tensorflow/core/framework/allocator.cc:101] Allocation of 37171200 > exceeds 10% of system memory. 2018-05-12 13:04:48.199240: W > tensorflow/core/framework/allocator.cc:101] Allocation of 37171200 > exceeds 10% of system memory. 2018-05-12 13:04:48.918930: W > tensorflow/core/framework/allocator.cc:101] Allocation of 37171200 > exceeds 10% of system memory. 2018-05-12 13:04:49.274137: W > tensorflow/core/framework/allocator.cc:101] Allocation of 19267584 > exceeds 10% of system memory. 2018-05-12 13:04:49.647061: W > tensorflow/core/framework/allocator.cc:101] Allocation of 19267584 > exceeds 10% of system memory. 2018-05-12 13:04:50.028839: W > tensorflow/core/framework/allocator.cc:101] Allocation of 19267584 > exceeds 10% of system memory. 2018-05-12 13:04:50.413735: W > tensorflow/core/framework/allocator.cc:101] Allocation of 19267584 > exceeds 10% of system memory.

Any ideas to optimize the way to load the pre-trained model (or) get rid of this warning message?

Thanks!

Python Solutions


Solution 1 - Python

Try reducing batch_size attribute to a small number(like 1,2 or 3). Example:

train_generator = data_generator.flow_from_directory(
    'path_to_the_training_set',
    target_size = (IMG_SIZE,IMG_SIZE),
    batch_size = 2,
    class_mode = 'categorical'
    )

Solution 2 - Python

I was having the same problem while running Tensorflow container with Docker and Jupyter notebook. I was able to fix this problem by increasing the container memory.

On Mac OS, you can easily do this from:

       Docker Icon > Preferences >  Advanced > Memory

Drag the scrollbar to maximum (e.g. 4GB). Apply and it will restart the Docker engine.

Now run your tensor flow container again.

It was handy to use the docker stats command in a separate terminal It shows the container memory usage in realtime, and you can see how much memory consumption is growing:

CONTAINER ID   NAME   CPU %   MEM USAGE / LIMIT     MEM %    NET I/O             BLOCK I/O           PIDS
3170c0b402cc   mytf   0.04%   588.6MiB / 3.855GiB   14.91%   13.1MB / 3.06MB     214MB / 3.13MB      21

Solution 3 - Python

Alternatively, you can set the environment variable TF_CPP_MIN_LOG_LEVEL=2 to filter out info and warning messages. I found that on this github issue where they complain about the same output. To do so within python, you can use the solution from here:

import os
import tensorflow as tf
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'

You can even turn it on and off at will with this. I test for the maximum possible batch size before running my code, and I can disable warnings and errors while doing this.

Solution 4 - Python

I was running a small model on a CPU and had the same issue. Adding:os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3' resolved it.

Solution 5 - Python

I was having the same problem, and i concluded that there are two factors to be considered when see this error: 1- batch_size ==> because this responsible for the data size to be processed for each epoch 2- image_size ==> the higher image dimensions (image size), more data to be processed

So for these two factors, the RAM cannot handle all of required data.

To solve the problem I tried two cases: The first change batch_size form 32 to 3 or 2 The second reduce image_size from (608,608) to (416,416)

Solution 6 - Python

I was facing the same issue while running code on Linux platform. I changed the swap memory size which was set to 0 previously to 1 GB and issue got fixed.

For further details you can consult to this link.

https://linuxize.com/post/create-a-linux-swap-file/

Solution 7 - Python

I have had a similar issue but reducing the batch size for me did not solve it (My code actually ran fine when the batch value was 16). I started working on a model under it that involved scanning pictures that's when I had issues about memory. In the model, there is a Dense value, from the Keras package that i had initially set to 4096. I divided this value by 2 and the whole code stop showing the memory error. Moreover, reducing the value of image resolution would fix the problem too but my pictures were already in low resolution (100X100)

Solution 8 - Python

I was getting the same error and i tried setting os.environment flag...but it didn't work out.

Then i went ahead and reduced my batch size from 16 to 8, and it started to work fine after then. As it is because, the train method takes into account the batch size...i feel, reducing the image size would also work..as mentioned above as well.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionMadhiView Question on Stackoverflow
Solution 1 - Pythonjaba_yView Answer on Stackoverflow
Solution 2 - PythonrevolutionaryView Answer on Stackoverflow
Solution 3 - PythonPoikView Answer on Stackoverflow
Solution 4 - PythonAhmed J.View Answer on Stackoverflow
Solution 5 - PythonAhmad AlomariView Answer on Stackoverflow
Solution 6 - PythonAhmed SharifView Answer on Stackoverflow
Solution 7 - PythonOkwiziView Answer on Stackoverflow
Solution 8 - Pythonsarjit07View Answer on Stackoverflow