How to avoid "CUDA out of memory" in PyTorch


Python Problem Overview


I think it's a pretty common message for PyTorch users with low GPU memory:

RuntimeError: CUDA out of memory. Tried to allocate X MiB (GPU X; X GiB total capacity; X GiB already allocated; X MiB free; X cached)

I tried to process an image by loading each layer onto the GPU and then moving it back to the CPU:

for m in self.children():
    m.cuda()
    x = m(x)
    m.cpu()
    torch.cuda.empty_cache()

But it doesn't seem to be very effective. I'm wondering whether there are any tips and tricks for training large deep learning models with little GPU memory.

Python Solutions


Solution 1 - Python

Calling

import torch
torch.cuda.empty_cache()

releases the cached CUDA memory that PyTorch holds but no longer uses, and you can also manually drop variables that are no longer needed:

import gc
del variables
gc.collect()

Even after these commands the error might appear again, because del only removes a reference to a tensor; the memory itself is released only once nothing else references it, and empty_cache() can only return blocks that are no longer in use. So restarting the kernel, reducing batch_size, and finding the largest batch_size that fits is the best option (though sometimes not a very feasible one).
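The search for a batch size that fits can be sketched as a small back-off loop. This is only an illustration: `find_workable_batch_size`, `run_step`, and `fake_step` are hypothetical names, and it assumes the out-of-memory condition surfaces as a RuntimeError whose message contains "out of memory", as in the error quoted in the question.

```python
def find_workable_batch_size(run_step, start_batch_size, min_batch_size=1):
    """Halve the batch size until run_step(batch_size) stops running out of memory.

    run_step is a hypothetical callable that performs one forward/backward
    pass at the given batch size.
    """
    batch_size = start_batch_size
    while batch_size >= min_batch_size:
        try:
            run_step(batch_size)
            return batch_size
        except RuntimeError as err:
            if "out of memory" not in str(err):
                raise  # unrelated error: do not hide it
            batch_size //= 2  # try again with half the batch
    raise RuntimeError("even the minimum batch size ran out of memory")


# Stand-in for a real training step: pretend more than 12 samples do not fit.
def fake_step(batch_size):
    if batch_size > 12:
        raise RuntimeError("CUDA out of memory. Tried to allocate ...")


print(find_workable_batch_size(fake_step, start_batch_size=32))  # prints 8
```

In a notebook, stale references from a failed attempt can keep memory occupied, so a kernel restart may still be needed before the smaller batch size actually fits.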

Another way to get deeper insight into how memory is allocated on the GPU is to use:

torch.cuda.memory_summary(device=None, abbreviated=False)

where both arguments are optional. This returns a readable summary of memory allocation and helps you figure out why CUDA is running out of memory, so that you can restart the kernel and avoid the error happening again (just as I did in my case).
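For a quick numeric check alongside the full summary, the individual counters can be read directly. A minimal sketch, assuming a helper named `cuda_memory_stats` (hypothetical) that returns None on a machine without a CUDA device:

```python
import torch


def cuda_memory_stats(device=None):
    """Return current CUDA memory counters in MiB, or None without a GPU."""
    if not torch.cuda.is_available():
        return None
    mib = 1024 ** 2
    return {
        # memory occupied by live tensors
        "allocated_mib": torch.cuda.memory_allocated(device) / mib,
        # memory held by PyTorch's caching allocator (live tensors + cache)
        "reserved_mib": torch.cuda.memory_reserved(device) / mib,
        # highest value of "allocated" seen so far in this process
        "peak_allocated_mib": torch.cuda.max_memory_allocated(device) / mib,
    }


print(cuda_memory_stats())
```

Calling it before and after a training step shows whether memory keeps growing from one iteration to the next.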

Passing the data iteratively might help, but reducing the size of your network's layers or breaking them down can also be effective, since the model itself sometimes occupies a significant amount of memory (for example, in transfer learning).

Solution 2 - Python

Just reduce the batch size, and it will work. While I was training, I got the following error:

> CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 10.76 GiB total capacity; 4.29 GiB already allocated; 10.12 MiB free; 4.46 GiB reserved in total by PyTorch)

I was using a batch size of 32, so I changed it to 15 and it worked for me.
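The numbers in that message are worth reading closely. The sketch below pulls them out with a regular expression; `parse_oom_message` is a hypothetical helper, and it assumes the message has exactly the wording quoted above (the format differs between PyTorch versions).

```python
import re

UNIT_TO_MIB = {"MiB": 1.0, "GiB": 1024.0}


def parse_oom_message(message):
    """Extract the sizes (in MiB) from a CUDA OOM message of the form quoted above."""
    sizes = {}
    requested = re.search(r"Tried to allocate ([\d.]+) (MiB|GiB)", message)
    if requested:
        sizes["requested"] = float(requested.group(1)) * UNIT_TO_MIB[requested.group(2)]
    for value, unit, label in re.findall(
        r"([\d.]+) (MiB|GiB) (total capacity|already allocated|free|reserved)", message
    ):
        sizes[label] = float(value) * UNIT_TO_MIB[unit]
    return sizes


message = (
    "CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 10.76 GiB total capacity; "
    "4.29 GiB already allocated; 10.12 MiB free; 4.46 GiB reserved in total by PyTorch)"
)
sizes = parse_oom_message(message)

# Memory that is neither free nor reserved by this PyTorch process.
other_mib = sizes["total capacity"] - sizes["reserved"] - sizes["free"]
print(sizes)
print(f"not accounted for by this process: {other_mib:.0f} MiB")
```

In this message, PyTorch has reserved well under half of the card's capacity, yet almost nothing is free. That suggests the remainder is held outside this process, for example by another session on the same GPU.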

Solution 3 - Python

Send the batches to CUDA iteratively, and keep the batch size small. Don't send all your data to CUDA at once at the beginning. Rather, do it as follows:

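import torch

for e in range(epochs):
    for images, labels in train_loader:
        # move only the current batch to the GPU
        if torch.cuda.is_available():
            images, labels = images.cuda(), labels.cuda()
        # forward pass, loss, backward pass, and optimizer step go here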

You can also use dtypes that use less memory, for instance torch.float16 (torch.half is an alias for the same dtype).
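To see what the smaller dtype saves, compare the storage of the same tensor in both precisions. A minimal sketch that runs on the CPU; `tensor_bytes` is a hypothetical helper name.

```python
import torch

x32 = torch.zeros(1024, 1024, dtype=torch.float32)
x16 = x32.to(torch.float16)  # equivalent to x32.half()


def tensor_bytes(t):
    """Number of bytes used by the tensor's elements."""
    return t.numel() * t.element_size()


print(tensor_bytes(x32))  # 4 bytes per element
print(tensor_bytes(x16))  # 2 bytes per element
```

The half-precision copy takes half the memory, at the cost of a narrower numeric range and lower precision.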

Solution 4 - Python

Try not to drag your grads too far.

I got the same error when I tried to sum up the loss over all batches:

loss = self.criterion(pred, label)

total_loss += loss

Here loss is a tensor that requires grad, so adding it to total_loss keeps every batch's computation graph alive. Using loss.item() instead, which returns a plain Python number, solved the problem:

loss = self.criterion(pred, label)

total_loss += loss.item()
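The difference between the two accumulators can be checked on a toy model. A small sketch that runs on the CPU; the model, data, and variable names are made up for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(4, 1)
criterion = nn.MSELoss()

graph_total = 0.0  # accumulates tensors: every addition extends the autograd graph
float_total = 0.0  # accumulates plain Python floats

for _ in range(3):
    pred = model(torch.randn(8, 4))
    loss = criterion(pred, torch.zeros(8, 1))
    graph_total = graph_total + loss
    float_total += loss.item()

print(type(graph_total).__name__, graph_total.requires_grad)  # Tensor True
print(type(float_total).__name__)                             # float
```

The tensor accumulator still carries a grad_fn that points back to every batch, which is why its memory cannot be released between iterations.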

The solution below is credited to Yuval Reina in the Kaggle question:

> This error is related to the GPU memory and not the general memory => @cjinny comment might not work.
> Do you use TensorFlow/Keras or Pytorch?
> Try using a smaller batch size.
> If you use Keras, try to decrease some of the hidden layer sizes.
> If you use Pytorch:
>   do you keep all the training data on the GPU all the time?
>   make sure you don't drag the grads too far
>   check the sizes of your hidden layers

Solution 5 - Python

Most things are already covered, but I will add a little.

If torch reports something like "Tried to allocate 2 MiB", the message is misleading. What has actually happened is that CUDA ran out of the total memory required to train the model, and the small allocation was merely the one that failed. You can reduce the batch size. If even a batch size of 1 does not work (which happens when you train NLP models with massive sequences), try passing less data; this will help you confirm that your GPU does not have enough memory to train the model.

Also, garbage collection and clearing the cache have to be done again if you want to re-train the model.

Solution 6 - Python

There are ways to avoid it, but how well they work certainly depends on your GPU memory size:

  1. Load the data onto the GPU batch by batch while iterating over it:
     for features, labels in batch:
         features, labels = features.to(device), labels.to(device)
  2. Use FP16 (half precision) dtypes instead of the default single precision floats.
  3. Try reducing the batch size if you run out of memory.
  4. Use the .detach() method on tensors you only need as values, so that they no longer keep the autograd graph (and its memory) alive.
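The .detach() point from the list above can be sketched as follows, assuming a toy model and random data (all names are illustrative). Note that .detach() only cuts the tensor out of the autograd graph; the .cpu() call is what moves the stored copy off the GPU.

```python
import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = nn.Linear(16, 4).to(device)

stored = []
for _ in range(3):
    batch = torch.randn(8, 16, device=device)
    out = model(batch)
    # Keep only the values: drop the graph reference and move the copy to host memory.
    stored.append(out.detach().cpu())

predictions = torch.cat(stored)
print(predictions.shape, predictions.requires_grad, predictions.device)
```

Appending `out` directly would instead keep each batch's graph, and on a GPU its device memory, alive for as long as the list exists.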

If all of the above are used properly, the PyTorch library is already highly optimized and efficient.

Solution 7 - Python

Follow these steps:

  1. Reduce the amount of train, val, and test data
  2. Reduce the batch size (e.g. 16 or 32)
  3. Reduce the number of model parameters (e.g. fewer than a million)

In my case, the same error was raised when I trained on the Common Voice dataset in Kaggle kernels. I dealt with it by reducing the training dataset to 20000 samples, the batch size to 16, and the model to 112K parameters.
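To know how many parameters a model has before shrinking it, you can count them. A minimal sketch with a made-up model; `parameter_footprint` is a hypothetical helper name.

```python
import torch.nn as nn


def parameter_footprint(model):
    """Return (number of parameters, bytes needed to store them)."""
    n_params = sum(p.numel() for p in model.parameters())
    n_bytes = sum(p.numel() * p.element_size() for p in model.parameters())
    return n_params, n_bytes


model = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 10))
n_params, n_bytes = parameter_footprint(model)
print(n_params, n_bytes)  # 35594 parameters, 142376 bytes in float32
```

This covers the weights only. During training, gradients, optimizer state, and intermediate activations come on top, so the real footprint is larger.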

Solution 8 - Python

Implementation:

  1. Feed the images to the GPU batch by batch.

  2. Use a small batch size during training or inference.

  3. Resize the input images to a smaller size.

Technically:

  1. Most networks are over-parameterized, which means they are larger than the learning task requires. So finding an appropriate network structure can help:

a. Compact your network with techniques like model compression, network pruning, and quantization.

b. Directly use a more compact network structure like MobileNetV1/V2/V3.

c. Use neural architecture search (NAS).

Solution 9 - Python

I had the same error and fixed it by resizing my images from ~600 to 100 using these lines:

import torchvision.transforms as transforms
transform = transforms.Compose([
    transforms.Resize((100, 100)), 
    transforms.ToTensor()
])
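A back-of-the-envelope calculation shows why resizing helps so much. The sketch assumes square RGB images stored as float32 and a batch size of 32, none of which is stated in the answer; `batch_bytes` is a hypothetical helper.

```python
def batch_bytes(batch_size, height, width, channels=3, bytes_per_value=4):
    """Bytes needed to hold one batch of images as float32 tensors."""
    return batch_size * channels * height * width * bytes_per_value


large = batch_bytes(32, 600, 600)
small = batch_bytes(32, 100, 100)

print(f"{large / 1024**2:.1f} MiB vs {small / 1024**2:.1f} MiB")
print(large // small)  # prints 36
```

The input batch shrinks by a factor of 36, and the feature maps of convolutional layers shrink roughly in proportion to it.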

Solution 10 - Python

This may seem bizarre, but I found that Colab keeps many sessions running in the background, even after a factory reset of the runtime or after closing the tab. I solved this by clicking "Runtime" in the menu and then selecting "Manage Sessions". I terminated all the unwanted sessions and was good to go.

Solution 11 - Python

I would recommend using mixed precision training with PyTorch. It can make training way faster and consume less memory.

Take a look at https://spell.ml/blog/mixed-precision-training-with-pytorch-Xuk7YBEAACAASJam.
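A minimal sketch of one mixed precision training step, using PyTorch's built-in autocast and gradient scaler. The toy model and data are assumptions for illustration, and the code falls back to ordinary float32 when no GPU is present so that it runs anywhere.

```python
import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
use_amp = device.type == "cuda"  # mixed precision only when a GPU is present
amp_dtype = torch.float16 if use_amp else torch.bfloat16

model = nn.Linear(32, 2).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
criterion = nn.CrossEntropyLoss()
# Newer PyTorch releases also offer torch.amp.GradScaler("cuda").
scaler = torch.cuda.amp.GradScaler(enabled=use_amp)


def train_step(inputs, targets):
    optimizer.zero_grad(set_to_none=True)
    # Inside autocast, eligible ops run in the lower-precision dtype.
    with torch.autocast(device_type=device.type, dtype=amp_dtype, enabled=use_amp):
        loss = criterion(model(inputs), targets)
    scaler.scale(loss).backward()  # scale the loss to avoid float16 underflow
    scaler.step(optimizer)         # unscales the gradients, then steps
    scaler.update()
    return loss.item()


inputs = torch.randn(16, 32, device=device)
targets = torch.randint(0, 2, (16,), device=device)
print(train_step(inputs, targets))
```

With `enabled=False` the scaler and the autocast context become no-ops, so the same step function serves both the GPU and the CPU path.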

Solution 12 - Python

There is now a pretty awesome library which makes this very simple: https://github.com/rentruewang/koila

pip install koila

In your code, simply wrap the input with lazy:

from koila import lazy
input = lazy(input, batch=0)

Solution 13 - Python

In my experience, as long as you don't go above a batch size of 32, you will be fine. Just remember to refresh or restart the runtime, or you will hit the same error even after reducing the batch size. I set my batch size to 16; it reduces the occurrence of zero gradients during my training, and the model matches the true function much better. A batch size of 4 or 8, by contrast, causes the training loss to fluctuate.

Solution 14 - Python

I met the same error. My GPU is a GTX 1650 with 4 GB of video memory, and the machine has 16 GB of RAM. It worked for me when I reduced batch_size to 3. Hope this helps.

Solution 15 - Python

The best way is to lower the batch size. It usually works. Otherwise, try this:

import gc

del variable  # delete unnecessary variables
gc.collect()

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | voilalex | View Question on Stackoverflow
Solution 1 - Python | SHAGUN SHARMA | View Answer on Stackoverflow
Solution 2 - Python | Rahul | View Answer on Stackoverflow
Solution 3 - Python | Nicolas Gervais | View Answer on Stackoverflow
Solution 4 - Python | pandas007 | View Answer on Stackoverflow
Solution 5 - Python | YoungSheldon | View Answer on Stackoverflow
Solution 6 - Python | Nivesh Gadipudi | View Answer on Stackoverflow
Solution 7 - Python | Kavi Arasan | View Answer on Stackoverflow
Solution 8 - Python | david | View Answer on Stackoverflow
Solution 9 - Python | Ramy AbdAllah | View Answer on Stackoverflow
Solution 10 - Python | Zubair Khaliq | View Answer on Stackoverflow
Solution 11 - Python | Karol | View Answer on Stackoverflow
Solution 12 - Python | DreamFlasher | View Answer on Stackoverflow
Solution 13 - Python | Keston Smith | View Answer on Stackoverflow
Solution 14 - Python | smith andy | View Answer on Stackoverflow
Solution 15 - Python | Harshad Patil | View Answer on Stackoverflow