Evaluating PyTorch models: `with torch.no_grad` vs `model.eval()`

Python · Machine Learning · Deep Learning · PyTorch · Autograd

Python Problem Overview


When I want to evaluate the performance of my model on the validation set, is it preferred to use `with torch.no_grad():` or `model.eval()`?

Python Solutions


Solution 1 - Python

TL;DR:

Use both. They do different things and have different scopes; a minimal sketch combining them follows the list below.

  • `with torch.no_grad()` disables tracking of gradients in autograd.
  • `model.eval()` changes the forward() behaviour of the module it is called upon
  • e.g. it disables dropout and has batch norm use the running (population) statistics accumulated during training
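As a quick illustration, here is a minimal sketch of what "use both" looks like in practice (the toy model here is hypothetical, chosen only because it contains Dropout and BatchNorm, the layer types whose behaviour `eval()` changes):

```python
import torch
import torch.nn as nn

# Hypothetical toy model containing Dropout and BatchNorm.
model = nn.Sequential(nn.Linear(10, 10), nn.BatchNorm1d(10), nn.Dropout(p=0.5))

model.eval()                    # switch Dropout/BatchNorm to inference behaviour
with torch.no_grad():           # stop autograd from recording operations
    out = model(torch.randn(4, 10))

print(out.requires_grad)        # False: no computation graph was built
model.train()                   # switch back before resuming training
```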

with torch.no_grad

The torch.autograd.no_grad documentation says:

> Context-manager that disabled [sic] gradient calculation.

> Disabling gradient calculation is useful for inference, when you are sure that you will not call Tensor.backward(). It will reduce memory consumption for computations that would otherwise have requires_grad=True. In this mode, the result of every computation will have requires_grad=False, even when the inputs have requires_grad=True.
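A quick toy snippet (not from the documentation) illustrating that last point:

```python
import torch

x = torch.ones(3, requires_grad=True)

with torch.no_grad():
    y = x * 2
print(y.requires_grad)   # False: nothing was recorded inside the context

z = x * 2                # outside the context, tracking resumes
print(z.requires_grad)   # True
```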

model.eval()

The nn.Module.eval documentation says:

> Sets the module in evaluation mode.

> This has any effect only on certain modules. See documentations of particular modules for details of their behaviors in training/evaluation mode, if they are affected, e.g. Dropout, BatchNorm, etc.
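For example, Dropout behaves differently depending on the mode set by `train()`/`eval()` (a small illustrative snippet, not taken from the documentation):

```python
import torch
import torch.nn as nn

drop = nn.Dropout(p=0.5)
x = torch.ones(1, 10)

drop.train()        # training mode (the default): roughly half the values are zeroed,
print(drop(x))      # and the survivors are scaled by 1 / (1 - p) = 2

drop.eval()         # evaluation mode: Dropout becomes the identity
print(drop(x))      # all ones, unchanged
```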


The creator of PyTorch said the documentation should be updated to suggest the usage of both, and I raised a pull request to do so.

Solution 2 - Python

If you're reading this post because you've been encountering RuntimeError: CUDA out of memory, then `with torch.no_grad():` will likely help save memory. Using only `model.eval()` is unlikely to help with the OOM error.

The reason for this is that `torch.no_grad()` disables autograd completely (you can no longer backpropagate), reducing memory consumption and speeding up computations.

However, you will still be able to compute gradients when using `model.eval()`. Personally, I find this design decision intriguing. So, what is the purpose of `.eval()`? It seems its main functionality is to deactivate Dropout (and switch layers like BatchNorm to their inference behaviour) during evaluation.

To summarize, if you use `torch.no_grad()`, no intermediate tensors are saved, and you can possibly increase the batch size during inference.
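The difference is easy to see in a small experiment (the layer and batch sizes here are made up purely for illustration):

```python
import torch
import torch.nn as nn

model = nn.Linear(1000, 1000)
x = torch.randn(64, 1000)

model.eval()
out = model(x)
print(out.requires_grad)   # True: eval() alone does not stop autograd,
out.sum().backward()       # so intermediate tensors are kept and backward() still works

with torch.no_grad():
    out = model(x)
print(out.requires_grad)   # False: no graph, no saved activations
```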

Solution 3 - Python

`with torch.no_grad():` disables gradient computation for the backward pass. Since these calculations are unnecessary during inference and add non-trivial computational overhead, it is essential to use this context if evaluating the model's speed. It will not, however, affect results.

`model.eval()` ensures that certain modules which behave differently in training vs. inference (e.g. Dropout and BatchNorm) are put in the appropriate mode for the forward pass at inference time. As such, if your model contains such modules, it is essential to call it.

For the reasons above it is good practice to use both during inference.
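Put together, a typical validation loop looks something like the sketch below (the model, data, and batch size are hypothetical, included only to make the loop runnable):

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical model and validation data.
model = nn.Sequential(nn.Linear(20, 2), nn.Dropout(p=0.2))
val_loader = DataLoader(
    TensorDataset(torch.randn(256, 20), torch.randint(0, 2, (256,))),
    batch_size=64,
)

model.eval()                          # inference behaviour for Dropout/BatchNorm
correct = 0
with torch.no_grad():                 # no graph, lower memory, faster forward pass
    for inputs, targets in val_loader:
        preds = model(inputs).argmax(dim=1)
        correct += (preds == targets).sum().item()

print(f"validation accuracy: {correct / 256:.2%}")
model.train()                         # restore training behaviour afterwards
```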

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type        | Original Author | Original Content on Stackoverflow
Question            | Tom Hale        | View Question on Stackoverflow
Solution 1 - Python | Tom Hale        | View Answer on Stackoverflow
Solution 2 - Python | aerin           | View Answer on Stackoverflow
Solution 3 - Python | iacob           | View Answer on Stackoverflow