What's the difference between `.eval()` and `.no_grad()` in PyTorch?

I want to incorporate VGG for computing perceptual loss while training my CNN model. VGG needs to remain static, but gradients should pass through it during training.

Can I use `.eval()` instead of `.no_grad()` when passing data through VGG during training?

Also, should I set `requires_grad=True` for the data in my training batches?


Hi Emil… For this use case you actually want `.eval()`, not `torch.no_grad()`. `.eval()` puts the model in evaluation mode, which changes the behaviour of layers like Dropout and BatchNorm but leaves autograd untouched, so gradients can still pass through VGG to your CNN during backpropagation. `torch.no_grad()`, by contrast, disables gradient tracking entirely: if you wrapped the VGG forward pass in it, the perceptual loss could not backpropagate to your CNN at all. To keep VGG static, call `vgg.eval()`, set `requires_grad=False` on its parameters, and leave them out of the optimizer; gradients then flow *through* VGG without its weights ever being updated. Regarding `requires_grad=True`: it should be set (and is on by default) for the parameters you want the optimizer to update, i.e. your CNN's weights. Input data tensors typically do not need `requires_grad=True`, unless you have a specific reason to compute gradients with respect to the inputs, which is uncommon in standard training.
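Putting that together, here is a minimal sketch of the frozen-VGG setup, assuming a recent torchvision (the `weights=` API); `cnn`, `optimizer`, `batch`, and `target` are placeholders standing in for your own model and data:

```python
import torch
import torch.nn as nn
from torchvision import models

# Frozen feature extractor: eval mode for correct layer behaviour,
# requires_grad=False so the optimizer can never touch its weights.
vgg = models.vgg16(weights=models.VGG16_Weights.DEFAULT).features.eval()
for p in vgg.parameters():
    p.requires_grad = False

cnn = nn.Conv2d(3, 3, kernel_size=3, padding=1)          # stand-in for your CNN
optimizer = torch.optim.Adam(cnn.parameters(), lr=1e-4)  # only cnn's params are updated

batch = torch.randn(4, 3, 224, 224)   # input images; requires_grad=True is NOT needed
target = torch.randn(4, 3, 224, 224)  # ground-truth images

output = cnn(batch)

# Do NOT wrap this in torch.no_grad(): gradients must flow *through* VGG
# back to cnn, even though VGG's own weights stay frozen.
loss = nn.functional.mse_loss(vgg(output), vgg(target))

loss.backward()    # grads reach cnn.parameters(); vgg's stay None
optimizer.step()
```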


The key difference between `.eval()` and `torch.no_grad()` in PyTorch is their purpose and scope.

`.eval()` is a method called on a PyTorch module (like a neural network model) to put it in evaluation mode. This changes the behaviour of certain layers such as Dropout and BatchNorm, ensuring they operate correctly during inference rather than training. `.eval()` does not affect gradient tracking.

`torch.no_grad()`, on the other hand, is a context manager that temporarily turns off gradient tracking in the autograd engine. This is useful when you don't need to compute gradients, such as during model inference, since it saves memory and speeds up computation.

In summary, `.eval()` prepares the model for inference, while `torch.no_grad()` disables gradient tracking. It's generally recommended to use both together when evaluating a model, as they serve complementary purposes.
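For a standard evaluation loop, the combination looks like this (a toy model, purely illustrative, chosen so the Dropout/BatchNorm point is concrete):

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(8, 8),
    nn.BatchNorm1d(8),   # uses running statistics in eval mode
    nn.Dropout(p=0.5),   # disabled in eval mode
    nn.Linear(8, 2),
)

model.eval()                         # correct layer behaviour for inference
with torch.no_grad():                # no graph is built: less memory, faster
    out = model(torch.randn(4, 8))

print(out.requires_grad)             # False
```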

  • `model.eval()` switches the model out of training mode, so layers like Dropout and BatchNorm behave as they do at inference time. By itself, it does not stop gradients from being computed or weights from being updated.
  • `torch.no_grad()` runs the model without autograd recording anything, which keeps the model's knowledge exactly the same and saves memory. For perceptual loss, though, you still need gradients to flow through the frozen network, so `.eval()` plus frozen parameters is the right tool there, not `torch.no_grad()`. A small sketch contrasting the two follows this list.
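One subtlety worth seeing in code: `torch.no_grad()` does not put the model in eval mode, so a BatchNorm layer keeps updating its running statistics under it, while `.eval()` stops those updates. A minimal sketch (the layer and data are just illustrative):

```python
import torch
import torch.nn as nn

bn = nn.BatchNorm1d(3)
x = torch.randn(16, 3) + 5.0                 # batch mean far from the initial stats

with torch.no_grad():                        # gradients off, but bn is still in train mode
    bn(x)
print(bn.running_mean)                       # no longer zeros: no_grad() didn't freeze the stats

bn.eval()                                    # eval() uses, and stops updating, running stats
before = bn.running_mean.clone()
bn(x)
print(torch.equal(bn.running_mean, before))  # True: the stats no longer change
```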

`eval()` notifies all your layers that you are in eval mode; that way, BatchNorm or Dropout layers will work in eval mode instead of training mode. `torch.no_grad()`, on the other hand, impacts the autograd engine and deactivates it.
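A quick way to verify this (toy Dropout layer, names purely illustrative):

```python
import torch
import torch.nn as nn

drop = nn.Dropout(p=0.5)
x = torch.ones(4, requires_grad=True)

drop.eval()               # Dropout becomes the identity in eval mode...
y = drop(x)
print(y.requires_grad)    # ...but autograd is still active: prints True

with torch.no_grad():     # the autograd engine is now deactivated
    z = drop(x)
print(z.requires_grad)    # prints False: no graph was recorded
```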