To keep my loss function's time complexity low, I want to write my own replacement for torch.nn.functional.log_softmax.

In place of torch.nn.functional.log_softmax, I built the function out of tensor.exp, tensor.log, tensor.sum, and similar tensor operations. I have compared the original version and my new version, and they give the same results on basic, simple tensor computations. But when I moved the new code into model training, the loss of the new version wouldn't go down and kept fluctuating heavily.
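The question doesn't include the actual code, but a replacement built from tensor.exp, tensor.sum, and tensor.log presumably looks something like this sketch (the name naive_log_softmax is mine):

```python
import torch
import torch.nn.functional as F

def naive_log_softmax(x, dim=-1):
    # log(exp(x) / sum(exp(x))) computed directly:
    # fine for small logits, but exp() overflows for large ones
    return (x.exp() / x.exp().sum(dim=dim, keepdim=True)).log()

# Matches the library function on small, well-behaved logits
logits = torch.tensor([[1.0, 2.0, 3.0]])
print(torch.allclose(naive_log_softmax(logits),
                     F.log_softmax(logits, dim=-1)))  # True
```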

What I need to know is:

Does backpropagation still work if I build the computation out of tensor.exp and similar tensor methods instead of torch.nn.functional.log_softmax?

Do the torch.nn.functional functions backpropagate automatically, the same way tensor.exp() does?

I know this is probably a silly question. Thanks for your help.

log and exp are numerically unstable when applied to large logits. That's why most softmax implementations subtract max(logits) from every logit before exponentiating and summing. Subtracting the same constant from all logits doesn't change the softmax result, but it makes the computation much more stable. Try your version on larger logits.
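The max-subtraction trick described above can be written like this (the name stable_log_softmax is mine):

```python
import torch
import torch.nn.functional as F

def stable_log_softmax(x, dim=-1):
    # Subtract the per-row max first: softmax is invariant to a constant
    # shift, and the shift keeps exp() from overflowing.
    shifted = x - x.max(dim=dim, keepdim=True).values
    return shifted - shifted.exp().sum(dim=dim, keepdim=True).log()

big = torch.tensor([[1000.0, 999.0, 0.0]])  # a naive exp() would overflow here
print(torch.allclose(stable_log_softmax(big),
                     F.log_softmax(big, dim=-1)))  # True
```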

If you test your code on logits with magnitudes similar to those you saw during training, you can confirm whether numerical stability is the issue. You could also simply subtract the max logit in your code and check whether that fixes the training problem.
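One way to run that check, with naive_log_softmax standing in for the unstable version (an assumption, since the original code isn't shown):

```python
import torch
import torch.nn.functional as F

def naive_log_softmax(x, dim=-1):
    # direct log(exp/sum(exp)) with no max subtraction
    return (x.exp() / x.exp().sum(dim=dim, keepdim=True)).log()

# Training-scale magnitudes: exp(100) already exceeds float32's max (~3.4e38)
logits = torch.tensor([[100.0, 0.0, -50.0]])
print(torch.isfinite(naive_log_softmax(logits)).all())      # tensor(False)
print(torch.isfinite(F.log_softmax(logits, dim=-1)).all())  # tensor(True)
```

If the first line prints False while the second prints True, the unstable exp/sum/log formulation is the culprit.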

To answer your title question, the torch.nn.functional functions are re-exported in the top-level torch module. They're literally the same. As the other answer points out, the differences you're seeing are probably because you're not using the same numerical stability tricks.

To address this, you can:
Verify the implementations
Check the numerical stability techniques
Test with different inputs
Experiment with different numerical stability tricks
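On the backpropagation question specifically: tensor.exp, tensor.log, and tensor.sum are all autograd-tracked operations, so gradients flow through a hand-rolled log-softmax just as they do through the library function. One way to verify this (the helper name naive_log_softmax is mine):

```python
import torch

def naive_log_softmax(x, dim=-1):
    # built entirely from autograd-aware tensor ops
    return (x.exp() / x.exp().sum(dim=dim, keepdim=True)).log()

x = torch.randn(2, 4, dtype=torch.double, requires_grad=True)
# gradcheck compares autograd's gradients against finite differences
print(torch.autograd.gradcheck(naive_log_softmax, (x,)))  # True
```

gradcheck wants double precision and small inputs, which is also why it can pass here even though the same function misbehaves on large logits during training.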