Comment by yorwba
1 day ago
> We also found interestingly that:
> torch.exp(q - q.detach()) * advantages.unsqueeze(1)
> is used, which should be evaluated to 1 right? We actually found this is necessary - it seems that the autograd engine might not be propagating gradients correctly.
The autograd engine is propagating gradients correctly, but the question is, which gradients?
You could encapsulate this as a function:
f = lambda a, b: torch.exp(a - b) * advantages.unsqueeze(1)
then let f_a(a, b) be its derivative with respect to a, and substitute q for both variables to get f_a(q, q) = exp(q - q) * advantages.unsqueeze(1), which is just advantages.unsqueeze(1).
But if you substitute first to get f(q, q) and then differentiate with respect to q, you don't get f_a(q, q); by the chain rule you get f_a(q, q) + f_b(q, q), and since f_b = -f_a here, that sum is 0. Variable substitution and differentiation cannot be freely reordered.
detach() is a way to say "differentiate the expression first, treating this term as a constant, and substitute the variable's value back in afterwards."
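A minimal, self-contained sketch of the difference (the shapes and variable names here are my own, chosen for illustration, not taken from the code under discussion):

    import torch

    # Hypothetical per-token log-prob ratios and per-sequence advantages.
    q = torch.randn(2, 3, requires_grad=True)
    advantages = torch.randn(2)

    # With detach(): the forward value is exp(0) * A = A, but the gradient
    # w.r.t. q is f_a(q, q) = exp(q - q.detach()) * A = A, i.e. nonzero.
    loss_detached = (torch.exp(q - q.detach()) * advantages.unsqueeze(1)).sum()
    (grad_detached,) = torch.autograd.grad(loss_detached, q)

    # Without detach(): exp(q2 - q2) is constant in q2, so the gradient is
    # f_a(q2, q2) + f_b(q2, q2) = 0.
    q2 = q.detach().clone().requires_grad_(True)
    loss_plain = (torch.exp(q2 - q2) * advantages.unsqueeze(1)).sum()
    (grad_plain,) = torch.autograd.grad(loss_plain, q2)

    print(grad_detached)  # each row equals the corresponding advantage
    print(grad_plain)     # all zeros

So exp(q - q.detach()) evaluates to 1 in the forward pass but still carries the advantage term backward through q, which is exactly what the trick relies on.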