← Back to context

Comment by freehorse

9 hours ago

> as long as n isn't zero

Which is the case with softmax function, as for T=0 you end up with a fraction that either becomes 0/0 or inf/inf [0]. So you do need branching as floating point arithmetic is not gonna get you there.

[0] except for weights that are exactly 0

edit: thinking more about it, one could always express the softmax formula in ways that this could work with floating point arithmetic but it would be very inefficient and sort of pointless