Comment by bravura

4 months ago

For the measures that drop exponentially, like val/bpb and train/loss, you should put the x-axis on a log scale. That will better show you whether the run has converged.

2 comments

bravura

Great call, thank you - I switched to log scale for those metrics and agree that it is much clearer.

bravura 4 months ago

Sorry, fat fingers. It should be the y-axis that is log scale, not the x-axis. (Sometimes both is good.)
Did you notice the inflection point in the top graph, where the loss drops faster than expected? Maybe you should let it run more…
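
If it helps to have a number alongside the plot, here is a rough sketch of the same idea. It fits a straight line to log10(loss) against log10(step) over the most recent points, so a slope near zero suggests a plateau and a clearly negative slope suggests the run is still improving. The function name `log_slope`, the `window` parameter, and the synthetic curves are all made up for illustration; none of this comes from the actual training code, and it assumes every loss value is positive. For the plot itself, matplotlib's `plt.yscale("log")` (and `plt.xscale("log")` if you want both axes) should do it.

```python
import math


def log_slope(losses, window=100):
    """Least-squares slope of log10(loss) vs log10(step) over the last `window` points.

    Hypothetical helper: assumes `losses` holds at least two positive values,
    one per step, with steps counted from 1.
    """
    n = len(losses)
    start = max(0, n - window)
    # Steps are shifted by 1 so that log10 is defined for the first step.
    xs = [math.log10(step + 1) for step in range(start, n)]
    ys = [math.log10(losses[step]) for step in range(start, n)]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


# Synthetic curves, for illustration only.
still_dropping = [5.0 * (s + 1) ** -0.3 for s in range(1000)]  # power-law decay
plateaued = [2.0] * 1000                                       # flat loss

print(log_slope(still_dropping))  # clearly negative: still improving
print(log_slope(plateaued))       # about zero: looks converged
```

A straight line on the log-log plot corresponds to a constant slope here, so a sudden change in this value between windows is one way to spot an inflection point like the one in the top graph.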