Comment by WithinReason

2 months ago

I made an explicit reverse pass (no autodiff), it was 8x faster in Python

tradeoff worth naming: you avoid the autodiff graph overhead (hence the speedup), but any architecture change means rewriting every gradient by hand. fine for a pedagogical project, but that's exactly why autodiff exists.