Comment by kseniamorph
4 hours ago
Curious about the baseline choice. modded-nanogpt was optimized for wall-clock speed, not data efficiency, so it seems like an unusual reference point for this kind of benchmark. Why not vanilla NanoGPT?
4 hours ago
Curious about the baseline choice. modded-nanogpt was optimized for wall-clock speed, not data efficiency, so it seems like an unusual reference point for this kind of benchmark. Why not vanilla NanoGPT?
Modded-nanogpt is also much more data efficient than vanilla napogpt, even if some of the individual optimizations trade off higher throughput for worse data efficiency.
yes, agreed, modded-nanogpt is already a data-efficient variant of original nanogpt. just that the kinds of algorithms it allows are somewhat constrained because it optimizes for wall clock time.