Comment by sdpmas

4 hours ago

yeah, we do incorporate some of the findings from the paper in our repo, like aggressive regularization and ensembling!

4 comments

sdpmas

I see you already mention diffusion - iirc there was a result not too long ago that diffusion models keep improving with more epochs for longer than AR models do.

sdpmas 2 hours ago
diffusion is promising, but it's still an open question how data-efficient it is compared to AR. in practice, you can also train AR forever with high enough regularization, so let's see.
- _0ffh 2 hours ago
  
  Yes, it could go either way of course.
  Still, just for reference, here's the paper I remembered: https://arxiv.org/pdf/2507.15857
  