Comment by zozbot234
12 hours ago
I don't think anyone knows for sure how much mileage/scalability LLMs have left. Given what we do know, I suspect that if you can afford to spend more compute on even longer training runs, you can still get much better results than current SOTA, even for "simple" domains like text/language.
I think we're pretty much out of "spend more compute on even longer training runs" at this point.