Comment by atleastoptimal
2 months ago
They should do a 95% and 99% version of the graphs, otherwise it's hard to ascertain whether the failure cases will remain in the elusive "stuff humans can do easily but LLM's trip up despite scaling"
2 months ago
They should do a 95% and 99% version of the graphs, otherwise it's hard to ascertain whether the failure cases will remain in the elusive "stuff humans can do easily but LLM's trip up despite scaling"
No comments yet
Contribute on Hacker News ↗