Comment by maxbond

4 hours ago

Reminds me of the recent paper about delegating document editing tasks to LLMs across different disciplines [1]. That paper found that programming was the only discipline most LLMs can perform long horizon tasks on without accumulating errors & corrupting the document.

I've only read the abstract of this one so far, but it seems like this paper has zoomed in on programming with greater fidelity and shown a similar phenomenon. It's not about long-horizon tasks, though; it's more like "long style horizons" over larger sets of structural constraints.

[1] https://news.ycombinator.com/item?id=48073246

5 comments


emp17344 3 hours ago

If it’s not easily verifiable, LLMs aren’t good at it.

jeremyjh 2 hours ago
I think that’s mostly because they get so much more of that reinforcement learning, since it is so economical. I don’t know if there is any evidence of a fundamental reason they can’t be just as good at other tasks, but it might be economically infeasible for a while yet.
- emp17344 3 minutes ago
  
  RLVR doesn’t work for unverifiable tasks, so they won’t be able to effectively use tools to boost reliability for those tasks.
- mjburgess 2 hours ago
  
  No one is curating vast amounts of data for them in other domains. Programmers send programs with fixes
  