Comment by riku_iki

7 days ago

One concern is that coding/logical puzzles are verticals where LLMs have lots of training data and require only a small context window, which is why they do well there, but they don't necessarily scale/generalize to other topics. For example, I have yet to see an agent that could grab, say, the Postgres codebase from GitHub, add a nontrivial feature, and send a patch that gets accepted.