
Comment by zeroonetwothree

5 days ago

And yet when working on production code current LLMs are about as good as a poor intern. Not sure why the disconnect.

Depends. I’ve been using it for some of my workflows, and I’d say it is more like a solid junior developer with weird quirks: sometimes it makes stupid mistakes, and other times it behaves like a 30-year SME vet.

  • I really doubt it's like a "solid junior developer". If it could do the work of a solid junior developer, it would be making programming projects 10-100x faster, because it can do things several times faster than a person can. Maybe it can write solid code for certain tasks, but that's not the same thing as being a junior developer.

    • It can be 10-100x faster for some tasks already. I've had it build prototypes in minutes that would have taken me a few hours to cobble together, especially in domains and using libraries I don't have experience with.

It’s the same reason Leetcode is a bad interview question. Being good at these sorts of problems doesn’t translate directly into being good at writing production code.

Because competitive coding is a narrow, well-described domain (a limited number of concepts: lists, trees, etc.) with a high volume of training data available and an easy way to set up an RL feedback loop, models can improve well in this domain. That is not true of typical overbloated enterprise software.

  • All you said is true. Keep in mind this is the "Heuristics" competition instead of the "Algorithms" one.

    Instead of the more traditional Leetcode-like problems, it's things like optimizing scheduling/clustering according to some loss function. Think simulated annealing or pruned searches.
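To make the contrast with Leetcode-style problems concrete, here is a minimal sketch of the simulated-annealing approach described above, applied to a toy scheduling problem. The job data, cooling schedule, and parameter values are made up for illustration; real heuristic-contest solutions tune all of this heavily.

```python
import math
import random

def simulated_anneal(cost, neighbor, x0, t0=1.0, cooling=0.995, steps=5000, seed=0):
    """Generic simulated annealing: always accept improvements,
    accept worse moves with probability exp(-delta / temperature)."""
    rng = random.Random(seed)
    x, fx = x0, cost(x0)
    best, fbest = x, fx
    t = t0
    for _ in range(steps):
        y = neighbor(x, rng)
        fy = cost(y)
        delta = fy - fx
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            x, fx = y, fy
            if fx < fbest:
                best, fbest = x, fx
        t *= cooling  # geometric cooling schedule
    return best, fbest

# Toy instance: order jobs to minimize total weighted completion time.
jobs = [(3, 2), (1, 5), (4, 1), (2, 3)]  # (duration, weight), invented data

def cost(order):
    t, total = 0, 0
    for i in order:
        dur, w = jobs[i]
        t += dur          # job i finishes at time t
        total += w * t    # weighted completion time
    return total

def swap_neighbor(order, rng):
    # Neighbor move: swap two random positions in the schedule.
    i, j = rng.sample(range(len(order)), 2)
    y = list(order)
    y[i], y[j] = y[j], y[i]
    return y

best, fbest = simulated_anneal(cost, swap_neighbor, list(range(len(jobs))))
```

On an instance this small the optimum is easy to verify by hand (sorting jobs by duration/weight), but the same loop scales to problems where the search space is far too large to enumerate, which is what these contests actually test.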

    • Dude thank you for stating this.

      OpenAI's o3 model can solve very standard Codeforces problems rated up to 2700 that it's been trained on, but is unable to think from first principles to solve problems I've set that are ~1600 rated. Those 2700-rated algorithms problems are obscure pages on the competitive programming wiki, so it's able to solve them with knowledge alone.

      I am still not very impressed with its ability to reason both in codeforces and in software engineering. It's a very good database of information and a great searcher, but not a truly good first-principles reasoner.

      I also wish o3 was a bit nicer - its "reasoning" seems to have made it more arrogant at times, even when it's wildly off, and that kind of annoys me.

      Ironically, this workflow has really separated for me what is the core logic I should care about and what I should google, which is always a skill to learn when traversing new territory.
