Comment by aspenmartin
5 hours ago
> Because software developers typically understand how to implement a solution to a problem better than the client. If they don't have enough details to implement a solution, they will ask the client for details. If the developer decides to use an LLM to implement a solution, they have the ability to assess the end product.
Why do you think agents can’t do that? They can’t do it really well today, but if the pace of progress we saw in 2025 holds, it’ll be maybe a year before this starts getting decent and then another year before it’s excellent.
> Sure, you will see a few people using LLMs to develop personalized software for themselves. Yet these will be people who understand how to specify the problem they are trying to solve clearly, will have the patience to handle the quirks and bugs in the software they create
Only humans can do this?
> Hallucinations are not solved, memory is not solved, prompt injection is not solved, context limits are way too low while tokens are way too expensive to actually take advantage of those limits, etc. These problems have existed since the very early days of GPT-4 and there is no clear path to them being solved any time soon.
> You basically need AGI and we are nowhere close to AGI.
All of the issues you talk about are real. I don’t personally care about AGI; it’s kind of a mishmash of a real thing and a nice package for investors. What I do care about is what has been released and what it can do.
Those issues aren’t solved, but we’ve made amazing progress on all of them. Continual learning is a big one, and labs are likely close to some POCs.
Token cost per unit of performance is dropping rapidly: GPT-4-level performance costs 10x less today than it did two years ago, and this will continue to be the case as we keep pushing efficiency up.
The AGI “are we close?” question: tbh, to me these questions are just rabbit holes and bait for flame wars, because no one can agree on what AGI means, and even if you do (e.g. superhuman performance on all economically viable tasks is maybe a more solid starting point), everyone fights about the ecological validity of the evals.
All I’m saying is: taking coding in a complete vacuum, we’re very, very close to the point where the benefits become so obvious and failure rates on many tasks fall below the critical thresholds that automating even the things people say make engineers unique (working with people to navigate ambiguous issues they can’t articulate well, making the right tradeoffs, etc.) starts looking like less of a research challenge and more of an exercise in deployment.