Comment by Revanche1367
11 hours ago
The argument is flawed: there is no logical reason to assume a single prompt can’t be sufficient to capture the complexity of a software project. It may not be practical in many cases, but there is too much variability in what counts as a complex software project, and in how much instruction a single prompt can carry, to make that claim and say it’s “by definition.”
And that prompt will basically be a 2,000-page spec Bible à la IBM circa 1960; see waterfall. Unless AI develops mind reading (and advanced mind reading at that), single-prompt creation of actual complex software products will never happen. You'll one-shot a simple non-scientific calculator, but not Excel or Vim or Nginx.
Why not? Given a proper spec, you should absolutely be able to one-shot Excel, particularly if we put it at the level of complexity of, say, Excel 1.0 for Mac.
Current models aren't capable of that, but that doesn't mean it's not possible.
The issue is not the models; the issue is that this method was tried before, and humans suck at writing down what they want. Developing in small increments that allow feedback was the answer to this issue.
If you made models able to code to a long spec, you would still be left with the hard problem of having to write those specs.
Seems like this would be a good time to use this famous quote:
> given the sufficiently smart compiler
For those unaware, this is a similar quote used by compiler proponents. The first full compiler was created in 1957 (+/- 70 years ago) and the "sufficiently smart compiler" never happened; hand-written code from the best coders is still faster. Now, that doesn't mean that compilers didn't do the job well enough; we just accepted that 90-95% of the top speed was enough for almost everything.
To the LLM one-shotting point, it took 30 (40?) years for compilers to be good enough for the mass market. Caveat early adopter and investor.
Plus what pyrale said.
One-shot prompting/tooling is the only reasonable way to use an LLM, in my opinion. You should not have an LLM operating for hours, creating thousands of lines of new code that you can never review or maintain. You can actually be highly productive modifying a single file or two at a time, ideally with as focused and as little context as possible, without the LLM being given full permission to add as much context as possible along the way to maximize revenue for the developers of the harness.
The agentic engineering paradigm is just a narrative trend pushed by AI companies to get people to 10x their token consumption per prompt. It also plays into people's laziness and addiction to dopamine, causing addict-like behavior in people who fall prey to this trend.
I disagree fundamentally.
If I do that, I'm literally slower than just making the change myself, without having to specify it sufficiently to the model.
I can see how a junior dev, or generally someone who's not particularly knowledgeable about the language or framework they're working with, may benefit from such usage, but for experienced people there is very little value in that approach.
I say this because I've just had to face this decision this month, with Copilot introducing usage-based billing. I attempted to scale back my usage, first by switching to non-Opus models: the output essentially became discardable, as it continually hallucinated nonexistent fields in the responses of APIs, etc. Then by scoping the changes smaller and smaller, until I ultimately gave up and reduced usage to just generating tests.
I agree. And at work it has been producing some of the worst GUI test cases I have ever seen.
What is tested often makes no sense at all: completely implausible edge cases are tested on internals, while it doesn't create tests for the overall application using user events.
And some things in these test cases are downright ridiculous: instead of instantiating your classes, it sets up some barebones fake objects reimplementing some of the behavior of your actual class, then ignores the TypeScript errors via a force cast or similar.
Then it proceeds to slap some test ids on the output, stub components and dependencies more or less randomly, add some assertions on test ids, and call it a day.
Apparently that's good enough for many colleagues to open an MR for that garbage.
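To make the fake-object pattern above concrete, here is a minimal sketch. Everything in it is hypothetical: the `Cart` class and its methods are made up for illustration and are not from any real codebase or test framework. It only shows why a force-cast fake can pass a check without ever touching the real class.

```typescript
// Hypothetical class under test; names are illustrative only.
class Cart {
  private items: { name: string; price: number }[] = [];

  add(name: string, price: number): void {
    if (price < 0) throw new Error("price must be non-negative");
    this.items.push({ name, price });
  }

  total(): number {
    return this.items.reduce((sum, item) => sum + item.price, 0);
  }
}

// Anti-pattern: a barebones fake that reimplements a sliver of Cart,
// with a force cast to silence the type errors.
const fakeCart = { total: () => 42 } as unknown as Cart;

// This value comes from the fake, so asserting on it says nothing about Cart.
const fakeTotal = fakeCart.total();

// The cast also hides that the fake has no add() at all.
const fakeHasAdd = typeof fakeCart.add === "function";

// Alternative: instantiate the real class and drive it through its public API.
const realCart = new Cart();
realCart.add("pen", 2);
realCart.add("notebook", 5);
const realTotal = realCart.total();
```

The fake satisfies the compiler only because of the cast; a test built on it keeps passing even if `Cart.total()` is broken, which is the complaint above.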
That said, at home with SOTA models I happily hand large units of work to it, outsource much of the thinking, and get workable results. I think this is the future.
I disagree, fundamentally.
I see little value in throwing a ton of context at an LLM and waiting 10-20 minutes for a coin flip on whether or not it's going to produce junk. I'd rather do quick 60-second turns, get most of the way there, and fix the rest myself if I have to. Honestly, I'd rather just not use them.