Comment by xyzsparetimexyz

10 hours ago

What kind of basic ass CRUD apps are people even working on that they're on stage 5 and up? Certainly not anything with performance, visual, embedded or GPU requirements.

I find it funny that as these systems become better at something (i.e. "basic ass CRUD"), people still maintain that they're only good at that and nothing else.

Case in point - https://github.com/NVlabs/vibetensor/blob/main/docs/vibetens...

> VIBETENSOR is an open-source research system software stack for deep learning, generated by LLM-powered coding agents under high-level human guidance. In this paper, “fully generated” refers to code provenance: implementation changes were produced and applied as agent-proposed diffs; validation relied on builds, tests, and differential checks executed by the agent workflow, without per-change manual diff review.

I think you massively underestimate the number of useful apps that are CRUD plus a bit of business logic and styling. They're useful, can genuinely take time to build, can be unique every time, and yet are not brand-new research projects.

  • A lot of stuff is simultaneously useful but not mission critical, which is where I think the sweet spot of LLMs currently lies.

    In terms of the state of software quality, the bar has actually been _lowered_, in that even major user-facing bugs in operating systems are no longer a showstopper. So it's no surprise to me that people are vibe-coding things "in prod" that they actually sell to other people (some even theorize that Claude Code itself is vibe-coded, hence its bugs. And yet that hasn't slowed down adoption because of the Claude Max lock-in).

    So maybe one alternate way to see the "productivity gains" from vibe-coding in deployed software is that it's actually a realization that quality doesn't matter. The seeds for this were sown years back when QA vanished as a field.

    LLMs occupy a new realm in the Pareto frontier, the "slipshod expert". Usually humans grow from "sloppy incompetent newb" to "prudent experienced dev". But now we have a strange situation where LLMs can write code (e.g. vectorized loops, CUDA kernels) that could normally only be written by someone with sufficient domain knowledge, and yet (ironically) it's not done with the attention and fastidiousness you'd expect from such an experienced dev.

  • No, totally, I agree. But I don't think that anyone will be YOLO vibe-coding massive changes into Blender or ffmpeg any time soon.

    • Probably not, though maybe additions - many moons ago I added the feature where the sculpt tool turns as you move it around, if I recall right. I don't think it was that hard, but it was a useful change.

What would be an example of something you think wouldn't work at stage 5 or higher? Is there something about GPU programming that LLMs can't handle?

  • I doubt they'd do a very good job of debugging a GPU crash, or visual noise caused by forgotten synchronization, or odd-looking shadows.

    Maybe for some things you could set it up so that the screen output is livestreamed back into the agent, but I highly doubt that anyone is doing that for agents like this yet

    • > Maybe for some things you could set it up so that the screen output is livestreamed back into the agent, but I highly doubt that anyone is doing that for agents like this yet

      What do you mean by streaming? LLMs aren't yet at the point where they can consume a live video feed, but people have been feeding them screenshots from Playwright and desktop apps for years (Anthropic even released the Computer Use feature based on this).

      Gemini has the best visual intelligence, but all three of the major models have supported this for a while. I don't think it'd help with fixing subtle problems in shadows, but it can fix other GUI bugs using visual feedback.
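
      Not a live video feed, just a screenshot loop. A minimal sketch of the kind of harness people wire up, assuming Python with Playwright and the Anthropic SDK (the model id, URL, and prompt below are placeholders):

        import base64

        import anthropic
        from playwright.sync_api import sync_playwright

        # Grab a rendered screenshot of the app under test.
        with sync_playwright() as p:
            browser = p.chromium.launch()
            page = browser.new_page()
            page.goto("http://localhost:3000")      # placeholder URL
            png = page.screenshot(full_page=True)   # returns PNG bytes
            browser.close()

        # Hand the screenshot back to the model as visual feedback.
        client = anthropic.Anthropic()
        reply = client.messages.create(
            model="claude-sonnet-4-20250514",        # placeholder model id
            max_tokens=1024,
            messages=[{
                "role": "user",
                "content": [
                    {"type": "image",
                     "source": {"type": "base64",
                                "media_type": "image/png",
                                "data": base64.b64encode(png).decode()}},
                    {"type": "text",
                     "text": "Compare this screenshot to the spec and list any visual bugs."},
                ],
            }],
        )
        print(reply.content[0].text)

      An agent harness just repeats that after each proposed fix and feeds the answer into the next prompt.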

    • I am a GPU programmer (on the compute side), and the biggest challenge is the lack of tooling.

      For host-side code the agent can throw in a bunch of logging statements and usually printf its way to success. For device-side code there isn't a good way to output debugging info into a textual format understandable by the agent. Graphical trace viewers are great for humans, not so great for AI right now.
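
      To make the host-side half concrete, here's a rough sketch of the kind of textual logging an agent can lean on, assuming a PyTorch workload rather than raw CUDA (the checked() helper is made up for illustration):

        import torch

        def checked(name, fn, *args, **kwargs):
            # Run one GPU op, force completion, and log stats as plain text.
            out = fn(*args, **kwargs)
            torch.cuda.synchronize()  # surface async launch failures here, not three ops later
            print(f"[{name}] shape={tuple(out.shape)} dtype={out.dtype} "
                  f"nan={torch.isnan(out).any().item()} max={out.abs().max().item():.4g}")
            return out

        x = torch.randn(1024, 1024, device="cuda")
        y = checked("matmul", torch.matmul, x, x)
        z = checked("softmax", torch.softmax, y, -1)

      Once the failure is inside a hand-written kernel, there's no equally cheap textual hook, which is the gap described above.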

      On the other hand, Cline's harness can interact with my website and click on stuff until the bugs are gone.
