Comment by khazhoux
10 days ago
I'm a heavy Cursor user (not yet on Claude) and I see a big disconnect between my own experience and posts like this.
* After a long vibe-coding session, I have to spend an inordinate amount of time cleaning up what Cursor generated. Any given page of code will be just fine on its own, but the overall design (unless I'm extremely specific in what I tell Cursor to do) will invariably be a mess of scattered control, grafted-on logic, and just overall poor design. This is despite me using Plan mode extensively, and instructing it to not create duplicate code, etc.
* I keep seeing metrics of 10s and 100s of thousands of LOC (sometimes even millions), without the authors ever recognizing that a gigantic LOC is probably indicative of terrible heisenbuggy code. I'd find it much more convincing if this post said it generated a 3K SQLite implementation, and not 19K.
Wondering if I'm just lagging in my prompting skills or what. To be clear, I'm very bullish on AI coding, but I do feel people are getting just a bit ahead of themselves in how they report success.
This has been my experience also, but I've been using everything (Claude Code, OpenCode, Copilot, etc.). It's impressive when I ask it to do something I don't know how to do, like some Python apps, but when it's working in my stack I have to constantly stop it mid-process and ask it to fix something. I'm still validating the plan and rewriting a lot of the code because the quality just is not there yet.
And for the most part I use either Opus or Sonnet, but for planning I sometimes switch to ChatGPT, since I think Claude is too blunt and does not ask enough questions. I also have local setups with Ollama and have tried some Kimi models for personal projects. The results are the same across the board, but again, Claude models are slightly better.
I don't think I mentioned that, yes, the code quality is suboptimal and this is purely a proof of concept. I'm going to update the blog post with that information, but I completely agree with you: the code you get from models does not follow best practices, and this is even more the case when you have many agents on one project generating lots of redundancy (which I do cover in the blog post).
> cleaning up what Cursor generated
What model? Cursor doesn't generate anything itself, and there's a huge difference between gpt5.3-codex and composer 1 for example.
Well, I've got it as Auto (configured by my company and I forget to change it). The list of enabled models includes claude-4.6-opus-high, claude-4.5-sonnet, gpt-5.3-codex, and a couple more.
That is probably Composer-1, which is their in-house model (insofar as a fine-tune of an open-weights model can be called in-house). It's competent at grunt work, but it doesn't compare to the best of Claude and Codex; give those a shot sometime.
Auto is unlikely to pick the high-quality models unless your prompt really demands a complex plan. Try the explicit models instead; it makes a real difference.
this is the business model bet. the codebase is a big ball of mud that only a superhuman ai can comprehend, therefore everyone must use superhuman ai to make changes in the codebase. the selling point is iteration speed, especially early iteration speed.
cf. SV conventional wisdom: he who ships first wins the market
in fairness, there is real value in iteration speed. i'm not holding my breath on human-comprehensible corporate codebases going forward. a slew of critical foundational projects, mostly run by the big names, may still care about what used to be called "good engineering practices".