
Comment by Scene_Cast2

21 days ago

I tried doing some vibe coding on a greenfield project (using gemini 2.5 pro + cline). On one hand - super impressive, a major productivity booster (even compared to using a non-integrated LLM chat interface).

I noticed that LLMs need a very heavy hand in guiding the architecture, otherwise they'll add architectural tech debt. One easy example is that I noticed them breaking abstractions (putting things where they don't belong). Unfortunately, there's not that much self-retrospection on these aspects if you ask about the quality of the code or if there are any better ways of doing it. Of course, if you pick up that something is in the wrong spot and prompt better, they'll pick up on it immediately.

I also ended up blowing through $15 of LLM tokens in a single evening. (Previously, as a heavy LLM user including coding tasks, I was averaging maybe $20 a month.)

> I also ended up blowing through $15 of LLM tokens in a single evening.

This is a feature, not a bug. LLMs are going to be the next "OMG my AWS bill" phenomenon.

  • Cline very visibly displays the ongoing cost of the task. Light edits are about 10 cents, and heavy stuff can run a couple of bucks. It's just that the tab accumulates faster than I expect.

    • > Light edits are about 10 cents

      Some well-paid developers will excuse this with, "Well, if it saved me 5 minutes, it's worth an order of magnitude more than 10 cents".

      Which is true, however there's a big caveat: Time saved isn't time gained.

      You can "save" 1,000 hours every night, but you don't actually get those 1,000 hours back.


    • > Cline very visibly displays the ongoing cost of the task

      LLMs are now being positioned as "let them work autonomously in the background" which means no one will be watching the cost in real time.

      Perhaps I can set limits on how much money each task is worth, but very few would estimate that properly.


  • Especially at companies (hence this github one), where the employees don't care about cost because it's the boss' credit card.

  • I think that models are gonna commoditize, if they haven't already. The cost of switching over is rather small, especially when you have good evals on what you want done.

    Also there's no way you can build a business without providing value in this space. Buyers are not that dumb.

    • They are already quite commoditized. Commoditization doesn't mean "cheap", and it doesn't mean you won't spend $15 a night like the GP did.

> I also ended up blowing through $15 of LLM tokens in a single evening.

Consider using Aider, and aggressively managing the context (via /add, /drop and /clear).

https://aider.chat/

  • I, too, recommend aider whenever these discussions crop up; it converted me from the "AI tools suck" side of this discussion to the "you're using the wrong tool" side.

    I'd also recommend creating little `README`s in your codebase, written mainly with aider as the intended audience. In them, I explain the architecture, what code makes (non-)sense to live in each directory, and so on. This has the side effect of being helpful for humans, too.

    Nowadays when I'm editing with aider, I'll include the project README (which contains a project overview + pointers to other README's), and whatever README is most relevant to the scope of my session. It's super productive.

    I've yet to find a model that beats the cost-effectiveness of Sonnet 3.7. I've tried the latest DeepSeek models, and while I love the price (nearly 50x cheaper?), they're just far too error-prone compared to Sonnet 3.7. They generate solid plans / architecture discussions, but, unlike Sonnet, the code they generate is often confidently off the mark.

I loathe using AI in a greenfield project. There are simply too many possible paths, so it seems to randomly switch between approaches.

In a brownfield code base, I can often provide it reference files to pattern match against. So much easier to get great results when it can anchor itself in the rest of your code base.

While it's being touted for greenfield projects, I've noticed a lot of failures when it comes to bootstrapping a stack.

For example, it (Gemini 2.5) really struggles with newer ecosystems like FastAPI when wiring libraries like SQLAlchemy, Pytest, Python-playwright, etc., together.

I find more value in bootstrapping myself, and then using it to help with boilerplate once an effective safety harness is in place.
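The "safety harness first" workflow can be sketched with stdlib-only Python (FastAPI/SQLAlchemy omitted to keep it self-contained; the table and function names are made up for the example):

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Hand-written bootstrap: a schema I control, not LLM-generated."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE visitors (id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
    )
    return conn


def add_visitor(conn: sqlite3.Connection, name: str) -> int:
    """Boilerplate I might delegate to an LLM -- but only once the test below exists."""
    cur = conn.execute("INSERT INTO visitors (name) VALUES (?)", (name,))
    conn.commit()
    return cur.lastrowid


# The harness: a check the LLM's output must keep green before I accept edits.
conn = make_db()
rid = add_visitor(conn, "Ada")
row = conn.execute("SELECT name FROM visitors WHERE id = ?", (rid,)).fetchone()
assert row[0] == "Ada"
```

The point isn't the code itself but the ordering: the schema and the assertion exist before any generated boilerplate touches them.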

I've vibe coded a small project as well, using Claude Code. It's a visitor-registration app for the company. Simple project: one form, a couple of checkboxes, everything stored in SQLite, plus an endpoint for getting an .xlsx export.

The initial cost was around $20 USD, which later grew to $40 (mostly polishing) with some manual work.

I intentionally picked a simple stack: HTML + JS + PHP.

A couple of things:

* I'd say I'm happy with the result from a product perspective
* The codebase could be better, but I couldn't care less in this case
* By default, the AI does not care about security unless I specifically tell it to
* Claude insisted on using old libs. When I specifically told it to use the latest and greatest, it upgraded them but left code that only works with the old versions. It also mixed the latest DaisyUI with some old version of tailwindcss :)

On one hand it was super easy and fun to do, on the other hand if I was a junior engineer, I bet it would have cost more.
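The security point above is concrete: unprompted, models often emit string-concatenated SQL. A minimal sketch of the difference, in Python rather than the commenter's PHP for brevity (table and data are hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE visitors (name TEXT)")
conn.execute("INSERT INTO visitors VALUES ('Alice')")


def lookup_unsafe(name: str):
    # The kind of code an unprompted model may produce: injectable.
    return conn.execute(
        f"SELECT name FROM visitors WHERE name = '{name}'"
    ).fetchall()


def lookup_safe(name: str):
    # What you get once you explicitly ask for security: parameter binding.
    return conn.execute(
        "SELECT name FROM visitors WHERE name = ?", (name,)
    ).fetchall()


payload = "' OR '1'='1"
assert lookup_unsafe(payload) == [("Alice",)]  # injection dumps the whole table
assert lookup_safe(payload) == []              # parameter binding defeats it
```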

If you want to use Cline and are at all price sensitive (in these ranges) you have to do manual context management just for that reason. I find that too cumbersome and use Windsurf (currently with Gemini 2.5 pro) for that reason.

> LLMs need a very heavy hand in guiding the architecture, otherwise they'll add architectural tech debt

I wonder if the next phase would be the rise of (AI-driven?) "linters" that check that the implementation matches the architecture definition.
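A crude, non-AI version of such a linter already fits in a few lines: declare the allowed import edges between layers and check them with the stdlib `ast` module. The layer names and the rule table here are invented for the sketch:

```python
import ast

# Hypothetical architecture rule: the "domain" layer must not import from "web".
FORBIDDEN = {"domain": {"web"}}


def check_imports(layer: str, source: str) -> list[str]:
    """Return layering violations found in one module's source (sketch:
    only handles `from X import Y` statements)."""
    violations = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ImportFrom) and node.module:
            top = node.module.split(".")[0]
            if top in FORBIDDEN.get(layer, set()):
                violations.append(f"{layer} imports {node.module}")
    return violations


bad = "from web.handlers import render\n"
good = "from dataclasses import dataclass\n"
assert check_imports("domain", bad) == ["domain imports web.handlers"]
assert check_imports("domain", good) == []
```

An AI-driven version would presumably replace the hand-written rule table with checks inferred from an architecture description.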

I think it's just that it's not end-to-end trained on architecture because the horizon is too short. It doesn't have the context length to learn the lessons that we do about good design.

> I noticed that LLMs need a very heavy hand in guiding the architecture, otherwise they'll add architectural tech debt. One easy example is that I noticed them breaking abstractions

That doesn’t matter anymore when you’re vibe coding it. No human is going to look at it anyway.

It can all be if/else on one line in one file. If it works, and if the LLMs can work with it, iterate, and implement new business requirements while keeping performance and security, then code structure, quality, and readability don't matter one bit.

Customers don’t care about code quality, and the only reason businesses used to care was to make it cheaper to build and ship new things, so they could make more money.

  • Wild take. Let’s just hand over the keys to LLMs I suppose; the fancy next-token predictor is the captain now.

    • Not that wild TBH.

      This is a common view, and I think it will be the norm in the near-to-mid term, especially for basic CRUD apps and websites. Context windows are still too small for anything even slightly complex (I think we need to be at about 20m before we start matching human levels), but we'll be there before you know it.

      Engineers will essentially become people who just guide the AIs and verify tests.


  • LLMs need a very heavy hand in guiding the architecture because otherwise they'll code it in a way that even they can't maintain or expand.

    • Hook up something like Taskmaster or Shrimp, so they can document as they go and retrieve the relevant context back when they overflow their context window, to avoid this issue.

      Then, as context windows increase, it becomes less and less of an issue.

I don’t get it. Isn’t it just a fixed monthly subscription?

  • For now. Who's to say that in 5 years, once everyone makes this THE default workflow, things won't go up in price?

  • Nope - I use a-la-carte pricing (through openrouter). I much prefer it over a subscription, as there are zero limits, I pay only for what I use, and there is much less of a walled garden (I can easily switch between Anthropic, Google, etc).
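Whether a-la-carte beats a subscription is just arithmetic. A back-of-envelope comparison with hypothetical numbers (a $20/month flat plan vs. the per-task figures reported upthread):

```python
# Hypothetical figures for illustration only.
subscription = 20.00       # flat monthly price
heavy_evening = 15.00      # one heavy vibe-coding session, a la carte
light_month = 0.10 * 120   # ~120 light edits at ~10 cents each

# A la carte wins for light use, loses fast with nightly heavy sessions.
assert light_month < subscription
assert 2 * heavy_evening > subscription
print(f"light month: ${light_month:.2f}, "
      f"two heavy evenings: ${2 * heavy_evening:.2f}")
```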