Comment by f311a

1 day ago

How many more months do we need to wait until big companies realize that flash models work just fine if you:

1) Don't ask LLMs for big changes

2) Review everything and point them in the right direction

Large models still suck at big changes: they produce questionable architecture, and you still have to review the code if your project is serious enough.

The codebase quickly becomes a mess if you don't pay enough attention, no matter which model you use.

So why bother with big models, when flash models are 10x cheaper and much faster to iterate with under guidance? Large models can be used for security and bug audits. Flash models work almost as well for changes under 300 LOC when you dictate how you want your code to look.

28 comments

_jab 1 day ago

It's pretty simple: organizations are willing to tolerate paying $1500/month/engineer, which seems to be roughly in line with "normal" consumption for most full-time engineers. If that number grows significantly, then I bet companies will start exploring flash models more, as you propose.

lavezzi 1 day ago
They are willing to tolerate it now, which is quite a switch-up from the free-for-all we had a few weeks ago. If they aren't able to tie this new ~$1500/month cap to demonstrable productivity and revenue increases, then it will be kneecapped even faster.
- phreeza 17 hours ago
  
  There are plenty of expenses in this order of magnitude that are not tied to direct increases in productivity. For example, I think being really skimpy on these budgets may become a serious hiring impediment for companies.
  
- aiisjustanif 7 hours ago
  
  I mean, we saw this with cloud spending, and especially with logging and database read/write costs, across numerous companies.
  It's been a clear pattern in software service delivery for a while now. Hell, for many goods and services in general, like Uber rides themselves.
  Start cheap, get some vendor lock-in, the service provider reduces discounts, the consumer notices and then reacts to the price by reducing consumption.
rudedogg 1 day ago

> organizations are willing to tolerate paying $1500/month/engineer
One organization, and it is a software company.
> which seems to be roughly in line with "normal" consumption for most full-time engineers
My peers are using $20/mo plans, only a handful are using more than $100/mo in tokens. We haven’t had any limits imposed yet.
epolanski 1 day ago

Which organizations?
Uber is not representative of any trend beyond big tech and VC-overfunded startups.

mrothroc 1 day ago

The easy decision is to just go with the biggest SOTA model you can afford.

But this overlooks the other critical part of getting the most out of these things: the harness. I run an autonomous plan/design/code/build/test pipeline with agents using my own orchestrator. Different models are better at different stages, and I use LLMs to judge the output between them. Not everything needs Opus 4.8.

The harness provides the scaffolding to get the right things into the model and the right things out of it. But it also lets you dictate which model does which work.

It's the pipeline, not the model, that gets you quality at a given token budget.
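
A minimal sketch of that idea in Python, not the commenter's actual orchestrator. The stage names come from the comment above; the stage-to-model assignments, the tier labels "large" and "small", and the `call_model` and `judge` callables are all hypothetical placeholders you would swap for real API calls.

```python
from typing import Callable

# Hypothetical assignment of pipeline stages to model tiers.
# Which tier suits which stage is an assumption for illustration only.
STAGE_MODELS = {
    "plan": "large",
    "design": "large",
    "code": "small",
    "build": "small",
    "test": "small",
}


def run_pipeline(
    task: str,
    call_model: Callable[[str, str, str], str],
    judge: Callable[[str, str], bool],
    max_retries: int = 1,
) -> str:
    """Run each stage on its assigned model, gating every hand-off on a judge."""
    artifact = task
    for stage, model in STAGE_MODELS.items():
        for _ in range(max_retries + 1):
            output = call_model(model, stage, artifact)
            if judge(stage, output):
                break  # output accepted, move on to the next stage
        else:
            # Every attempt was rejected by the judge.
            raise RuntimeError(f"stage {stage!r} failed review")
        artifact = output
    return artifact
```

The point of the shape is that model choice lives in one table, so changing which tier handles a stage is a one-line edit rather than a change to the pipeline logic.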

chaoz_ 10 hours ago

There is something about using the most advanced tooling possible. Why would you pay for IntelliJ, if Eclipse can do the same thing a bit worse?

You want to master your craft, develop "optimal" systems, understand where things are going by utilizing SOTA.

You can call it FOMO, but you get the point.

lanthissa 4 hours ago

opus to produce workflows, flash 3.5 to do them.

Chinese models prob work too, but idk since I can't use them at work.

jmtulloss 1 day ago

Is your argument that $1500 / mo is too much? Why would the engineering team not be more rigorous in their model selection given a constraint?

gravypod 1 day ago
If you had a business task that was only possible with AI and it cost you more than $1,500/month to do, how long would you have to delay the task for it to be cheaper in the long run to buy hardware and run local models?
$1,500/mo * 14 months = $21,000.
If local models are 14 months behind, as many on HN say, it may be profitable to just wait. Maybe just spend a few hundred dollars of your token budget and buy hardware piece by piece.
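
A rough sketch of that break-even arithmetic in Python. Only the $1,500/month and 14-month figures come from the comment; the `monthly_running_cost` parameter (power, maintenance) is an assumption added for illustration and defaults to zero.

```python
def months_to_break_even(
    hardware_cost: float,
    monthly_api_spend: float,
    monthly_running_cost: float = 0.0,
) -> float:
    """Months of avoided API spend needed to pay off a one-off hardware purchase."""
    monthly_saving = monthly_api_spend - monthly_running_cost
    if monthly_saving <= 0:
        raise ValueError("local hardware never pays off at these rates")
    return hardware_cost / monthly_saving


# API spend avoided by waiting 14 months at $1,500/month.
avoided_spend = 1500 * 14

# Hardware costing exactly that much pays for itself in 14 months,
# ignoring running costs and the opportunity cost of waiting.
payback_months = months_to_break_even(avoided_spend, 1500)
```

Any nonzero running cost stretches the payback period, and the calculation says nothing about the value of work not done while waiting.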
- therealdrag0 18 hours ago
  
  Nearly no one is doing anything that is “only possible with AI”. This doesn’t seem like a relevant calculation. People spend on AI as an investment in their current productivity.
- pchristensen 1 day ago
  
  There's a lot of opportunity cost to waiting 14 months to build something.
  
- edmundsauto 21 hours ago
  
  It also presupposes that open models will bridge that gap towards Opus 4.5, which was really when I drank the AI coding Kool-Aid.

econ 1 day ago

I wonder to what extent models should figure out which model to forward a query to. Or perhaps the big models could learn the difference between an easy and a hard question and charge accordingly? Perhaps, if it can measure complexity, even generate a quote?

Small models are fine for small coding tasks, but I don't see why big tasks can't be broken down most of the time.

AgentMasterRace 1 day ago

Many harnesses do this. I've recently dropped all my big subscriptions in favor of deepseek. Codewhale (formerly deepseek-tui) will use pro for large tasks and route smaller ones to flash. It's pretty good, but I just use pro for everything as the cost is quite low.
Reasonix does not have routing, but it is insane, absolutely insane for saving money. I've used 1.3 billion tokens at a cost of $4 (99-100% cache hit rate).
ValentineC 1 day ago
> I wonder to what extent models should figure out which model to forward a query to. Or perhaps the big models could learn the difference between an easy and a hard question and charge accordingly?
This sounds like something a harness could do (and might already be doing), with work delegated to subagents running on lower-cost models.
- jorl17 1 day ago
  
  Yes, they are all already doing this
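
A toy sketch of the routing idea discussed in this subthread, in Python. It does not reflect how any particular harness works: the `Task` type, the tier names, and the use of estimated change size as the complexity signal are assumptions. The 300 LOC default echoes the threshold mentioned in the original post.

```python
from dataclasses import dataclass

# Hypothetical tier labels, not real model identifiers.
SMALL_MODEL = "flash"
LARGE_MODEL = "pro"


@dataclass(frozen=True)
class Task:
    description: str
    estimated_loc: int  # rough size of the change, in lines of code


def pick_model(task: Task, loc_threshold: int = 300) -> str:
    """Send changes below the threshold to the small model, the rest to the large one."""
    if task.estimated_loc < loc_threshold:
        return SMALL_MODEL
    return LARGE_MODEL
```

For example, `pick_model(Task("rename a helper", 20))` selects the small tier, while a 1,200-line refactor goes to the large one. A real router would need a better complexity signal than a line-count guess.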

andersmurphy 1 day ago

This a thousand times. The bigger models also have a habit of overcomplicating things.

warmwaffles 1 day ago

> Don't ask LLMs for big changes

> Review everything and point them in the right direction

Sorry, upper management doesn't care. That's an engineering problem that you need to solve.

eikenberry 1 day ago
They were proposing a solution: use flash models, and use them in a way that best amplifies your work.
- AgentMasterRace 1 day ago
  
  He was making a joke.
  

epolanski 1 day ago

I'm legit annoyed at opus 4.8 at any setting above 4.8.

I believe it can be great for vibe coding, but for mundane day-to-day work? Hell no, I'd rather work with Haiku. Opus is too slow, checks too many things, and is annoying as hell.