Comment by onion2k

16 hours ago

Putting too much trust in an agent is definitely a problem, but I have to admit I've written about a dozen little apps in the past year without bothering to look at the code and they've all worked really well. They're all just toys and utilities I've needed and I've not put them into a production system, but I would if I had to.

Agents are getting really good, and if you're used to planning and designing up front you can get a ton of value from them. The main problem with them that I see today is people having that level of trust without giving the agent the context necessary to do a good job. Accepting a zero-shotted service to do something important into your production codebase is still a step too far, but it's an increasingly small step.

>> Putting too much trust in an agent is definitely a problem, but I have to admit I've written about a dozen little apps in the past year without bothering to look at the code and they've all worked really well. They're all just toys and utilities I've needed and I've not put them into a production system, but I would if I had to.

I have been doing this too, and I've forgotten half of them. For me the point is that this usage scenario is really good, but it also has no added value to it, really. The moment Claude Code raises its prices 2x this won't be viable anymore, and at the same time to scale this to enterprise software production levels you need to spend on an agent probably as much as hiring two SWEs, given that you need at least one to coordinate the agents.

  • DeepSeek v3.2 tokens are $0.26/0.38 on OpenRouter. That model - released 4 months ago - isn't really good enough by today's standards, but it's significantly stronger than Opus 4.1, which was only released last August! In 12 months I think it's reasonable to expect there will be a model with less cost than that which is significantly stronger than anything available now.

    And no, it isn't ONLY because VC capital is being burned to subsidize cost. That is impossible for the dozen smaller providers offering service at that cost on OpenRouter who have to compete with each other for every request and also have to pay compute bills.

    Qwen3.5-9B is stronger than GPT-4o and it runs on my laptop. That isn't just benchmarks either. Models are getting smaller, cheaper and better at the same time and this is going to continue.

  • I think Claude could raise its prices 100x and people would still use it. It'd just shift to being an enterprise-only option and companies would actually start to measure the value instead of being "Whee, AI is awesome! We're definitely going really fast now!"

    • 100x? You think people would pay $20k per month for Claude Code?

      Codex is as good (or very nearly) as Claude Code. Open source models continue to improve. The open source harnesses will also continue to improve. Anthropic is good, but it has no moat. No way could they 100x their prices.