Comment by jazz9k

8 days ago

This does sound great, but the cost of tokens will prevent most companies from using agents to secure their code.

Tokens are insanely cheap at the moment. Through OpenRouter a message to Sonnet costs me about $0.001, or about $0.0001 using Devstral 2512. An extended coding session/feature expansion will cost me about $5 in credits. Split up your codebase so you don't have to feed all of it into the LLM at once and it's very reasonable.
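Per-message figures like these are easy to sanity-check from per-token prices. A minimal sketch, using illustrative prices rather than live OpenRouter rates:

```python
# Rough per-message cost from per-token prices.
# The prices below are illustrative assumptions, not current rates.
PRICE_PER_MTOK = {              # USD per million tokens: (input, output)
    "sonnet": (3.00, 15.00),
    "devstral": (0.30, 1.20),
}

def message_cost(model: str, in_tokens: int, out_tokens: int) -> float:
    """Cost in USD of a single request/response pair."""
    p_in, p_out = PRICE_PER_MTOK[model]
    return (in_tokens * p_in + out_tokens * p_out) / 1_000_000

# A mid-sized chat turn: ~2k tokens of context in, ~500 tokens out.
print(f"${message_cost('sonnet', 2_000, 500):.4f}")  # fractions of a cent to a few cents
```

The point of the split-up-your-codebase advice falls out of the formula: input tokens dominate once context grows, so trimming what you feed in is the main cost lever.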

  • It cost me ~$750 to find a tricky privilege escalation bug in a complex codebase where I knew the rough specs but didn't have the exploit. There are certainly still many other bugs like that in the codebase, and it would cost $100k-$1MM to explore the rest of the system that deeply with models at or above the capability of Opus 4.6.

    It's definitely possible to do a basic pass for much less (I do this with autopen.dev), but it is still very expensive to exhaustively find the harder vulnerabilities.

    • This is where the Codex and Claude Code Pro/Max plans are excellent. I rarely run into the limits of Codex. If I do, I wait and come back and have it resume once the window has expired.

      14 replies →

    • How much would it have cost a human to do the same work? The question isn’t how much tokens cost; the question is how much money is saved by using AI to do it.

      3 replies →

    • Compare to the cost when said vulnerabilities are exploited by bad actors in critical systems. Worth it yet?

  • Agentic tasks use up a huge amount of tokens compared to simple chatting. Every elementary interaction the model has with the outside world (even while doing something as simple as reading code from a large codebase) is a separate "chat" message and "response", and these add up very quickly.
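The accumulation is roughly quadratic, because each turn typically resends the entire context so far. A toy model of that growth (turn counts and token sizes are assumptions):

```python
def agent_session_tokens(turns: int, context_start: int, growth_per_turn: int) -> int:
    """Total input tokens billed over an agentic session, assuming the
    full accumulated context is resent as input on every turn."""
    total = 0
    context = context_start
    for _ in range(turns):
        total += context             # whole context billed again this turn
        context += growth_per_turn   # tool output appended for the next turn
    return total

# 50 tool-using turns, 4k tokens of starting context, +1.5k tokens per tool result:
print(agent_session_tokens(50, 4_000, 1_500))  # ~2M input tokens for one task
```

Prompt caching changes the economics considerably, but the raw token count still grows this way.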

  • You’d have to ignore the massive investor ROI expectations or somehow have no capability to look past “at the moment”.

    • That might be a problem for the labs (although I don't think it is), but it's not a problem for end users. There is enough pressure from top labs competing with each other, and even more pressure from open models, to keep prices at a reasonable point going forward.

      To justify higher prices, the SotA needs way higher capabilities than the competition, and at the same time the competition needs to stay way below a certain threshold. Once that threshold becomes "good enough for task x", the higher price doesn't make sense anymore.

      While there is some provider retention today, it will be harder to maintain once everyone offers roughly the same capabilities. Switching API providers might even be transparent to most users, and they wouldn't care.

      If you want an idea of token prices today, check the median price for serving open models on OpenRouter or similar platforms. You'll get a "napkin math" estimate of what it costs to serve a model of a certain size today. As long as models don't grow an order of magnitude larger than today's largest, API pricing seems in line with a modest profit (so it shouldn't be subsidised, and it should drop with tech progress). Another benefit of open models is that once they're released, that capability remains available. The models can't get "worse".
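That napkin math can be sketched directly; the GPU rental price and throughput below are made-up round numbers, not measurements:

```python
def serving_cost_per_mtok(gpu_hourly_usd: float, gpus: int, tokens_per_sec: float) -> float:
    """USD cost to generate one million tokens on rented hardware,
    given aggregate throughput across the whole GPU node."""
    tokens_per_hour = tokens_per_sec * 3600
    return (gpu_hourly_usd * gpus) / tokens_per_hour * 1_000_000

# e.g. 8 rented GPUs at $2/hr each, 2,500 aggregate tokens/sec:
print(f"${serving_cost_per_mtok(2.0, 8, 2_500):.2f} per million tokens")  # $1.78
```

Comparing an estimate like this against a provider's listed per-token price gives a rough sense of the margin, ignoring batching efficiency, utilization, and input-token handling.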

    • Not really. I'm fully taking advantage of these low prices while they last. Eventually the AI companies will start running out of funny money and start charging what the models actually cost to run; then I'll just switch over to self-hosted models more often and use the online ones for the projects that need the extra resources. Currently there's no reason why I shouldn't use Claude Sonnet to write one-time bash scripts; once it starts costing me a dollar to do so, I'm going to change my behavior.

      5 replies →

I don't buy it.

Inference cost has dropped ~300x in three years; there's no reason to think this won't keep happening with improvements in models, agent architecture, and hardware.

Also, too many people are fixated on American models when Chinese ones deliver similar quality, often at a fraction of the cost.

From my tests, the "personality" of an LLM (its tendency to stick to prompts and not derail) far outweighs the low-single-digit percentage delta in benchmark performance.

Not to mention, different LLMs perform better at different tasks, and they are all particularly sensitive to prompts and instructions.

  • “Thing x happened in the past, therefore it will continue to happen in the future” is perhaps one of the most pervasive, if not the most pervasive, of human fallacies.

Tokens aren't more expensive than highly trained meatbags today. There's no way they'll be more expensive "tomorrow"...

  • [flagged]

    • > they are and they will be

      Calculate the approximate cost of raising a human from birth to having the knowledge and skills to do X, along with the maintenance required to continue doing X. Multiply by a reasonable scaling factor in comparison to one of today's best LLMs (i.e. how many humans, and how much time, to do X n times, vs the LLM).

      Calculate the cost of hardware (from raw elements), training and maintenance for said LLM (if you want to include the cost of research+software then you'll have to also include the costs of raising those who taught, mentored, etc the human as well). Consider that the human usually specializes, while the LLM touches everything. I think you'll find even a roughly approximate answer very enlightening if you're honest in your calculations.

      2 replies →

I'm thinking about how much money Anthropic etc. are making from intelligence services that are running Opus 4.6 on ultra-high settings 24 hours a day to find these kinds of exploits and take advantage of them before others do.

Expensive for me and you, but peanuts for a nation state.