Comment by energy123
3 months ago
You can consider the o3/o4-mini price to be half that due to flex processing. Flex gives the benefits of the batch API without the downside of waiting for a response. It's not marketed that way but that is my experience. With 20% cache hits I'm averaging around $0.8/million input tokens and $4/million output tokens.
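Back-of-the-envelope, the blended input rate with caching looks like this; the uncached and cached flex prices below are placeholder assumptions (check the current price list), and only the 20% hit rate comes from the figure above:

    # Blended per-million input rate with prompt caching (sketch).
    # Prices are assumptions for illustration, not quoted from the price list.
    uncached_rate = 1.00   # $/M input tokens at flex (assumed)
    cached_rate = 0.25     # $/M cached input tokens at flex (assumed)
    hit_rate = 0.20        # cache-hit rate from the comment above

    blended = (1 - hit_rate) * uncached_rate + hit_rate * cached_rate
    print(f"${blended:.2f}/M input")   # -> $0.85/M, roughly the ~$0.8/M figure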
I'm shocked people are signing up to pay even these fees to build presumably CRUD apps. I sense a complete divergence in the profession between people who use these tools and people who don't.
A whole codebase of 100k lines (~1M tokens) for about a dollar. I'd like to understand why signing up for this would be shocking.
That's really misrepresenting how it works. Most lines get written, rewritten, and adjusted multiple times. Yesterday I did approx 5 hours of pair-coding with Claude 4 Opus, and I have these stats:
Total tokens in: 3,644,200
Total tokens out: 92,349
And of that, only approx 2.3k lines were actually committed in PRs.
Some people are struggling to build CRUDs.
Do you use them for code generation? I'm just using Copilot, since $10/mo is a reasonable budget... but a quick guess based on my usage would put code generation via an API at potentially $10/day?
o3 is a unique model. For difficult math problems it generates long reasoning traces (e.g. 10-20k tokens), but for coding questions the reasoning tokens are consistently small, unlike Gemini 2.5 Pro, which generates longer reasoning traces for coding questions.
Cost for o3 code generation is therefore driven primarily by context size. If your programming questions have short contexts, then the o3 API with flex is really cost-effective.
For 30k input tokens and 3k output tokens, the cost is 30000 * 0.8 / 1000000 + 3000 * 4 / 1000000 = $0.036
But if your contexts run between 100k and 200k tokens, then the monthly plans that give you a budget of prompts instead of tokens are probably going to be cheaper.
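A quick sketch of that per-prompt math, using the flex rates quoted above (~$0.8/M input, $4/M output; approximate figures, not an official price list):

    # Per-prompt API cost at the (approximate) flex rates quoted above.
    IN_RATE = 0.8 / 1_000_000    # $ per input token
    OUT_RATE = 4.0 / 1_000_000   # $ per output token

    def prompt_cost(input_tokens: int, output_tokens: int) -> float:
        return input_tokens * IN_RATE + output_tokens * OUT_RATE

    print(f"${prompt_cost(30_000, 3_000):.3f}")    # $0.036, short-context question
    print(f"${prompt_cost(150_000, 3_000):.3f}")   # $0.132, 100k-200k context
    # At ~100 large-context prompts a day that's roughly $13/day, which is
    # where the flat monthly plans start to win.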