Comment by energy123
3 months ago
You can consider the o3/o4-mini price to be half that due to flex processing. Flex gives the benefits of the batch API without the downside of waiting for a response. It's not marketed that way but that is my experience. With 20% cache hits I'm averaging around $0.8/million input tokens and $4/million output tokens.
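Back-of-the-envelope, the blended input rate with caching looks like this; the uncached and cached flex prices below are placeholder assumptions (check the current price list), and only the 20% hit rate comes from the figure above:

    # Blended per-million input rate with prompt caching (sketch).
    # Prices are assumptions for illustration, not quoted from the price list.
    uncached_rate = 1.00   # $/M input tokens at flex (assumed)
    cached_rate = 0.25     # $/M cached input tokens at flex (assumed)
    hit_rate = 0.20        # cache-hit rate from the comment above

    blended = (1 - hit_rate) * uncached_rate + hit_rate * cached_rate
    print(f"${blended:.2f}/M input")   # -> $0.85/M, roughly the ~$0.8/M figure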
I'm shocked people are signing up to pay even these fees to build presumably CRUD apps. I sense a complete divergence in the profession between people who use these tools and people who don't.
A whole codebase of 100k lines (~1M tokens) for about a dollar. I'd like to understand why signing up for this would be shocking.
That's really misrepresenting how it works. Most lines get written, rewritten, and adjusted multiple times. Yesterday I did approx 5 hours of pair-coding with Claude 4 Opus, and I have these stats:
Total tokens in: 3,644,200
Total tokens out: 92,349
And of that, only approx 2.3k lines were actually committed in PRs.
Some people are struggling to build CRUDs.
Do you use them for code generation? I'm just using Copilot, since $10/mo is a reasonable budget... but a quick guess based on my usage would put code generation via an API at potentially $10/day?
o3 is a unique model. For difficult math problems it generates long reasoning traces (e.g. 10-20k tokens), but for coding questions the reasoning tokens are consistently small, unlike Gemini 2.5 Pro, which generates longer reasoning traces for coding questions.
Cost for o3 code generation is therefore driven primarily by context size. If your programming questions have short contexts, then the o3 API with flex is really cost-effective.
For 30k input tokens and 3k output tokens, the cost is 30000 * 0.8 / 1000000 + 3000 * 4 / 1000000 = $0.036
But if your contexts run between 100k and 200k tokens, then the monthly plans that give you a budget of prompts instead of tokens are probably going to be cheaper.
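A quick sketch of that per-prompt math, using the flex rates quoted above (~$0.8/M input, $4/M output; approximate figures, not an official price list):

    # Per-prompt API cost at the (approximate) flex rates quoted above.
    IN_RATE = 0.8 / 1_000_000    # $ per input token
    OUT_RATE = 4.0 / 1_000_000   # $ per output token

    def prompt_cost(input_tokens: int, output_tokens: int) -> float:
        return input_tokens * IN_RATE + output_tokens * OUT_RATE

    print(f"${prompt_cost(30_000, 3_000):.3f}")    # $0.036, short-context question
    print(f"${prompt_cost(150_000, 3_000):.3f}")   # $0.132, 100k-200k context
    # At ~100 large-context prompts a day that's roughly $13/day, which is
    # where the flat monthly plans start to win.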