Comment by anthonypasq

21 hours ago

this is called progress

3 comments

anthonypasq

I'm asking technically how progress works. What is actually being improved here?

anthonypasq 8 minutes ago

mostly the cost of hardware going down. as models scale, Nvidia produces a new hardware generation that outputs more tokens per watt, but those efficiency gains get eaten by the fact that the model is bigger, i.e. more expensive to serve.
Also, we have no clue whether Anthropic's inference margin is compressing or not; they may just want to maintain the price.
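
To put rough numbers on the "gains get eaten" point, here's a back-of-the-envelope sketch. Everything in it is an assumption for illustration: the function name is made up, the 2x/3x figures are invented, and it assumes serving cost per token scales linearly with model size and inversely with tokens per watt, which is a simplification, not anything Nvidia or Anthropic has published.

```python
def relative_cost_per_token(hw_efficiency_gain: float, model_size_growth: float) -> float:
    """Cost per token of (new model, new hardware) relative to (old model, old hardware).

    Toy assumption: cost per token is proportional to model size and
    inversely proportional to hardware tokens-per-watt.
    """
    if hw_efficiency_gain <= 0 or model_size_growth <= 0:
        raise ValueError("both factors must be positive")
    return model_size_growth / hw_efficiency_gain


# Hypothetical scenarios, all with a 2x tokens-per-watt improvement.
scenarios = {
    "same model, new hardware": relative_cost_per_token(2.0, 1.0),
    "2x bigger model, new hardware": relative_cost_per_token(2.0, 2.0),
    "3x bigger model, new hardware": relative_cost_per_token(2.0, 3.0),
}

for name, ratio in scenarios.items():
    print(f"{name}: {ratio:.2f}x the old cost per token")
```

Under those assumptions, the same model gets cheaper to serve, a model that grows as fast as the hardware improves costs the same, and a model that grows faster costs more per token despite the better hardware.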

Or, we can bleed out cash for a very long time.