Comment by Bolwin
9 days ago
No one is producing one output token though.
And using up GPUs for that cache is a pretty big opportunity cost. I highly doubt it's done in VRAM; that would be insane for the one-hour caches.
So it's memory + the time it takes to unload/load into VRAM + the extra cost per output token.
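A rough way to put numbers on that breakdown, as a sketch only. Every parameter and price below is hypothetical, not any provider's actual figure. It assumes the cache sits in host memory rather than VRAM while idle, and that each reuse costs one unload plus one load, billed at the GPU's hourly rate while the transfer runs.

```python
from dataclasses import dataclass


@dataclass
class CacheCostInputs:
    kv_cache_gb: float                 # size of the cached KV state (hypothetical)
    hold_hours: float                  # how long it's kept, e.g. 1.0 for a one-hour cache
    storage_usd_per_gb_hour: float     # cost of holding it off-GPU (hypothetical)
    transfer_gb_per_s: float           # host <-> VRAM bandwidth (hypothetical)
    gpu_usd_per_hour: float            # GPU time spent waiting on the transfer (hypothetical)
    output_tokens: int                 # tokens generated on top of the cached prefix
    extra_usd_per_output_token: float  # any per-token surcharge (hypothetical)


def cache_cost_usd(c: CacheCostInputs) -> dict:
    """Split the cost into memory + load/unload time + per-output-token cost."""
    # Memory: holding the cache for the whole retention window.
    memory = c.kv_cache_gb * c.hold_hours * c.storage_usd_per_gb_hour
    # Transfer: one unload (VRAM -> host) and one load (host -> VRAM).
    transfer_seconds = 2 * c.kv_cache_gb / c.transfer_gb_per_s
    transfer = transfer_seconds / 3600 * c.gpu_usd_per_hour
    # Per-token: whatever extra is charged for each generated token.
    per_token = c.output_tokens * c.extra_usd_per_output_token
    return {
        "memory": memory,
        "transfer": transfer,
        "per_token": per_token,
        "total": memory + transfer + per_token,
    }


example = CacheCostInputs(
    kv_cache_gb=10.0,
    hold_hours=1.0,
    storage_usd_per_gb_hour=0.01,
    transfer_gb_per_s=20.0,
    gpu_usd_per_hour=3.6,
    output_tokens=1000,
    extra_usd_per_output_token=0.00001,
)
costs = cache_cost_usd(example)
```

With these made-up numbers the memory term ($0.10) dominates the transfer term ($0.001), which is the point about long-lived caches: how long you hold the cache matters more than how often you move it.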
Is it a scam? Idk