Comment by legitster
7 days ago
Complete guess, but my hunch is that it's in the sharding. When they break your input apart into its components, they send each piece off to hardware that is optimized to solve for that piece. On that hardware they have insane amounts of VRAM, and it's already cached in a way that optimizes for that sort of problem.