Comment by m3kw9
4 hours ago
They tested on the spark model; I bet it's a mix of that with a focus on inference speed. Whatever it is, hopefully it shows up as faster speeds on current models. Tokens/s is as big a thing as anything else, and that's where they can really gain an edge over the competition.