Comment by ineedasername
1 month ago
Yeah, tokens per second can very much influence the work style and therefore mindset a person should bring to usage. You can also build on the results of a faster but less than SOTA class model in different ways. I can let a coding tuned 7-12b model “sketch” some things at higher speed, or even a variety of things, and I can review real time, and pass off to a slower more capable model to say “this is structural sound, or at least the right framing, tighten it all up in the following ways…” and run in the background.
No comments yet
Contribute on Hacker News ↗