Comment by cafkafk

8 hours ago

If you can get the inference engine to route the heavy matrix math to the GPU and the speculative drafting to the CPU without choking on latency, it's probably going to be very fast.

Would love to see the benchmarks if someone actually pulls something like that off.
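
For anyone unfamiliar with the idea, here is a rough sketch of the greedy draft-and-verify loop behind speculative decoding. It is a toy under stated assumptions, not how any particular engine does it: `target` stands in for the large model (the GPU side), `draft` for the small one (the CPU side), and both are hypothetical functions over integer tokens. A real engine would verify all drafted tokens in one batched pass rather than in a Python loop.

```python
from typing import Callable, List

Token = int
# Hypothetical model interface: given the context, return the greedy next token.
Model = Callable[[List[Token]], Token]


def greedy_decode(target: Model, prompt: List[Token], max_new: int) -> List[Token]:
    """Baseline: one target call per generated token."""
    out = list(prompt)
    for _ in range(max_new):
        out.append(target(out))
    return out


def speculative_decode(target: Model, draft: Model, prompt: List[Token],
                       max_new: int, k: int = 4) -> List[Token]:
    """Greedy speculative decoding: draft k tokens, keep the verified prefix."""
    out = list(prompt)
    while len(out) - len(prompt) < max_new:
        # Cheap model proposes k tokens autoregressively.
        proposal: List[Token] = []
        ctx = list(out)
        for _ in range(k):
            tok = draft(ctx)
            proposal.append(tok)
            ctx.append(tok)

        # Expensive model checks each proposed position.
        for i, tok in enumerate(proposal):
            expected = target(out + proposal[:i])
            if expected != tok:
                # Keep the agreed prefix, then take the target's own token.
                out.extend(proposal[:i])
                out.append(expected)
                break
        else:
            # Every draft token matched; the target adds one extra token.
            out.extend(proposal)
            out.append(target(out))
    return out[:len(prompt) + max_new]


# Toy models (assumptions for illustration only).
def toy_target(ctx: List[Token]) -> Token:
    return (ctx[-1] + 1) % 10


def toy_draft(ctx: List[Token]) -> Token:
    # Agrees with the target except after token 4, where it guesses wrong.
    return 0 if ctx[-1] == 4 else (ctx[-1] + 1) % 10


result = speculative_decode(toy_target, toy_draft, [3], max_new=8)
baseline = greedy_decode(toy_target, [3], max_new=8)
```

With greedy verification the output matches plain decoding with the target model, so any speedup comes from how many drafted tokens get accepted per verification pass and how cheap the handoff between the two devices is.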
