Comment by radq
2 days ago
Cool project! The codebase is simple and well documented, a good starting point for anyone interested in how to implement a high-performance inference engine. The prefix sharing is very relevant for anyone running batch inference to generate RL rollouts.