Comment by storystarling
8 hours ago
Raw CUDA works for the heavy lifting but I suspect it gets messy once you implement things like grammar constraints or beam search. You end up with complex state machines during inference and having standard library abstractions seems pretty important to keep that logic from becoming unmaintainable.
No comments yet
Contribute on Hacker News ↗