Comment by kgeist, 2 months ago

What about constrained decoding (with JSON schemas)? I noticed my vLLM instance is using 1 CPU at 100%.