Comment by iaw
7 hours ago
vLLM supports "banning" certain tokens, but I don't know whether it can dynamically reduce their likelihood instead.
To my knowledge you can also "ban" tokens with llama.cpp, but the ban is passed in each API call rather than to the server at initialization.
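
For illustration, here is a rough sketch of what a per-request ban could look like against the llama.cpp server. It assumes the `/completion` endpoint accepts a `logit_bias` list of `[token_id, bias]` pairs, where `false` bans a token outright and a negative number only makes it less likely. The helper name `build_completion_payload` and the token IDs are made up for the example; real IDs depend on the model's tokenizer.

```python
import json


def build_completion_payload(prompt, banned_ids=(), penalized=None, n_predict=64):
    """Build a JSON body for a llama.cpp server /completion request.

    Assumption: logit_bias is a list of [token_id, bias] pairs, and a bias
    of false means the token is never produced.
    """
    # Hard ban: false (Python False) removes the token entirely.
    logit_bias = [[tok, False] for tok in banned_ids]
    # Soft penalty: a negative bias lowers the likelihood without banning.
    for tok, bias in (penalized or {}).items():
        logit_bias.append([tok, float(bias)])
    return {"prompt": prompt, "n_predict": n_predict, "logit_bias": logit_bias}


# Hypothetical token IDs, chosen only for the example.
payload = build_completion_payload(
    "Hello", banned_ids=[15043], penalized={29871: -2.0}
)
body = json.dumps(payload)
print(body)
```

The resulting `body` would be POSTed with every request, which matches the point above that the ban travels with the API call rather than being set once when the server starts.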