Comment by fazlerocks
3 days ago
Running Llama 3.1 70B on 2x4090s with vLLM. Memory is a pain but it works decently for most stuff.
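For anyone curious, a rough sketch of the launch config (the AWQ repo id is from memory, treat it as a placeholder; the point is you need a 4-bit quant plus tensor parallelism, since the fp16 weights alone are ~140GB and won't fit in 48GB of VRAM):

    # Sketch of squeezing a 70B model onto 2x24GB cards with vLLM.
    # Assumes a 4-bit AWQ checkpoint -- fp16 weights won't fit.
    from vllm import LLM, SamplingParams

    llm = LLM(
        model="hugging-quants/Meta-Llama-3.1-70B-Instruct-AWQ-INT4",  # placeholder repo id
        tensor_parallel_size=2,       # shard the weights across both 4090s
        gpu_memory_utilization=0.95,  # squeeze, leave a sliver for CUDA overhead
        max_model_len=8192,           # shorter context = smaller KV cache
    )

    params = SamplingParams(temperature=0.7, max_tokens=256)
    out = llm.generate(["Explain tensor parallelism in one paragraph."], params)
    print(out[0].outputs[0].text)

Even with all that it's tight, hence the memory pain.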
Tbh for coding I just use the smaller ones like CodeQwen 7B. Way faster and good enough for autocomplete. I only fire up the big model when I actually need it to think.
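The routing is nothing fancy, basically this (assumes both are served locally behind OpenAI-compatible endpoints, which vLLM exposes; the ports and model names here are made up):

    # Small model for autocomplete, big model only when asked to think.
    # Ports/model names are illustrative; any OpenAI-compatible server works.
    from openai import OpenAI

    fast = OpenAI(base_url="http://localhost:8001/v1", api_key="unused")  # CodeQwen 7B
    big = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")   # Llama 3.1 70B

    def complete(prompt: str, needs_reasoning: bool = False) -> str:
        client, model = (big, "llama-3.1-70b") if needs_reasoning else (fast, "codeqwen-7b")
        resp = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
            max_tokens=1024 if needs_reasoning else 128,
        )
        return resp.choices[0].message.content

    print(complete("def quicksort(arr):"))  # autocomplete path, hits the 7B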
The annoying part is keeping everything updated; a new model drops every week and half of them don't work with whatever you're already running.