Comment by oceanplexian

23 days ago

I have an AMD Epyc machine with 512GB of RAM and a humble NVIDIA 3090. You'll have to run a quantized version, but you can get a couple of tokens per second out of it, since these models are optimized to split across GPU and RAM, and it's about as good as Claude was 12 months ago.
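As a rough sketch of what that GPU/RAM split looks like in practice: runtimes like llama.cpp let you offload some number of transformer layers to VRAM and keep the rest in system RAM. The numbers below (80 layers, ~1.5 GiB per quantized layer, 2 GiB VRAM reserve) are illustrative assumptions, not measurements of any particular model.

```python
def gpu_layer_split(n_layers, layer_gib, vram_gib, reserve_gib=2.0):
    """Estimate how many layers fit in VRAM, with the rest in RAM.

    reserve_gib holds back VRAM for KV cache and runtime overhead.
    All sizes are illustrative; real per-layer sizes depend on the
    model and quantization format.
    """
    usable = max(vram_gib - reserve_gib, 0.0)
    on_gpu = min(n_layers, int(usable // layer_gib))
    return on_gpu, n_layers - on_gpu

# Hypothetical: an 80-layer model quantized to ~1.5 GiB/layer
# on a 24 GiB RTX 3090.
gpu, ram = gpu_layer_split(n_layers=80, layer_gib=1.5, vram_gib=24.0)
print(gpu, ram)  # 14 layers on GPU, 66 spilled to system RAM
```

The more layers you can keep in VRAM, the better the tokens/sec, which is why a big-RAM Epyc box plus even a modest GPU is a workable combination.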

Full disclosure: I use OpenRouter and pay for models most of the time, since that's more practical than 5-10 tokens per second, but having the option to run it locally "if I had to, worst case" is good enough for me. We're also in a rapidly developing technology space, and smaller models get better every year.