
Comment by rjh29

2 hours ago

I wonder how many years it'll take for the API token cost to exceed the money spent on RAM.
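A rough break-even is the one-time RAM spend divided by the yearly API spend. Here's a minimal sketch assuming a flat per-token price and constant usage, ignoring electricity, depreciation and price changes. The function name and all numbers are hypothetical.

```python
def breakeven_years(ram_cost_usd: float, tokens_per_year: float,
                    usd_per_million_tokens: float) -> float:
    """Years until cumulative API spend equals a one-time RAM purchase.

    Hypothetical helper: assumes a flat token price and constant yearly usage.
    """
    # Yearly API bill = (tokens per year / 1M) * price per million tokens.
    annual_api_cost = tokens_per_year / 1_000_000 * usd_per_million_tokens
    if annual_api_cost <= 0:
        raise ValueError("annual API cost must be positive")
    # Break-even point: one-time hardware cost / recurring yearly cost.
    return ram_cost_usd / annual_api_cost


# Hypothetical inputs: $2,000 of RAM, 200M tokens/year at $2 per 1M tokens.
print(breakeven_years(2000, 200_000_000, 2.0))
```

With those made-up inputs the API bill is $400/year, so it takes about 5 years to match the RAM cost. Real usage and pricing will move this a lot in either direction.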

The DS4 folks are unofficially testing ways to run the model on lower-RAM machines at reduced performance. Similar efforts are underway with llama.cpp. The results are a bit of a challenge: prefill time tends to explode, which is a limitation if you care about agentic workflows.