Comment by thijson
1 year ago
I think batch processing of many requests is cheaper. As each layer of the model is loaded into cache, you can put many prompts through it. Running it locally, you don't have that benefit.
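To put rough numbers on that, here is a minimal sketch of the amortization, assuming each layer's weights are read once per forward pass and reused for every prompt in the batch. The layer size, layer count, and the function name `weight_bytes_per_prompt` are made up for illustration and don't describe any particular model or serving stack. It also ignores compute and activation memory, so treat it as a back-of-the-envelope estimate.

```python
def weight_bytes_per_prompt(layer_bytes: int, num_layers: int, batch_size: int) -> float:
    """Weight bytes read per prompt for one forward pass.

    Assumes each layer is loaded once and shared by every prompt in the batch.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return layer_bytes * num_layers / batch_size


# Hypothetical model: 32 layers of 400 MB each (illustrative numbers only).
LAYER_BYTES = 400 * 10**6
NUM_LAYERS = 32

for batch_size in (1, 8, 64):
    per_prompt = weight_bytes_per_prompt(LAYER_BYTES, NUM_LAYERS, batch_size)
    print(f"batch={batch_size:>2}: {per_prompt / 1e9:.2f} GB of weight traffic per prompt")
```

A batch of 1 is roughly the local single-user case: the full set of weights gets read for just one prompt.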