Comment by octacat

1 year ago

> It is shared between users and better utilized and optimized.

"Sharing between users" doesn't make it cheaper. It makes it more expensive due to the inherent inefficiencies of switching user contexts. (Unless your sales people are doing some underhanded market segmentation trickery, of course.)

  • No, batched inference can work very well. Depending on the architecture, you can get 100x or more tokens out of the system by feeding it multiple requests in parallel (see the sketch at the end of this thread).

    • Couldn't you do this locally just the same?

      Of course, that doesn't map well to an individual chatting with a chatbot. It does map well to something like "hey, laptop, summarize these 10,000 documents."

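A rough sketch of the batching argument above, under the assumption that serving is dominated by reading the model weights from memory: the layer sizes below are invented and PyTorch is assumed, but one batched matmul reads the weights once for all B in-flight requests, which is why B parallel requests cost far less than B sequential ones.

    # A sketch, not a benchmark: one decoder layer's MLP applied to a
    # batch of B in-flight requests. The weight matrices (the expensive
    # part to pull from memory) are shared by every request, so one
    # batched matmul amortizes that cost across all B of them.
    # Layer sizes are made up; PyTorch is assumed to be installed.
    import time
    import torch

    d_model, d_ff = 1024, 4096           # hypothetical layer dimensions
    w1 = torch.randn(d_model, d_ff)
    w2 = torch.randn(d_ff, d_model)

    def decode_step(x):                  # stand-in for one token step
        return torch.relu(x @ w1) @ w2

    for batch in (1, 32):
        x = torch.randn(batch, d_model)
        decode_step(x)                   # warm-up
        t0 = time.perf_counter()
        for _ in range(20):
            decode_step(x)
        dt = time.perf_counter() - t0
        print(f"batch={batch:2d}: {20 * batch / dt:10.0f} tokens/sec equivalent")

The exact ratio varies by hardware, but even on a CPU the batch-32 run gets far more tokens per second than 32x the batch-1 cost would suggest; on a GPU, where weight reads dominate even more, the gap is typically much wider, which is where claims like "100x more tokens" come from.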