Comment by H8crilA

1 year ago

Do we have any estimate of the size of OpenAI's top-of-the-line models? Would they also fit in ~512GB of (V)RAM?

Also, this whole self-hosting of LLMs is a bit like the cloud: yes, you can do it yourself, but it's a lot easier to pay for API access. And not just for small users. Personally, I don't even bother self-hosting transcription models, which are so small that they can run on nearly any hardware.

It's nice because a company can optionally provide a SOTA reasoning model to its clients without having to go through a middleman. E.g., an HR company can provide an LLM for its HRMS system for a small $2000 investment. Not $2000/month, just a one-time $2000 investment.
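
To put the one-time-vs-recurring point in rough numbers, here is a minimal break-even sketch in Python (the $2000 figure is from the comment; the monthly API bills are assumed figures for illustration, not from the thread):

    hardware_cost = 2000  # one-time USD figure from the comment above

    # Assumed monthly API spend levels; adjust to your actual bill.
    for monthly_api_bill in (100, 500, 2000):
        months = hardware_cost / monthly_api_bill
        print(f"${monthly_api_bill}/mo on APIs -> hardware pays for itself in {months:.1f} months")

This ignores electricity, ops time, and the quality gap versus hosted frontier models, so treat it as a best case for the self-hosting side.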

  • No one will be doing anything practical with a local version of Deepseek on a $2000 server. The token throughput of this thing is like 1 token every 4 seconds. It would take nearly a full minute just to produce a standard “Roses are red, violets are blue” poem. There’s absolutely no practical use for that. It’s cool that you can do it, and it’s a step in the right direction, but self-hosting these won’t be a viable alternative to using providers like OpenAI for business applications for a while.

    • > but self-hosting these won’t be a viable alternative to using providers like OpenAI for business applications for a while.

      Why not? While 3-4 tok/s is on the lower end, it is still usable for any task that doesn't require real-time interaction with the model.

      In other words, I don't mind waiting a minute for a good-enough response from the model on a topic that would take me multiples of that to compile and research on my own. It's a clear net win (the rough arithmetic is sketched below).

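      A minimal back-of-envelope sketch of the wait times at the rates quoted in this subthread, in Python (the answer lengths are assumptions for illustration):

          # Time to stream an answer at a steady decode rate.
          # Rates from the thread: 0.25 tok/s ("1 token every 4 seconds")
          # and 3-4 tok/s; the answer lengths are assumed, not measured.
          def generation_time(tokens: int, tok_per_s: float) -> float:
              return tokens / tok_per_s

          for answer_tokens in (20, 200, 500):   # short poem .. long answer
              for rate in (0.25, 3.0, 4.0):      # tokens per second
                  secs = generation_time(answer_tokens, rate)
                  print(f"{answer_tokens:>4} tok @ {rate} tok/s -> {secs:6.0f} s")

      At 3-4 tok/s a few-hundred-token answer arrives in one to two minutes, which fits the "don't mind waiting a minute" framing above; at 0.25 tok/s even a short poem takes over a minute.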

Is the size of OpenAI’s top-of-the-line models even relevant? Last I checked they weren’t open source in the slightest.

It would make sense if you don't want somebody else to have access to all your code and customer data.