Comment by irishcoffee

4 hours ago

The NVIDIA H200 is not a standard GPU. Eight of them in a box with a CPU and RAM cost close to the same as a house.

I am 100% all about using local models instead of sending someone else all my data and paying for the privilege of doing so, but this article is misleading.

I can get a 27B model to kick out 40 tok/s on 16 GB of VRAM. This is the area ripe for development.

If you can’t connect a monitor, it isn’t a standard GPU, at least not in the way people have spoken about GPUs until a few years ago.

6 comments

gaeld 4 hours ago

I guess you were thinking of consumer GPUs. We are indeed talking about standard datacenter GPUs.

Sorry for the confusion.

embedding-shape 3 hours ago
Do you think changing your article's title from "Real-time LLM Inference on Standard GPUs" to "Real-time LLM Inference on Standard Datacenter GPUs" might make sense here? More people seem confused by the title than not, and you could clear this up relatively easily, at least on your website, although it might be too late to fix the HN title.
- gaeld 3 hours ago
  
  YES - I just updated the title of our article according to your suggestion.
irishcoffee 3 hours ago
Oh, it isn't confusing, it is misleading. A standard GPU lets you connect a monitor. A datacenter GPU lets you do headless math.
- gaeld 3 hours ago
  
  I updated the article title accordingly
  