Comment by irishcoffee
4 hours ago
NVIDIA H200 is not a standard GPU. 8 of them in a box with a CPU and RAM cost close to the same as a house.
I am 100% all about using local models instead of sending someone else all my data and paying for the privilege of doing so, but this article is misleading.
I can get a 27B model to kick out 40 tok/s on 16 GB of VRAM. This is the area ripe for development.
If you can’t connect a monitor, it isn’t a standard GPU, at least not in the way people spoke about GPUs until a few years ago.
I guess you were thinking of consumer GPUs. We are indeed talking about standard datacenter GPUs.
Sorry for the confusion.
Do you think maybe changing your article's title from "Real-time LLM Inference on Standard GPUs" to "Real-time LLM Inference on Standard Datacenter GPUs" might make sense here? More people seem confused by the title than not, and you could clear this up relatively easily, at least on your website, although it might be too late to fix the HN title.
YES - I just updated the title of our article according to your suggestion.
Oh, it isn't confusing, it is misleading. A standard GPU lets you connect a monitor. A datacenter GPU lets you do headless math.
I updated the article title accordingly.