Comment by binsquare

2 months ago

I run a crowd-sourced website to collect data on the best and cheapest hardware setups for local LLMs here: https://inferbench.com/

Source code: https://github.com/BinSquare/inferbench

Nice! Though for older hardware it would be nice if the price reflected the current second-hand market (harder to get data for, I know). E.g. the Nvidia RTX 3070 ranks as the second-best GPU in tok/s/$ even at its $499 MSRP, but you can get one for half that now.

It seems like verification might need to be improved a bit? I looked at Mistral-Large-123B. Someone is claiming 12 tokens/sec on a single RTX 3090 at FP16.

Perhaps some filter could cut out submissions that don't really make sense?

  • Great idea - took a bit to figure out how to implement this.

    I came up with a plausibility check based on the model's memory requirements: https://github.com/BinSquare/inferbench/blob/main/src/lib/pl...

    So now the submission page shows a warning + an automated flag count for volunteers to double-check:

    ```
    This configuration seems unlikely

    Model requires ~906GB VRAM but only 32GB available (28.3x over). This likely requires significant CPU offload, which would severely impact performance.

    You can still submit, but your result will be flagged for review.
    ```
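
    A check along those lines could be sketched as follows. This is a hypothetical illustration, not the repo's actual code (the function and parameter names are assumptions); it uses the rough rule that weight memory ≈ parameter count × bytes per parameter, ignoring KV cache and runtime overhead:

    ```typescript
    type PlausibilityResult = { plausible: boolean; requiredGb: number; ratio: number };

    // Rough weight-memory estimate: params × bytes per param.
    // FP16 = 2 bytes, 8-bit quant = 1, 4-bit quant = 0.5.
    function requiredVramGb(paramsBillions: number, bytesPerParam: number): number {
      return (paramsBillions * 1e9 * bytesPerParam) / 1e9;
    }

    function checkPlausibility(
      paramsBillions: number,
      bytesPerParam: number,
      availableVramGb: number,
      maxRatio = 1.25 // small headroom for partial offload (assumed threshold)
    ): PlausibilityResult {
      const requiredGb = requiredVramGb(paramsBillions, bytesPerParam);
      const ratio = requiredGb / availableVramGb;
      return { plausible: ratio <= maxRatio, requiredGb, ratio };
    }

    // A 123B model at FP16 needs ~246 GB for weights alone, far beyond
    // a single 24 GB RTX 3090, so that submission would get flagged.
    const result = checkPlausibility(123, 2, 24);
    ```

    The headroom ratio matters: a submission slightly over budget may just be using modest CPU offload, while one 28x over is almost certainly misreported.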