Comment by Gcam
2 years ago
Hi HN, thanks for checking this out! The goal of this project is to provide objective benchmarks and analysis of LLM AI models and API hosting providers, to help you compare which to use in your next (or current) project. Benchmark comparisons cover quality, price, and technical performance (e.g. throughput and latency).
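For anyone curious what the performance metrics mean in practice, here's a minimal sketch of how throughput and latency can be derived from a streamed response. The function name, argument names, and exact metric definitions are my own illustration, not taken from the site:

```python
def summarize_stream(request_start, token_times):
    """Summarize one streamed LLM response.

    request_start: timestamp when the request was sent.
    token_times:   timestamp at which each streamed token arrived.
    (Names and metric definitions here are illustrative only.)
    """
    ttft = token_times[0] - request_start          # latency: time to first token
    window = token_times[-1] - token_times[0]      # duration of the generation phase
    # Throughput counts tokens produced after the first one,
    # so connection/queue time doesn't inflate it.
    tps = (len(token_times) - 1) / window if window > 0 else float("inf")
    return {"ttft_s": ttft, "throughput_tok_s": tps}

# e.g. a request sent at t=0 whose 11 tokens arrive every 100 ms starting at t=0.5s
metrics = summarize_stream(0.0, [0.5 + 0.1 * i for i in range(11)])
```

Measuring latency and throughput separately like this matters because a host can have great steady-state throughput but a long queue before the first token.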
Twitter thread with initial insights: https://twitter.com/ArtificialAnlys/status/17472648324397343...
All feedback is welcome
Any chance of including some of the better finetunes, e.g. Wizard or Tulu? (Worse than Mixtral, but I assume other finetunes will be better, just as Wizard and Tulu are better than Llama 2.)
I guess their cost is the same as the base model, although finetuning would affect performance.
Hey, yeah the bar for adding finetunes will probably be that they're being hosted by ~3 supported hosting providers. Very much open to it!
Could a quality score be added for each inference provider serving the same model? Many of them use different quantization and approximation methods, so it's not just price and throughput that matter. This is especially important for a model like Mixtral.
I'd love to see replicate.com (pay per sip) on there. And lambdalabs.com
[edit: And also MPS]
We've been waiting on Replicate to launch per-token pricing for LLMs because their previous pay-per-second model was uncompetitive - but it looks like they might have just turned it on with no big announcement! They'll go straight to the top of the priority list.
Do Lambda have a serverless inference API? Not aware of them playing in this space yet.
Presume you mean MPT not MPS - yep we'll look into MosaicML soon.