We've been waiting on Replicate to launch per-token pricing for LLMs because their previous pay-per-second model was uncompetitive - but it looks like they might have just turned it on with no big announcement! They'll go straight to the top of the priority list.
Do Lambda have a serverless inference API? Not aware of them playing in this space yet.
Presume you mean MPT not MPS - yep we'll look into MosaicML soon.
We've been waiting on Replicate to launch per-token pricing for LLMs because their previous pay-per-second model was uncompetitive - but it looks like they might have just turned it on with no big announcement! They'll go straight to the top of the priority list.
Do Lambda have a serverless inference API? Not aware of them playing in this space yet.
Presume you mean MPT not MPS - yep we'll look into MosaicML soon.