Comment by _micah_h

2 years ago

We've been waiting on Replicate to launch per-token pricing for LLMs because their previous pay-per-second model was uncompetitive - but it looks like they might have just turned it on with no big announcement! They'll go straight to the top of the priority list.

Do Lambda have a serverless inference API? Not aware of them playing in this space yet.

Presume you mean MPT not MPS - yep we'll look into MosaicML soon.