Comment by philipodonnell
2 months ago
Isn’t this an arbitrage opportunity? Offer to pay a fraction of the cost per token but accept that your tokens will only be processed when the batch window isn’t big enough, then resell that for a markup to people who need non-time sensitive inference?
You may have already noticed that many providers have separate, much lower, prices for offline inference.