Obviously all the LLM API providers have a rate limit. Not a fan of GP's sarcastic tone, but I suppose many of us would like to know roughly what that limit would be for a small business using such APIs.
The rate limits for Gemini 1.5 Flash are 2000 requests per minute and 4 million tokens per minute. Higher limits are available on request.
https://ai.google.dev/pricing#1_5flash
4o-mini's rate limits scale based on your account history, from 500RPM/200,000TPM to 30,000RPM/150,000,000TPM.
https://platform.openai.com/docs/guides/rate-limits
Surprisingly, DeepSeek doesn't have a rate limit: https://api-docs.deepseek.com/quick_start/rate_limit
I've heard from people running 100+ prompts in parallel against it.
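If you do want to push throughput against any of these APIs, the usual pattern is a client-side concurrency cap plus backoff on 429s. A rough sketch assuming the official OpenAI Python SDK (>=1.0); the model name, concurrency limit, and label set are placeholders:

```python
import asyncio
from openai import AsyncOpenAI, RateLimitError

client = AsyncOpenAI()  # reads OPENAI_API_KEY from the environment
sem = asyncio.Semaphore(50)  # keep in-flight requests below your RPM tier

async def classify(text: str) -> str:
    async with sem:
        for attempt in range(5):
            try:
                resp = await client.chat.completions.create(
                    model="gpt-4o-mini",
                    messages=[
                        {"role": "system", "content": "Reply with exactly one label: POSITIVE or NEGATIVE."},
                        {"role": "user", "content": text},
                    ],
                )
                return resp.choices[0].message.content.strip()
            except RateLimitError:
                # hit the 429 ceiling: back off exponentially and retry
                await asyncio.sleep(2 ** attempt)
        raise RuntimeError("still rate limited after retries")

async def classify_all(texts: list[str]) -> list[str]:
    return await asyncio.gather(*(classify(t) for t in texts))

# labels = asyncio.run(classify_all(["great product", "never again"]))
```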
Yes, how did I not think of throwing more money at cloud providers on top of feeding OpenAI, when I could have just coded a simple binary classifier and run everything on something as insignificant as an 8th-gen, quad-core i5....
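For scale, a minimal sketch of that kind of local classifier with scikit-learn (the training data and labels here are made up); it trains and predicts in seconds on exactly that sort of hardware:

```python
# TF-IDF features + logistic regression: a plain binary text classifier.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = ["great product", "terrible support", "works fine", "never again"]  # toy data
labels = [1, 0, 1, 0]  # 1 = positive, 0 = negative

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
clf.fit(texts, labels)

print(clf.predict(["support was great"]))  # -> [1]
```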
Did I mention OpenAI?
Ah, my bad, someone further up the thread did.
Really it boils down to a balance of time and cost, and the skill set of the person getting the job done.
But you seem really anti-establishment (hung up over a $25 cloud spend), so you do you.
Just don't expect everyone else to agree with you.
Also can’t you just combine multiple classification requests into a single prompt?
Yes, for such a simple labelling task, request rate limits are more likely the bottleneck than token rate limits.
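A rough sketch of that batching in plain Python, assuming the model reliably returns one numbered label per line (the prompt wording and label set are my own invention):

```python
def build_batch_prompt(texts: list[str]) -> str:
    # One request classifies many items: number them and ask for one label per line.
    numbered = "\n".join(f"{i + 1}. {t}" for i, t in enumerate(texts))
    return (
        "Classify each item as POSITIVE or NEGATIVE.\n"
        "Answer with one line per item, formatted '<number>. <label>'.\n\n"
        + numbered
    )

def parse_batch_reply(reply: str, n: int) -> list[str | None]:
    # Expect lines like "3. NEGATIVE"; leave None for anything malformed.
    labels: list[str | None] = [None] * n
    for line in reply.splitlines():
        num, _, label = line.partition(".")
        if num.strip().isdigit() and 1 <= int(num) <= n:
            labels[int(num) - 1] = label.strip().upper()
    return labels
```

Batching 50 items per request turns a 500 RPM ceiling into ~25,000 classifications per minute, at the cost of having to validate the parsed output.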