Comment by axegon_
17 hours ago
Yeah... Let's talk time needed for 10M prompts and how that fits into a daily pipeline. Enlighten us, please.
Run them all in parallel with a cloud function in less than a minute?
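Something like this, roughly; a stdlib-Python sketch where the endpoint, key, model, and prompt are all placeholders, not anyone's actual pipeline:

    import concurrent.futures
    import json
    import urllib.request

    API_URL = "https://api.openai.com/v1/chat/completions"  # placeholder: any chat-style API
    API_KEY = "sk-..."                                      # placeholder

    def classify(text):
        # One classification request; model name and prompt are illustrative.
        body = json.dumps({
            "model": "gpt-4o-mini",
            "messages": [{"role": "user",
                          "content": "Label SPAM or HAM: " + text}],
        }).encode()
        req = urllib.request.Request(API_URL, data=body, headers={
            "Authorization": "Bearer " + API_KEY,
            "Content-Type": "application/json",
        })
        with urllib.request.urlopen(req) as resp:
            return json.load(resp)["choices"][0]["message"]["content"]

    texts = ["example 1", "example 2"]  # imagine 10M of these, sharded across workers
    with concurrent.futures.ThreadPoolExecutor(max_workers=100) as pool:
        labels = list(pool.map(classify, texts))

Shard the inputs across enough workers (or cloud function instances) and the wall-clock time is dominated by the provider's rate limits, not your hardware.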
Yes, how did I not think of throwing more money at cloud providers on top of feeding OpenAI, when I could have just coded a simple binary classifier and run everything on something as insignificant as an 8th-gen quad-core i5...
Did I mention OpenAI?
Obviously all the LLM API providers have a rate limit. Not a fan of GP's sarcastic tone, but I suppose many of us would like to know roughly what that limit would be for a small business using such APIs.
The rate limits for Gemini 1.5 Flash are 2000 requests per minute and 4 million tokens per minute. Higher limits are available on request.
https://ai.google.dev/pricing#1_5flash
4o-mini's rate limits scale based on your account history, from 500 RPM / 200,000 TPM to 30,000 RPM / 150,000,000 TPM.
https://platform.openai.com/docs/guides/rate-limits
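If you fan out in parallel like the sibling comment suggests, you have to pace yourself under the RPM cap. A rough client-side sketch; the RateLimiter class is illustrative, the rpm value is the tier-1 4o-mini figure above, and classify() is a stub for whatever single-request helper you're using:

    import threading
    import time

    class RateLimiter:
        """Client-side pacing: at most `rpm` calls per minute across all threads."""
        def __init__(self, rpm):
            self.interval = 60.0 / rpm
            self.lock = threading.Lock()
            self.next_slot = time.monotonic()

        def wait(self):
            with self.lock:
                now = time.monotonic()
                # Reserve the next available slot, one interval apart.
                self.next_slot = max(self.next_slot, now) + self.interval
                delay = self.next_slot - self.interval - now
            if delay > 0:
                time.sleep(delay)

    def classify(text):
        ...  # stub: a single-request API call

    limiter = RateLimiter(rpm=500)

    def classify_throttled(text):
        limiter.wait()
        return classify(text)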
Surprisingly, DeepSeek doesn't have a rate limit: https://api-docs.deepseek.com/quick_start/rate_limit
I've heard from people running 100+ prompts in parallel against it.
Also can’t you just combine multiple classification requests into a single prompt?
Yes, for such a simple labelling task, request rate limits are more likely to be the bottleneck than token rate limits.
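A rough sketch of that batching idea; the prompt wording and the chunk size of 100 are made up, but the arithmetic holds:

    texts = ["example 1", "example 2"]  # stand-in for the 10M inputs

    def batch_prompt(items):
        # Pack many items into one request; ask for JSON so the labels parse cleanly.
        numbered = "\n".join(f"{i}. {t}" for i, t in enumerate(items))
        return ("Classify each numbered line as SPAM or HAM. "
                "Reply with a JSON object mapping line number to label.\n\n"
                + numbered)

    chunks = [texts[i:i + 100] for i in range(0, len(texts), 100)]
    prompts = [batch_prompt(chunk) for chunk in chunks]

At 100 items per request, 10M prompts become 100k requests, which fits under Gemini Flash's 2,000 RPM limit in under an hour.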