Obviously all the LLM API providers have a rate limit. Not a fan of GP's sarcastic tone, but I suppose many of us would like to know roughly what that limit would be for a small business using such APIs.
The rate limits for Gemini 1.5 Flash are 2000 requests per minute and 4 million tokens per minute. Higher limits are available on request.
https://ai.google.dev/pricing#1_5flash
4o-mini's rate limits scale based on your account history, from 500RPM/200,000TPM to 30,000RPM/150,000,000TPM.
https://platform.openai.com/docs/guides/rate-limits
Surprisingly, DeepSeek doesn't have a rate limit: https://api-docs.deepseek.com/quick_start/rate_limit
I've heard from people running 100+ prompts in parallel against it.
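If you do want to push throughput against any of these APIs, the usual pattern is a client-side concurrency cap plus backoff on 429s. A rough sketch assuming the official OpenAI Python SDK (>=1.0); the model name, concurrency limit, and label set are placeholders:

```python
import asyncio
from openai import AsyncOpenAI, RateLimitError

client = AsyncOpenAI()  # reads OPENAI_API_KEY from the environment
sem = asyncio.Semaphore(50)  # keep in-flight requests below your RPM tier

async def classify(text: str) -> str:
    async with sem:
        for attempt in range(5):
            try:
                resp = await client.chat.completions.create(
                    model="gpt-4o-mini",
                    messages=[
                        {"role": "system", "content": "Reply with exactly one label: POSITIVE or NEGATIVE."},
                        {"role": "user", "content": text},
                    ],
                )
                return resp.choices[0].message.content.strip()
            except RateLimitError:
                # hit the 429 ceiling: back off exponentially and retry
                await asyncio.sleep(2 ** attempt)
        raise RuntimeError("still rate limited after retries")

async def classify_all(texts: list[str]) -> list[str]:
    return await asyncio.gather(*(classify(t) for t in texts))

# labels = asyncio.run(classify_all(["great product", "never again"]))
```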
Yes, how did I not think of throwing more money at cloud providers on top of feeding OpenAI, when I could have just coded a simple binary classifier and run everything on something as insignificant as an 8th-gen, quad-core i5....
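For scale, a minimal sketch of that kind of local classifier with scikit-learn (the training data and labels here are made up); it trains and predicts in seconds on exactly that sort of hardware:

```python
# TF-IDF features + logistic regression: a plain binary text classifier.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = ["great product", "terrible support", "works fine", "never again"]  # toy data
labels = [1, 0, 1, 0]  # 1 = positive, 0 = negative

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
clf.fit(texts, labels)

print(clf.predict(["support was great"]))  # -> [1]
```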
Did I mention OpenAI?
Ah, my bad, someone further up the thread did.
Really it boils down to a balance of time and cost, and the skill set of the person getting the job done.
But you seem really anti-establishment (hung up over a $25 cloud spend), so you do you.
Just don't expect everyone else to agree with you.
Also can’t you just combine multiple classification requests into a single prompt?
Yes, for such a simple labelling task, request rate limits are more likely the bottleneck than token rate limits.
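A rough sketch of that batching in plain Python, assuming the model reliably returns one numbered label per line (the prompt wording and label set are my own invention):

```python
def build_batch_prompt(texts: list[str]) -> str:
    # One request classifies many items: number them and ask for one label per line.
    numbered = "\n".join(f"{i + 1}. {t}" for i, t in enumerate(texts))
    return (
        "Classify each item as POSITIVE or NEGATIVE.\n"
        "Answer with one line per item, formatted '<number>. <label>'.\n\n"
        + numbered
    )

def parse_batch_reply(reply: str, n: int) -> list[str | None]:
    # Expect lines like "3. NEGATIVE"; leave None for anything malformed.
    labels: list[str | None] = [None] * n
    for line in reply.splitlines():
        num, _, label = line.partition(".")
        if num.strip().isdigit() and 1 <= int(num) <= n:
            labels[int(num) - 1] = label.strip().upper()
    return labels
```

Batching 50 items per request turns a 500 RPM ceiling into ~25,000 classifications per minute, at the cost of having to validate the parsed output.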