Comment by axegon_
17 hours ago
Yeah... Let's talk time needed for 10M prompts and how that fits into a daily pipeline. Enlighten us, please.
Run them all in parallel with a cloud function in less than a minute?
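Something like this, roughly; a stdlib-Python sketch where the endpoint, key, model, and prompt are all placeholders, not anyone's actual pipeline:

    import concurrent.futures
    import json
    import urllib.request

    API_URL = "https://api.openai.com/v1/chat/completions"  # placeholder: any chat-style API
    API_KEY = "sk-..."                                      # placeholder

    def classify(text):
        # One classification request; model name and prompt are illustrative.
        body = json.dumps({
            "model": "gpt-4o-mini",
            "messages": [{"role": "user",
                          "content": "Label SPAM or HAM: " + text}],
        }).encode()
        req = urllib.request.Request(API_URL, data=body, headers={
            "Authorization": "Bearer " + API_KEY,
            "Content-Type": "application/json",
        })
        with urllib.request.urlopen(req) as resp:
            return json.load(resp)["choices"][0]["message"]["content"]

    texts = ["example 1", "example 2"]  # imagine 10M of these, sharded across workers
    with concurrent.futures.ThreadPoolExecutor(max_workers=100) as pool:
        labels = list(pool.map(classify, texts))

Shard the inputs across enough workers (or cloud function instances) and the wall-clock time is dominated by the provider's rate limits, not your hardware.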
Yes, how did I not think of throwing more money at cloud providers on top of feeding OpenAI, when I could have just coded a simple binary classifier and run everything on something as insignificant as an 8th-gen quad-core i5...
Did I mention OpenAI?
Obviously all the LLM API providers have a rate limit. Not a fan of GP's sarcastic tone, but I suppose many of us would like to know roughly what that limit would be for a small business using such APIs.
The rate limits for Gemini 1.5 Flash are 2000 requests per minute and 4 million tokens per minute. Higher limits are available on request.
https://ai.google.dev/pricing#1_5flash
4o-mini's rate limits scale based on your account history, from 500 RPM / 200,000 TPM to 30,000 RPM / 150,000,000 TPM.
https://platform.openai.com/docs/guides/rate-limits
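If you fan out in parallel like the sibling comment suggests, you have to pace yourself under the RPM cap. A rough client-side sketch; the RateLimiter class is illustrative, the rpm value is the tier-1 4o-mini figure above, and classify() is a stub for whatever single-request helper you're using:

    import threading
    import time

    class RateLimiter:
        """Client-side pacing: at most `rpm` calls per minute across all threads."""
        def __init__(self, rpm):
            self.interval = 60.0 / rpm
            self.lock = threading.Lock()
            self.next_slot = time.monotonic()

        def wait(self):
            with self.lock:
                now = time.monotonic()
                # Reserve the next available slot, one interval apart.
                self.next_slot = max(self.next_slot, now) + self.interval
                delay = self.next_slot - self.interval - now
            if delay > 0:
                time.sleep(delay)

    def classify(text):
        ...  # stub: a single-request API call

    limiter = RateLimiter(rpm=500)

    def classify_throttled(text):
        limiter.wait()
        return classify(text)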
Surprisingly, DeepSeek doesn't have a rate limit: https://api-docs.deepseek.com/quick_start/rate_limit
I've heard from people running 100+ prompts in parallel against it.
Also can’t you just combine multiple classification requests into a single prompt?
Yes, for such a simple labelling task, request rate limits are more likely to be the bottleneck than token rate limits.
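A rough sketch of that batching idea; the prompt wording and the chunk size of 100 are made up, but the arithmetic holds:

    texts = ["example 1", "example 2"]  # stand-in for the 10M inputs

    def batch_prompt(items):
        # Pack many items into one request; ask for JSON so the labels parse cleanly.
        numbered = "\n".join(f"{i}. {t}" for i, t in enumerate(items))
        return ("Classify each numbered line as SPAM or HAM. "
                "Reply with a JSON object mapping line number to label.\n\n"
                + numbered)

    chunks = [texts[i:i + 100] for i in range(0, len(texts), 100)]
    prompts = [batch_prompt(chunk) for chunk in chunks]

At 100 items per request, 10M prompts become 100k requests, which fits under Gemini Flash's 2,000 RPM limit in under an hour.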