Comment by elliotto

2 days ago

In my experience the API call is trivial compared to the time taken for the LLM to compose the response.


elliotto

Gemini Flash and Groq are pretty fast, and that part is streamable. Curiosity got the best of me, so I had Claude Code write a quick test. The test is simply 20 requests with a 1-second delay between them, run once, so take it with a grain of salt, but it's still interesting. An extra half second in a search is super noticeable, so Google looks like a reasonable improvement.
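The script itself isn't shown, so here is a minimal sketch of the kind of timing loop described (20 sequential requests, 1 second apart), under some assumptions: `send_request` is a hypothetical zero-argument function wrapping whichever client call you want to time, and the latency measured is full wall-clock time for the call, not time-to-first-token.

```python
import time
from typing import Callable, List


def time_requests(send_request: Callable[[], object], n: int = 20,
                  delay_s: float = 1.0) -> List[float]:
    """Call send_request n times in sequence, sleeping delay_s between calls,
    and return each call's wall-clock latency in seconds."""
    latencies: List[float] = []
    for i in range(n):
        start = time.perf_counter()
        send_request()  # hypothetical wrapper around one API request
        latencies.append(time.perf_counter() - start)
        if i < n - 1:
            time.sleep(delay_s)  # pause between requests, not counted in latency
    return latencies
```

Because the delay sits outside the timed region, it only spaces the requests out; it doesn't inflate the numbers. With a single run of 20 samples, one slow request can move the average and max noticeably.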

  OpenAI Statistics:

  - Average: 0.360 seconds
  - Median: 0.292 seconds
  - Min: 0.211 seconds
  - Max: 0.779 seconds
  - Std Dev: 0.172 seconds

  Google Gemini Statistics:

  - Average: 0.304 seconds
  - Median: 0.273 seconds
  - Min: 0.250 seconds
  - Max: 0.445 seconds
  - Std Dev: 0.066 seconds
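Turning a list of per-request latencies into statistics like the ones above only needs the standard library. This sketch assumes sample standard deviation (`statistics.stdev`); the original test may have used the population version (`statistics.pstdev`), which gives a slightly smaller value for the same 20 samples.

```python
import statistics
from typing import Dict, Sequence


def summarize(latencies: Sequence[float]) -> Dict[str, float]:
    """Average, median, min, max and std dev of latencies in seconds."""
    return {
        "average": statistics.mean(latencies),
        "median": statistics.median(latencies),
        "min": min(latencies),
        "max": max(latencies),
        # Sample std dev (assumption; pstdev would be the population version).
        # Requires at least two samples.
        "std_dev": statistics.stdev(latencies),
    }


def print_summary(name: str, latencies: Sequence[float]) -> None:
    """Print the summary in the same layout as the results above."""
    s = summarize(latencies)
    print(f"{name} Statistics:")
    for label, key in [("Average", "average"), ("Median", "median"),
                       ("Min", "min"), ("Max", "max"), ("Std Dev", "std_dev")]:
        print(f"  - {label}: {s[key]:.3f} seconds")
```

For example, `print_summary("OpenAI", latencies)` with a list of measured timings prints a block shaped like the OpenAI one above.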

  The key insights from these numbers:

  - Google has a much lower standard deviation (0.066 vs 0.172), meaning more consistent/predictable performance
  - Google's worst-case (max) is much better than OpenAI's (0.445s vs 0.779s)
  - OpenAI had a slightly better best-case (min) performance (0.211s vs 0.250s)
  - Google's performance is more tightly clustered around its average, while OpenAI has more variability
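The comparisons in that list can be checked directly against the reported numbers. A small sketch with the values copied from the two summaries; lower counts as better for every metric here, with std dev standing in for consistency:

```python
# Summary numbers copied from the two result blocks above (seconds).
reported = {
    "OpenAI": {"average": 0.360, "median": 0.292, "min": 0.211,
               "max": 0.779, "std_dev": 0.172},
    "Google Gemini": {"average": 0.304, "median": 0.273, "min": 0.250,
                      "max": 0.445, "std_dev": 0.066},
}


def lower_per_metric(a: str, b: str, stats: dict) -> dict:
    """For each metric, return the provider with the lower value (ties go to b)."""
    return {m: (a if stats[a][m] < stats[b][m] else b) for m in stats[a]}


winners = lower_per_metric("OpenAI", "Google Gemini", reported)
for metric, provider in winners.items():
    print(f"{metric}: {provider}")

# How much wider OpenAI's spread and worst case were in this single run.
oa, gg = reported["OpenAI"], reported["Google Gemini"]
print(f"std dev ratio: {oa['std_dev'] / gg['std_dev']:.1f}x")  # 0.172 / 0.066
print(f"max gap: {oa['max'] - gg['max']:.3f} s")                # 0.779 - 0.445
```

Google comes out lower on everything except `min`, the std dev ratio prints as 2.6x, and the worst-case gap as 0.334 s. Google's average and median were also lower (0.304 vs 0.360, 0.273 vs 0.292), though with one run of 20 requests those gaps are small enough that they could shift on a rerun.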