Comment by Gcam
2 years ago
Thanks for the feedback, and glad it's useful! Yes, I agree that might be more representative of future use. I think a view of variance would be a good idea; it's currently only shown in the over-time views. Maybe a histogram of response times, or a box-and-whisker plot. We have a newsletter subscribe form on the website, or Twitter (https://twitter.com/ArtificialAnlys), if you want to follow future updates.
Variance would be good. I've also seen significant variance on "cold" request patterns, which may correspond to resources scaling up on the providers' backends.
It would be interesting to see request latency and throughput when API calls occur cold (the first data point), and then at once-per-hour, once-per-minute, and once-per-second intervals, with the first N samples dropped.
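A minimal sketch of that sampling scheme, in Python. The `call` stub, parameter names, and drop count are all hypothetical; you'd substitute your real API client call and pick intervals to match the cold/hourly/minutely/secondly patterns above:

```python
import time
from statistics import mean, pstdev

def measure_latencies(call, n_samples=10, drop_first=2, interval_s=0.0):
    """Time repeated calls, separating the first `drop_first` 'cold' samples.

    `call` is any zero-argument function that performs one API request
    (hypothetical stub -- replace with your real client call).
    `interval_s` spaces out the requests to probe different cadences.
    """
    latencies = []
    for _ in range(n_samples):
        start = time.perf_counter()
        call()
        latencies.append(time.perf_counter() - start)
        if interval_s:
            time.sleep(interval_s)
    cold, warm = latencies[:drop_first], latencies[drop_first:]
    return {
        "cold": cold,                  # kept separately, not averaged in
        "warm_mean": mean(warm),
        "warm_stdev": pstdev(warm),
    }

# Example with a fake "API call" that just sleeps briefly:
stats = measure_latencies(lambda: time.sleep(0.001), n_samples=5, drop_first=2)
```

Comparing `stats["cold"]` against `stats["warm_mean"]` at each cadence would show whether the backend is scaling resources down between requests.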
Also, at least with Azure OpenAI, the AI safety features (content filtering and annotations) make a significant difference in time to first token.