Comment by zambelli
21 days ago
Hi! Latency is definitely a factor in any system, and the dashboard and paper do report elapsed time - but at the workflow level.
On a per-call basis, the wrappers are pure python ifs and such, measured in ms easily, and frankly negligible compared to the LLM call itself, which will be on the order of seconds.
Where timing gets interesting is that forge will slow down workflows because the retries mean you don't error right away. Bare runs were failing fast in my experience. But on a per-call basis there's very little overhead.
I haven't detailed it simply because the order of magnitude of a single LLM call is so much higher than all the overhead put together.
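To make the per-call comparison concrete, here is a rough sketch, not forge's actual code. `fake_llm_call` and `guarded_call` are hypothetical names, and the sleep is a stand-in for a real model call (shortened here so the example runs quickly; a real call would typically take seconds).

```python
import time

def fake_llm_call(prompt, latency_s=0.05):
    """Hypothetical stand-in for an LLM call: sleeps to simulate inference time."""
    time.sleep(latency_s)
    return '{"tool": "search", "args": {}}'

def guarded_call(prompt):
    """Time the simulated call and the wrapper-style checks separately."""
    t0 = time.perf_counter()
    response = fake_llm_call(prompt)
    t1 = time.perf_counter()
    # Illustrative checks only: plain string tests, no extra model call.
    ok = response.startswith("{") and response.endswith("}") and '"tool"' in response
    t2 = time.perf_counter()
    if not ok:
        raise ValueError("response failed the wrapper checks")
    return response, t1 - t0, t2 - t1

response, call_s, check_s = guarded_call("find the docs")
print(f"call: {call_s * 1000:.1f} ms, checks: {check_s * 1000:.4f} ms")
```

On any ordinary machine the checks should come out several orders of magnitude cheaper than the call, which is why per-call overhead isn't broken out separately.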
Hi! Thanks for the response. Like I mentioned, I only skimmed, and it sounds like there's more to it than I understand, so I'll take a deeper look and see how it feels in practice.
> Where timing gets interesting is that forge will slow down workflows because the retries mean you don't error right away. Bare runs were failing fast in my experience. But on a per-call basis there's very little overhead.
> I haven't detailed it simply because the order of magnitude of a single LLM call is so much higher than all the overhead put together.
Yeah, that makes sense and seems fair. Those sorts of delays are almost an inevitability: you're not trying to improve speed, but by improving reliability you can obviously increase overall throughput.
Having watched the demo video too now, automating retries etc. would be helpful for me. It's impressive to see how quickly the models run on better hardware, and the performance improvements are striking, even if the overall run sometimes takes longer because it gets more things right. Thanks again!
> On a per-call basis, the wrappers are pure python ifs and such, measured in ms easily
Ah that's good to know
When I first saw this posted yesterday I was wondering about that; I kind of assumed it was doing extra LLM calls to make judgements.
Retry nudges do generate an extra LLM call, and the average time impact of those extra calls is captured in the eval data.
But that's the difference between the call failing and succeeding (eventually).
On successful calls the presence of forge should be unnoticeable.
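A minimal sketch of the call accounting described above, under stated assumptions: `call_with_retry_nudge`, `scripted_model`, and the nudge wording are all hypothetical, and this is not how forge itself is implemented. It only shows that a valid first reply costs one call, while each nudge costs one more.

```python
def call_with_retry_nudge(llm, prompt, is_valid, max_attempts=3):
    """Call `llm`, re-prompting with a nudge until the reply passes `is_valid`.

    Returns (response, number_of_llm_calls). Raises after max_attempts.
    """
    calls = 0
    current = prompt
    for _ in range(max_attempts):
        calls += 1
        response = llm(current)
        if is_valid(response):
            return response, calls
        # Hypothetical nudge text appended to the original prompt.
        current = prompt + "\n\nYour previous reply was invalid. Reply with JSON only."
    raise RuntimeError(f"no valid response after {calls} calls")

def scripted_model(replies):
    """Fake model that returns the given replies in order, ignoring the prompt."""
    it = iter(replies)
    return lambda _prompt: next(it)

def looks_like_json(text):
    """Cheap illustrative check, not a real JSON parse."""
    stripped = text.strip()
    return stripped.startswith("{") and stripped.endswith("}")

_, first_try_calls = call_with_retry_nudge(
    scripted_model(['{"ok": true}']), "do it", looks_like_json)
_, nudged_calls = call_with_retry_nudge(
    scripted_model(["Sure thing!", '{"ok": true}']), "do it", looks_like_json)
print(first_try_calls, nudged_calls)
```

A bare run would have stopped at "Sure thing!" and failed immediately; the retried run takes one extra call but ends in a usable reply.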