Wireshark would work for seeing the requests from the desktop app to Cursor’s servers (which make the actual LLM requests). But if you’re interested in what the actual requests to the LLMs look like from Cursor’s servers, you have to set something like this up. Plus, this lets us modify the requests and A/B test variations!
Sorry, can you explain this a bit more? Either you're putting something between your desktop and the server (in which case Wireshark would work), or you're putting something between Cursor's infrastructure and their LLM provider, in which case, how?
We're doing the latter! Cursor lets you configure the OpenAI base URL, so we were able to have Cursor call ngrok -> Nginx (for auth) -> TensorZero -> LLMs. We explain it in detail in the blog post.
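To make the base-URL trick concrete, here's a minimal sketch of what any OpenAI-compatible client ends up doing once it's pointed at a gateway like this. The URL, token, and model below are hypothetical placeholders, not the actual values from the post:

```python
# Sketch: what Cursor effectively does once the OpenAI base URL is overridden.
# The gateway URL and key are placeholders for illustration only.
from openai import OpenAI

client = OpenAI(
    base_url="https://your-ngrok-subdomain.ngrok.app/openai/v1",  # placeholder gateway URL
    api_key="your-gateway-token",  # checked by the Nginx auth layer, not by OpenAI
)

# Requests now flow ngrok -> Nginx (auth) -> TensorZero -> the model provider,
# so the gateway can log, modify, or A/B test the prompt before forwarding it.
response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[{"role": "user", "content": "Hello from behind the proxy!"}],
)
print(response.choices[0].message.content)
```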
The article literally says at the end that this was just the first post, about observing the traffic, before getting into actually changing the responses.
(that being said, mitmproxy has gotten pretty good for just looking lately https://docs.mitmproxy.org/stable/concepts/modes/#local-capt... )
Yeah, the proxying/observability is without question the simplest part of this whole problem space. Once you get into the weeds of automating all the evals and prompt optimization, you realize how irrelevant Wireshark actually is in the feedback loop.
But like you, I landed on mitmproxy as well, after starting with tcpdump/Wireshark. I recently started building a tiny streaming textual-gradient-based optimizer (similar to what AdalFlow is doing) by parsing the mitmproxy output in real time. Having a turnkey solution for this sort of thing will definitely be valuable, at least in the near to mid term.
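A minimal sketch of that kind of real-time tap, written as a mitmproxy addon run via `mitmdump -s llm_tap.py`. The filename and the `optimize_step` hook are hypothetical stand-ins for the optimizer; the addon API itself is mitmproxy's:

```python
# llm_tap.py -- hypothetical mitmproxy addon that streams completed
# LLM request/response pairs into an optimizer step as they arrive.
import json

from mitmproxy import http


def optimize_step(prompt: dict, completion: dict) -> None:
    """Placeholder for a textual-gradient update; here we just log."""
    print(f"captured {len(prompt.get('messages', []))} messages -> "
          f"{completion.get('model', 'unknown model')}")


class LLMTap:
    def response(self, flow: http.HTTPFlow) -> None:
        # Only look at OpenAI-style chat completion calls.
        if not flow.request.path.endswith("/chat/completions"):
            return
        try:
            prompt = json.loads(flow.request.get_text())
            completion = json.loads(flow.response.get_text())
        except (json.JSONDecodeError, ValueError):
            return  # streamed (SSE) responses need chunk reassembly instead
        optimize_step(prompt, completion)


addons = [LLMTap()]
```

Note the caveat in the comment: streamed responses come back as server-sent events, so a real version would have to reassemble the chunks before parsing.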
If you haven't already, check out our repo -- it's free, fully self-hosted, production-grade, and designed for precisely this application :)
https://github.com/TensorZero/tensorzero