Show HN: A MitM proxy to see what your LLM tools are sending

11 days ago (github.com)

I built this out of curiosity about what Claude Code was actually sending to the API. Turns out, watching your tokens tick up in real-time is oddly satisfying.

Sherlock sits between your LLM tools and the API, showing you every request on a live dashboard and auto-saving a copy of every prompt as Markdown and JSON.

This tool looks like it unconditionally disables TLS verification for upstream requests.

It shells out to mitmproxy with "--set", "ssl_insecure=true"

This took all of 5 minutes to find reading through main.py on my phone.

https://github.com/jmuncor/sherlock/blob/fb76605fabbda351828...
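
For anyone who hasn't used mitmproxy: ssl_insecure=true is the option that skips certificate verification for upstream servers, so a forged api.anthropic.com certificate sails right through. The invocation is roughly of this shape (a reconstruction for illustration, not the repo's exact code):

  # Reconstruction of the problematic pattern; see main.py in the repo for the real call.
  import subprocess

  subprocess.run([
      "mitmdump",
      "--listen-port", "8080",
      "--set", "ssl_insecure=true",  # upstream certificates are no longer verified
  ])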

Edit: In case it’s not clear, you should not use this.

  • I think the main problem is when OP wrote:

    > I built this

    Instead of

    > I prompted this

    I am OK with people publishing new ideas to the web as long as they are 100% honest and admit they just had an idea and asked an AI to build it for themselves. That way I understand they may not have considered all the things that need to be considered, and I can skip it (and then prompt it myself if I want to, adding all the things I consider necessary).

  • And it's already surpassed my most-starred project from when it was on GitHub, which makes moving it to Forgejo feel all the more validating. If vibe-coded stuff with unbelievable security vulns can get this much praise, the whole star system doesn't work as a quality filter. Similarly, a well-crafted README used to help signal quality; no longer...

    • I don't use stars to select dependencies FWIW. I look for age, CVEs, and what other reputable projects depend on a repo. I also try to look for other signals, like claims in the readme that don't match the implementation, or poor hygiene in the CI workflows. (And yes, I have gotten burned by an otherwise well-meaning project with a supply chain vuln.) As the saying goes, "a little copying is better than a little dependency" (see: https://www.youtube.com/watch?v=PAAkCSZUG1c&t=9m28s).

  • The thing you want has a kind of academic-jargon name (coeffects algebra with graded/indexed monads for discharge) but is very intuitive, and it can do useful and complete attestation without compromising anyone's credentials (in the limit case because everyone chooses which proxy to run).

    https://imgur.com/a/Ztyw5x5

    • Sorry but you lost me. How are coeffects different from effects? I think I’m missing some steps between monads and credentials. Maybe fill in the blanks?

  • Just fixed it and implemented a simple HTTP relay, eliminating mitmproxy and the ssl_insecure=true flag. The new implementation uses TLS verification; I'm doing the last tests and merging it now... After the merge, can you check it out and tell me if I earned your star? :D

    • I’m not sure you fully understand the implications of the misconfiguration of mitmproxy there. Effectively you provided an easily accessible front door for remote code execution on a user’s machine.

      No offense, but I wouldn’t trust anything else you published.

      I think it’s great that you are learning and it is difficult to put yourself out there and publish code, but what you originally wrote had serious implications and could have caused real harm to users.

      6 replies →

    • >tell me if I earned your star

      Since you asked: Not in a million years, no.

      A bug of this type is either an honest typo or a sign that the author(s) don't take security seriously. Even if it were a typo, any serious author would've put a large FIXME right there when adding that line disabling verification. I know I would. In any case, it's a huge red flag for a MitM tool.

      Seeing that it's vibe coded leads me to believe it's due to AI slop, not a simple typo from debugging.

      8 replies →

    • You don't understand what you're doing, and never will. Throw away all computing devices you've got.

  • Don't use it if you plan to auto-accept terminal commands, without a sandbox, while on public Wi-Fi in a cafe, next to a hacker who decides to bet on you running a very niche configuration.

    • All you need is to manipulate DNS: inject a record with a long TTL, and the rest isn't required.

      It scales very well and I guarantee this is not the only instance of misconfigured host verification. In other words, this is not as niche as you might think.

      2 replies →

As someone who just set up mitmproxy to do something very similar, I wish this would've been a plugin/add-on instead of a standalone thing.

I know and trust mitmproxy. I'm warier and less likely to use a new, unknown tool that has such broad security/privacy implications. Especially these days with so many vibe-coded projects being released (no idea if that's the case here, but it's a concern I have nonetheless).

This is great.

When I work with AI on large, tricky codebases, I try to set up a collaboration where it hands off to me the things that may burn a large number of tokens (excess tool calls, imprecise searches, verbose output, reading large files without a range specified, etc.).

This will help narrow down exactly which of those to still handle manually, to best stay within token budgets.

Note: "yourusername" in the git clone install instructions should be replaced.

  • I've been trying to get token usage down by instructing Claude to stop being so verbose (saying what it's going to do beforehand, saying what it just did, spitting out pointless file trees), but it ignores my instructions. It could be that the model is just hard to steer away from doing that... or Anthropic wants it to waste tokens so you burn through your usage quickly.

    • Simply assert that:

      You are a professional (insert concise occupation).

      Be terse.

      Skip the summary.

      Give me the nitty-gritty details.

      You can send all that using your AI client settings.

  • I had a similar problem: when Claude Code (or Codex) is running in a sandbox, I wanted to put a cap on large contexts, or at least get notified about them.

    Especially because once x0K words are crossed, the output gets worse.

    https://github.com/quilrai/LLMWatcher

    I made this Mac app for the same purpose. Any thoughts would be appreciated.

  • Would you mind sharing more details about how you do this? What do you add to your AI prompts to make it hand those tasks off to you?

  • Hahahah just fixed it, thank you so much!!!! Think of extending this into a prompt admin; I'm sure there is a lot of trash that the system sends on every query, and I think we can improve this.

Nice work! I'm sure the data gleaned here is illuminating for many users.

I'm surprised that there isn't a stronger demand for enterprise-wide tools like this. Yes, there are a few solutions, but when you contrast the new standard of "give everyone at the company agentic AI capabilities" with the prior paradigm of strong data governance (at least at larger orgs), it's a stark difference.

I think we're not far from the pendulum swinging back a bit. Not just because AI can't be used for everything, but because the governance on widespread AI use (without severely limiting what tools can actually do) is a difficult and ongoing problem.

  • I had to vibe code a proxy to hide tokens from agents (https://github.com/vladimirkras/prxlocal) because I haven’t found any good solution either. I planned to add genai otel stuff that could be piped into some tool to view dialogues and tool calls and so on, but I haven’t found any good setup that doesn’t require lots of manual coding yet. It’s really weird that there are no solutions in that space.

  • Yes, I was just thinking about how, as engineers, we're trained to document every thought that has ever crossed our minds, for liability and future reference. Yet once an LLM is done with its task, the "hit by a bus" scenario takes place immediately.

    • Yes, I think you can actually store this in a database later and start querying and optimizing what is happening there. You can even start using these files, or a distillation of them, as long-term memory.

You don't need to mess with certificates - you can point CC at an HTTP endpoint and it'll happily play along.

If you build a DIY proxy you can also mess with the prompt on the wire: cut out portions of the system prompt, redirect it to a different endpoint based on specific conditions, etc.
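
A rough sketch of the idea in Python (assuming the agent honours an ANTHROPIC_BASE_URL-style override; streaming, auth edge cases and error handling are left out, and the "system" field is just one example of something to rewrite):

  # Minimal pass-through proxy: listen on plain HTTP, edit the JSON body, forward upstream.
  from http.server import BaseHTTPRequestHandler, HTTPServer
  import json, urllib.request

  UPSTREAM = "https://api.anthropic.com"  # where the requests really go

  class RewritingProxy(BaseHTTPRequestHandler):
      def do_POST(self):
          length = int(self.headers.get("Content-Length", 0))
          body = self.rfile.read(length)
          try:
              payload = json.loads(body)
              payload.pop("system", None)  # example rewrite: drop the system prompt
              body = json.dumps(payload).encode()
          except ValueError:
              pass  # not JSON; forward untouched

          req = urllib.request.Request(UPSTREAM + self.path, data=body, method="POST")
          for name in ("content-type", "x-api-key", "anthropic-version", "authorization"):
              if name in self.headers:
                  req.add_header(name, self.headers[name])
          with urllib.request.urlopen(req) as resp:
              data = resp.read()
          self.send_response(200)
          self.send_header("Content-Type", "application/json")
          self.end_headers()
          self.wfile.write(data)

  # Point the agent at http://127.0.0.1:8081 and run:
  HTTPServer(("127.0.0.1", 8081), RewritingProxy).serve_forever()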

It's actually really easy to use mitmproxy as a…proxy. You set it up as a SOCKS proxy (or whatever) and point your network or browser at it. I did this recently when a Python tool was too aggressive at crawling the web and the server kept rejecting me. I forced my session to a limit of 5 requests per second, and that worked better than hunting for the exact file to change in the library. Just do the same with your browser, then turn on capture mode and you'll see the requests.
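
The throttling part can be a one-class addon; something along these lines (a sketch of the approach, not the exact script I ran):

  # throttle.py -- cap outgoing requests at roughly 5/s.
  # Run with something like: mitmdump --mode socks5 -s throttle.py
  import time

  class Throttle:
      def __init__(self, max_per_second=5.0):
          self.min_interval = 1.0 / max_per_second
          self.last = 0.0

      def request(self, flow):
          # A blocking sleep is crude, but fine for throttling a single crawling session.
          wait = self.min_interval - (time.time() - self.last)
          if wait > 0:
              time.sleep(wait)
          self.last = time.time()

  addons = [Throttle()]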

I use litellm (slightly modified to allow Claude Code telemetry pass-through) and Langfuse.

There is no need for MitM; you can set the API base address to your own proxy in all the coding assistants (at least all I know of: Claude Code, opencode, Gemini, vc plugin).

The changes I made allow use of the models endpoint in litellm at the same base URL as telemetry, and pass through Claude Max auth. This is not about using your Max with another CLI tool, but about recording everything that happens.

There is a tool that can send CC JSON logs to Langfuse, but the results are much inferior: you lose parts of the tool call results, timing info, etc.

I'm quite happy with this. If anyone is interested I can post a github link.

I usually have a small mini-PC with at least two Ethernet ports and configure it as a transparent bridge sitting between my desktop and the router/switch. Give the bridge a local IP, set up some packet inspection stuff, and you can easily monitor anything and everything going in and out. It's not all I use, but it's one part.

I also run AI models locally and like to verify that things aren't talking to the internet if they aren't supposed to be.

Activate controlled folder access and filesystem access monitoring to see what tries to change things every time you load and use an LLM. Most LLM models are programmed to call home at first load. The libs you load them with also log, and something is usually looking to send bytes (check with a firewall for details).

HugstonOne uses an enforced offline policy / offline switch because of that. Our users are so happy lately :) and will realize it clearly in the future.

Pretty slick. I've been wanting something like this where each capture is stored with a hash that's recorded in the corresponding code-change commit message. It'd be good for postmortems of unnoticed hallucinations, and might even be useful to "revive" the agent and see if it can help debug the problem it created.

So is it just a wrapper around MitM Proxy?

  • > So is it just a wrapper around MitM Proxy?

    Yes.

    I created something similar months ago [*] but using Envoy Proxy [1], mkcert [2], my own Go (golang) server, and Little Snitch [3]. It works quite well. I was the first person to notice that Codex CLI now sends telemetry to ab.chatgpt.com and other curiosities like that, but I never bothered to open-source my implementation because I know that anyone genuinely interested could easily replicate it in an afternoon with their favourite Agent CLI.

    [1] https://www.envoyproxy.io/

    [2] https://github.com/FiloSottile/mkcert

    [3] https://www.obdev.at/products/littlesnitch/

    [*] In reality, I created this something like 6 years ago, before LLMs were popular, originally as a way to inspect all outgoing HTTP(s) traffic from all the apps installed in my macOS system. Then, a few months ago, when I started using Codex CLI, I made some modifications to inspect Agent CLI calls too.

    • Curious to see how you can get Gemini fully intercepted.

      I've been intercepting its HTTP requests by running it inside a docker container with:

      -e HTTP_PROXY=http://127.0.0.1:8080 -e HTTPS_PROXY=http://host.docker.internal:8080 -e NO_PROXY=localhost,127.0.0.1

      It was working with mitmproxy for a very brief period, then the TLS handshake started failing and it kept asking for re-authentication when proxied.

      You can get the whole auth flow and initial conversation starters using Burp Suite and its certificate, but the Gemini chat responses fail in the CLI, which I understand is due to how Burp handles HTTP2 (you can see the valid responses inside Burp Suite).

      4 replies →

  • Kind of, yes... But with a nice CLI so that you don't have to set anything up: just run "sherlock claude" and "sherlock start" in two terminals, and everything Claude sends in that session gets stored. So no proxy setup or anything, just simple terminal commands. :)

Dang, how will Tailscale make any money on its latest vibe-coded feature [0] when others can vibe code it themselves? I guess your SaaS really is someone's weekend vibe prompt.

[0] https://news.ycombinator.com/item?id=46782091

I understand this helps if we have our own LLM runtime. What if we use external services like ChatGPT / Gemini (LLM providers)? Shouldn't they provide this feature to all their clients out of the box?

  • This works with Claude Code and Codex... So you can use it with either of those; you don't need a local LLM running... :)

Could you use an approach like this much like a traditional network proxy, to block or sanitise some requests?

E.g. if a request contains confidential information (whatever you define that to be), then block it?

  • I do kind of the opposite, where I run my AI in a sandbox. It sends dummy tokens to the APIs, and the proxy then injects the real creds, so the AI never has access to them.

    https://clauderon.com/ -- not really ready for others to use, though
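
    The core of the trick is small; here is a sketch in mitmproxy-addon style (illustrative only: the env var and placeholder names are made up, and this is not clauderon's actual code):

      # The sandboxed agent only ever sees PLACEHOLDER as its "API key";
      # the proxy outside the sandbox swaps in the real one before forwarding.
      import os

      REAL_KEY = os.environ["REAL_API_KEY"]      # lives outside the sandbox
      PLACEHOLDER = "sk-dummy-not-a-real-key"

      class InjectCreds:
          def request(self, flow):
              auth = flow.request.headers.get("Authorization", "")
              if PLACEHOLDER in auth:
                  flow.request.headers["Authorization"] = "Bearer " + REAL_KEY

      addons = [InjectCreds()]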

  • Forgot to mention: It’s a neat tool. Well done.

    • Thank you! What I was thinking was more along the lines of optimizing how you use your context window, so that the LLM can actually access what it needs to: like an incredibly powerful compact that runs in the background, with your file system working as long-term memory... Still thinking about how to make it work, so I am super open to ideas.

This is fantastic. Claude doesn't make it easy to inspect what it's sending - which would actually be really useful for refining the project-specific prompts.

  • Love that you like it!! Let me know any ideas to improve it... I was thinking in the direction of a file system and protocol for the md files, or dynamic context building. But would love to hear what you think.

Amusingly, I had the same question and asked Claude Code to vibe code me something similar. :)

  • Now you can add on top of it :D and we can all create something great :D

    • As is the case with most vibe coded software, it wasn't polished, didn't work very well, had lots of edge cases, and was pretty much bespoke to my one use case. :)

      It answered the question "what the heck is this software sending to the LLM" but that was about all it was good for.

      1 reply →

Nice work! Do I need to update the Claude Code config after starting this proxy service?

  • Nope... You just run "sherlock claude" and that sets up the proxy for you, so you don't have to think about it... Just use Claude normally, and every prompt you send in that session will be stored in the files.

Or we could just demand agents that offer this level of introspection?

  • I certainly wouldn't trust self-reporting on this

    • It's not only about trust, but also about how you later optimize what is in the context to suit how you use LLMs... There is a whole world to be explored inside that context window.

I built something similar after seeing this post: https://wiretaps.ai (repo: https://github.com/marcosgabbardo/wiretaps)

Different approach:

- No TLS verification bypass — works by setting OPENAI_BASE_URL

- Built-in PII detection (SSN, credit cards, emails, phone numbers across ~20 countries)

- Crypto detection (BTC/ETH addresses, private keys, seed phrases)

- SQLite by default, zero config: pip install wiretaps && wiretaps start

Still early (v0.3), but the PII detection is solid — 45+ regex patterns for global compliance (GDPR, LGPD, etc).
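
To give a flavour of what regex-based detection like that looks like (a simplified illustration, not wiretaps' actual pattern set):

  # Toy version of regex-based PII redaction; the real set is much larger and locale-aware.
  import re

  PATTERNS = {
      "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
      "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
      "credit_card": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
  }

  def redact(text: str) -> str:
      for label, pattern in PATTERNS.items():
          text = pattern.sub(f"[REDACTED:{label}]", text)
      return text

  print(redact("mail me at jane@example.com, SSN 123-45-6789"))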

Would love feedback from folks here.

The amount of AI slop hitting the HN front page is getting out of hand. Then you open the comments and there are obvious LLM bots commenting on it.

Wonder if this is the end of HN.

Say it with me:

If I wanted an AI-written tool for this, I would have prompted an AI, not opened HN.

lmao WTAF is this?

build/lib/build/lib/build/lib/build/lib/build/lib/build/lib/build/lib/build/lib/build/lib/build/lib/build/lib/build/lib/build/lib/build/lib/build/lib/build/lib/build/lib/build/lib/build/lib/build/lib/build/lib/build/lib/build/lib/build/lib/build/lib/build/lib/build/lib/build/lib/build/lib/build/lib/build/lib/build/lib/sherlock

  • That is what you would call vibe-ception... Hahahahah correcting it now! hahahahahahahaha!!