Httptap: View HTTP/HTTPS requests made by any Linux program

19 days ago (github.com)

The "How it was made" section of the README was no less interesting than the tool itself:

> The way we have set things up is that we live and practice together on a bit over a hundred acres of land. In the mornings and evenings we chant and meditate together, and for about one week out of every month we run and participate in a meditation retreat. The rest of the time we work together on everything from caring for the land, maintaining the buildings, cooking, cleaning, planning, fundraising, and for the past few years developing software together.

  • Reminds me of a quote from "The Soul of a New Machine":

    > During one period, when the microcode and logic were glitching at the nanosecond level, one of the overworked engineers departed the company, leaving behind a note on his terminal as his letter of resignation: "I am going to a commune in Vermont and will deal with no unit of time shorter than a season."

  • To be honest: This sounds like just another of the many many other yoga/spiritual cults that currently exist all over the western world.

    EDIT: typos and slight wording changes

    • I believe I grew up in a cult myself, and one of the things I've concluded from that experience, and from leaving it, is that everywhere is a cult. Humans have a tendency towards cult-ish life, and if the cult is big enough we just refer to it as "society". People were as afraid (more or less) to leave the cult I was at, as people are around me now when they consider doing anything that is out of the norm.

      By no means am I trying to hint towards some conspiracy, or to say that all cults are equally bad (or good); just to say that sometimes the word cult simply means "a less popular way of life than the one most people around me live by".

  • > For the past few years we have been recording a lecture series called Buddhism for AI. It's about our efforts to design a religion (yes, a religion) based on Buddhism for consumption directly by AI systems. We actually feel this is very important work given the world situation.

    I think it's an indicator of just how weird the times we're currently living in really are, that this part actually makes perfect sense...

    (whether or not it's a good idea or will lead to the results they envision is another question)

    • You'd think that the people willing to talk to a chatbot would not be willing to discuss the self with any honesty, but I'm continually surprised by the world.

  • I sadly assumed the first countryside photo was generated but I assume now it is real!

    The mix of tech and meditation would appeal to me. Maybe the idea does (actually doing it is probably hard!).

    It seems like a "Buddhist Recurse"

    • Yeah that photo is real! That's where I live!

      Yes, it's true, actually doing it is hard, but to be honest not as hard as a lot of other stuff (getting a phd for example, or goodness gracious buying a house in San Francisco). I love getting up early. I love living out in nature. I love chanting and eating meals together and making a version of Buddhism for AI systems!

      If you're interested in what it's like, we have written a bunch of very short few-paragraph stories about our time at MAPLE here: https://tales.monasticacademy.org/

httptap is a process-scoped http tracer that you can run without root privileges. You can run `httptap <command>` where <command> is a Linux program and you get a trace of http/https requests and responses in standard output:

    httptap -- python -c "import requests; requests.get('https://monasticacademy.org')"
    ---> GET https://monasticacademy.org/
    <--- 308 https://monasticacademy.org/ (15 bytes)
    ---> GET https://www.monasticacademy.org/
    <--- 200 https://www.monasticacademy.org/ (5796 bytes)

It works by running <command> in an isolated network namespace. It has its own TCP/IP stack (for which it uses gVisor). It is not an HTTP proxy and so does not rely on <command> being configured to use an HTTP proxy. It decrypts TLS traffic by generating a CA on the fly. It won't install any iptables rules or make other global system changes.
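
For anyone who wants to post-process that output, here is a minimal Python sketch of a parser for the two line shapes in the example above. It assumes only the default `--->` and `<---` lines; the names `TraceLine` and `parse_trace_line` are made up for illustration and are not part of httptap.

```python
import re
from typing import NamedTuple, Optional

# Patterns for the two line shapes shown in the example output.
REQUEST_RE = re.compile(r"^---> (?P<method>[A-Z]+) (?P<url>\S+)$")
RESPONSE_RE = re.compile(r"^<--- (?P<status>\d{3}) (?P<url>\S+) \((?P<size>\d+) bytes\)$")


class TraceLine(NamedTuple):
    direction: str          # "request" or "response"
    method: Optional[str]   # set for requests only
    status: Optional[int]   # set for responses only
    url: str
    size: Optional[int]     # byte count from the response line, responses only


def parse_trace_line(line: str) -> Optional[TraceLine]:
    """Parse one line of trace output; return None for any other line."""
    line = line.strip()
    m = REQUEST_RE.match(line)
    if m:
        return TraceLine("request", m["method"], None, m["url"], None)
    m = RESPONSE_RE.match(line)
    if m:
        return TraceLine("response", None, int(m["status"]), m["url"], int(m["size"]))
    return None


SAMPLE = """\
---> GET https://monasticacademy.org/
<--- 308 https://monasticacademy.org/ (15 bytes)
---> GET https://www.monasticacademy.org/
<--- 200 https://www.monasticacademy.org/ (5796 bytes)
"""

if __name__ == "__main__":
    for raw in SAMPLE.splitlines():
        print(parse_trace_line(raw))
```

Lines that match neither pattern come back as `None`, so any extra output mixed into the stream is simply skipped.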

  • Do you know if it's possible to get this working on macos? I believe Tailscale uses gvisor's tcp/ip lib (as their netstack lib) on macos for certain things.

  • can it modify requests or responses? with the current web getting increasingly user-hostile, the need for a tool like this has never been more apparent

    especially if it doesn't require proxy configuration

    • > especially if it doesn't require proxy configuration

      It does require trusting a local CA, or apps other than the browser being configured not to validate CAs (or to trust the new CA) if they don't push responsibility for that to the OS-level support.

      I'm not sure it would be a good idea for the non-technical public: teaching them how to set up trust for a custom CA, and that it is sometimes a good thing to do, would lead to a new exploit route/tool for phishers and other black-hats because many users are too naively trusting or too convenience focussed to be appropriately careful. How many times have we seen people install spyware because of claims that it will remove spyware? It could also be abused by malicious ISPs, or be forced on other ISPs by governments “thinking of the children”.

    • Agreed! So there isn't any interface for modifying requests/responses at present, but it's definitely possible given the underlying approach. If you consider [this line of code](https://github.com/monasticacademy/httptap/blob/main/http.go...) where you have an HTTP request parsed from the <command> that ran and are about to send it out to the public internet: you could modify the request (or the response that is received a few lines further) in just the way that you would modify a normal http.Request in Go.

    • if the program doesn't pin certificates, you should be able to intercept them by telling your machine to trust a certificate authority of your own creation and performing a mitm attack on the process's traffic. if it does do certificate pinning, then it won't trust your home issued cert, and will refuse to send data through your proxy.

  • Did everyone forget about wireshark, which can totally be run as non-root?

    https://blog.wireshark.org/2010/02/running-wireshark-as-you/

It's a genius idea to run the process in an isolated network namespace!

I'm more interested in the HTTPS part. I see that it sets some common environment variables [1] to instruct the program to use the CA bundle in the temporary directory. This seems to pose the same issue as all the variants of `http_proxy`: the program may simply choose to ignore the variable.

I see it also mounts an overlay fs for `/etc/resolv.conf` [2]. Would it help if httptap mounted the `/etc/ca-certificates` directory with the temporary CA bundle?

[1] https://github.com/monasticacademy/httptap/blob/cb92ee3acfb2...

[2] https://github.com/monasticacademy/httptap/blob/cb92ee3acfb2...
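
A minimal Python sketch of the environment-variable approach, to make the limitation concrete. This is not httptap's code: the variable list below is an assumption (these are variables that OpenSSL, Python requests, curl and Node.js are known to consult), and `env_with_ca` is a hypothetical helper name.

```python
import os
from typing import Dict, Mapping, Optional

# Variables that several common TLS clients consult for trusted roots.
# Whether httptap sets exactly these is an assumption, not confirmed here.
CA_ENV_VARS = (
    "SSL_CERT_FILE",        # OpenSSL-based programs
    "REQUESTS_CA_BUNDLE",   # Python requests
    "CURL_CA_BUNDLE",       # curl
    "NODE_EXTRA_CA_CERTS",  # Node.js
)


def env_with_ca(ca_bundle_path: str,
                base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return a copy of the environment with the CA variables set to the bundle path."""
    env = dict(os.environ if base is None else base)
    for name in CA_ENV_VARS:
        env[name] = ca_bundle_path
    return env
```

The result can be passed as `env=` to `subprocess.run`. A program whose TLS library reads none of these variables will still reject the generated certificate, which is exactly the weakness described above.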

  • Thanks! But yep I agree, you're exactly right, it's ultimately... frustrating that there isn't really an agreed-upon or system-enforced way to specify CA roots to an arbitrary process.

    It's true that httptap mounts an overlay on /etc/resolv.conf. This is, as you'd expect, due to the also-sort-of-frustrating situation with respect to DNS resolution in which, like CA roots, there isn't a truly reliable way to tell an arbitrary process what DNS server to use, but /etc/resolv.conf is a pretty good bet. As soon as you put a process into a network namespace you have to provide it with DNS resolution because it can no longer access localhost:53, which is the systemd resolver, which is the most common setup now on desktop linux systems.

    I do think it might help to mount /etc/ca-certificates as an overlay. When I started looking into the structure of that directory I was kind of dismayed... it's incredibly inconsistent from one distro to the next. Still, it's doable. Interested in any knowledge you might be able to share about how to add a cert to that directory in a way that would be picked up by at least some TLS implementations.

    • It's a bit of a thin solution though, isn't it? As you say, it's dependent on both specific CA store and resolver behaviour. It's probably going to be robust enough on the most common SSL libraries, such as OpenSSL. But if we're going that route, why not just run the software against a patched SSL library which dumps the traffic?

      That also doesn't require any elevated privileges (as opposed to other methods of syscall interception) and is likely much easier to do. It has the added benefit of being robust against applications either pinning certificates outright or just being particular about serial numbers, client certificates, and anything like that.

    • What if instead you bound your own DNS server to localhost:53 inside the network namespace? I suppose you'd still have to mess with /etc/resolv.conf in case it points to hardcoded public resolvers instead like mine does.

  • IMO there's no general solution to the HTTPS part that will work for all kinds of programs and the long tail of certificate pinning implementations.

    As a proof by counterexample, imagine malware that uses TLS for communication and goes to great lengths to obfuscate its compiled code. It could be a program that bundles a fixed set of CA certificates into its binary and never opens any files on the filesystem. It can still create valid, secure TLS connections (at least for ~10 years or so, until most root CA certificates expire). TLS is all userspace and there's no guarantee that it uses OpenSSL (or any other common library), so you can't rely on hooking into specific OpenSSL functions either. If the server uses a self-signed certificate and the client accepts it for whatever reason, it's worse.

    With that said, it's definitely possible to handle 99% of the cases reliably with some work. That's better than nothing.

Using a TUN device for this is a really cool idea! And the "How it was made" section is one of the best things I've read in a GitHub README.

I'm building something called Subtrace [1] but it can intercept both incoming and outgoing requests automatically. Looks like we converged on the same interface for starting the program too lol [2]. Subtrace's purpose is kinda different from httptap's though (more observability / monitoring for cloud backend services, hence the emphasis on both incoming and outgoing). Also, it uses a different approach -- using Seccomp BPF to intercept the socket, connect, listen, accept, and ~10 other syscalls, all TCP connections get proxied through Subtrace. We then parse the HTTP requests out of the TCP stream and then show it to the user in the Chrome DevTools Network tab, which we repurposed to work in the browser like a regular webapp.

Any fun stories there from running programs under httptap? Who phones home the most?

[1] https://github.com/subtrace/subtrace

[2] https://docs.subtrace.dev/quickstart

  • Super cool! Connecting what you capture to Chrome DevTools is fascinating, as is using eBPF. Great work getting the devtools to run as a standalone web app. You won't believe it but I have a half-finished attempt of the same thing for the firefox network tab - in the "networktab" dir of the repo!

    Very cool project, would love to learn more and happy to chat more about it.

    • Thanks! Subtrace uses BPF, not eBPF :) I think eBPF could be made to work with the same approach, but there's a few differences:

      - eBPF requires root privileges or at least CAP_BPF. Subtrace uses seccomp_unotify [1], so it works even in unprivileged environments.

      - eBPF requires using eBPF maps as the data channel + weird restrictions in the code because of the eBPF verifier. IMO these two things make it way harder to work with for the kind of networking logic that both httptap and Subtrace have in userspace. Everything is perfectly possible, just harder to reason about and debug.

      >half-finished attempt of the same thing for the firefox network tab

      Hahahah this is incredible. Something something great minds.

      [1] https://man.archlinux.org/man/seccomp_unotify.2.en

Another tool that can be used by an unprivileged user for analysing network traffic is rootless Podman with Pasta.

Just add the podman run option

--network=pasta:--pcap,myfile.pcap

Pasta then records the network traffic into a PCAP file that could later be analysed.

I wrote a simple example where I used tshark to analyse the recorded PCAP file https://github.com/eriksjolund/podman-networking-docs?tab=re...
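
If tshark is not at hand, the recorded file can also be walked with a few lines of Python. This is a minimal sketch that assumes the capture is in the classic little-endian pcap format with microsecond timestamps; a pcapng or big-endian file would need different handling. `iter_packets` is a hypothetical name.

```python
import struct
from typing import Iterator, Tuple

PCAP_MAGIC = 0xA1B2C3D4  # classic pcap, microsecond timestamps


def iter_packets(data: bytes) -> Iterator[Tuple[float, bytes]]:
    """Yield (timestamp, raw_frame) for each record in a classic pcap image."""
    if len(data) < 24:
        raise ValueError("too short to be a pcap file")
    (magic,) = struct.unpack_from("<I", data, 0)
    if magic != PCAP_MAGIC:
        raise ValueError("not a little-endian classic pcap file")
    offset = 24  # skip the 24-byte global header
    while offset + 16 <= len(data):
        # Per-record header: seconds, microseconds, captured length, original length.
        ts_sec, ts_usec, incl_len, _orig_len = struct.unpack_from("<IIII", data, offset)
        offset += 16
        frame = data[offset:offset + incl_len]
        if len(frame) < incl_len:
            break  # truncated final record
        offset += incl_len
        yield ts_sec + ts_usec / 1_000_000, frame
```

For example, `sum(1 for _ in iter_packets(open("myfile.pcap", "rb").read()))` counts the captured frames. Decoding the frames themselves is still a job for tshark or a packet library.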

Why not use eBPF instead? Then you could see all http requests from all processes at once, including ones that are already running. Plus you wouldn't need to bother with TLS at all, just hook on e.g. write(2).

  • How would hooking on write(2) solve TLS? You'll be able to read and modify the ciphertext, but the process will never call write(2) with the plaintext bytes, so you can't actually read the HTTP request. You'll just see the encrypted bytes that go on the wire, but so does the NSA :)

    You need the kind of CA certificate trick that httptap uses. It comes with its own set of caveats (e.g. certificate pinning), but it can be made to work reliably in most practical scenarios.

    I've spent an unjustifiable amount of time thinking about this specific problem building Subtrace [1], so I'm genuinely very interested in a simpler / more elegant approach.

    [1] https://github.com/subtrace/subtrace

  • Unfortunately TLS happens inside the application, not in the kernel, so using eBPF to hook syscalls to write won't help with TLS decryption.

    • It is quite simple to use eBPF with uprobes to hook library calls, for example: https://github.com/iovisor/bcc/blob/master/tools/sslsniff.py

      The downside is that this doesn't work with anything not using OpenSSL. There are projects like https://github.com/gojue/ecapture which have interceptors for many common libraries, but that approach needs different code for each library.

      I think providing a TLS certificate is fine for the use cases of the tool; most tools won't be doing certificate pinning, but ecapture does support Android where this is more likely.

    • But read and write syscalls are used by the application to do I/O on the sockets before/after the encryption, which can be intercepted. Or you can attach uprobes directly to the TLS library's own functions.

  • Wouldn't this require root? A big "selling point" of httptap seems to be precisely that it doesn't require root.

    Anyway the more options we have, the better.

Neat! This will immediately be used by me to debug nginx configs. Currently I use curl -v and have to manually skim the output to figure out what's wrong, but this would immediately make redirect loops and other things apparent. Cool tool!
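
A rough Python sketch of how redirect loops could be flagged automatically from the trace, assuming the default `---> METHOD URL` line shape from the httptap example output. The function names are hypothetical, and treating any repeated request URL as a loop is only a heuristic: a program that legitimately fetches the same URL twice would trigger it too.

```python
from typing import List, Optional


def request_urls(trace: str) -> List[str]:
    """Collect the URL from every '---> METHOD URL' line in the trace text."""
    urls = []
    for line in trace.splitlines():
        parts = line.split()
        if len(parts) == 3 and parts[0] == "--->":
            urls.append(parts[2])
    return urls


def find_redirect_loop(trace: str) -> Optional[List[str]]:
    """Return the run of URLs from first visit to repeat, or None if no URL repeats."""
    urls = request_urls(trace)
    first_seen = {}
    for index, url in enumerate(urls):
        if url in first_seen:
            return urls[first_seen[url]:index + 1]
        first_seen[url] = index
    return None
```

Feeding it the captured standard output of a run returns, for a loop between two URLs, the list starting and ending with the repeated URL.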

  • Very cool! Would love to hear how it goes, especially any features that would be useful in the context of real-world usage.

Very cool if you need a quick and dirty way to inspect the http/s call stack of an app. Personally I prefer eBPF to get _everything_, but using this utility can help drill down to what is important in the eBPF trace.

This looks great!

The GitHub profile points to https://www.monasticacademy.org/about which I have no particular opinion on but it did leave me wondering what the connection is between their monastic training retreat and their projects on GitHub.

Edit: Oh, I didn’t go to the bottom of the readme https://github.com/monasticacademy/httptap?tab=readme-ov-fil...

  • Yeah, for other readers who are looking at this thread, the connection is just that this (httptap) is a Monastic Academy project, and what that means is that there is a group of people living on 123 acres in Vermont according to a fairly traditional Buddhist monastic structure (though we are not ordained monks), and during the day we work on a number of technology and non-technology projects together. The link to the readme that sevg posted above is a good overview:

    https://github.com/monasticacademy/httptap?tab=readme-ov-fil...

I really like their approach. Other methods that might use something like LD_PRELOAD fail on statically linked ELFs, like golang binaries.
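
LD_PRELOAD is processed by the dynamic loader, so a binary that never asks for a loader never sees it. A minimal Python sketch of that check, assuming a 64-bit little-endian ELF file; `has_dynamic_loader` is a hypothetical name and other ELF variants are not handled.

```python
import struct

PT_INTERP = 3  # program header type that names the dynamic loader


def has_dynamic_loader(data: bytes) -> bool:
    """Return True if a 64-bit little-endian ELF image has a PT_INTERP segment."""
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    if data[4] != 2 or data[5] != 1:
        raise ValueError("this sketch only handles 64-bit little-endian ELF")
    # In the ELF64 header, e_phoff is at offset 32,
    # e_phentsize at offset 54 and e_phnum at offset 56.
    (e_phoff,) = struct.unpack_from("<Q", data, 32)
    e_phentsize, e_phnum = struct.unpack_from("<HH", data, 54)
    for i in range(e_phnum):
        # p_type is the first 32-bit field of each program header entry.
        (p_type,) = struct.unpack_from("<I", data, e_phoff + i * e_phentsize)
        if p_type == PT_INTERP:
            return True
    return False
```

Calling `has_dynamic_loader(open(path, "rb").read())` on a binary that returns `False` means preload-based interception has nothing to hook into, whereas a network-namespace approach does not care how the binary was linked.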

Amazing, such a great use of the gvisor userspace tcp/ip stack!

  • Yeah learning about gVisor was one of the most fun parts of this project. The gVisor devs are incredibly helpful. If you look through the gvisor-users mailing list you'll see me asking them various questions about this and they really helped out a bunch with some incredibly thorough answers.

    • Outstanding! Congratulations on writing such a wonderful project!

      I have a suggestion regarding the "How It Works" section. When reading it, I initially thought you had implemented your TCP/IP stack from scratch. Later, I discovered through the comments that you're using gVisor. Perhaps you might consider mentioning this explicitly in the documentation?

      As an interesting side note, gVisor's netstack is also used in the Tailscale client, enabling features like connecting a machine to multiple tailnets without requiring special privileges.

Mitmproxy v11.1 can do a similar thing

  • The downside to using mitmproxy for this is that mitmproxy uses eBPF which requires (temporary) root privileges to set up. This tool works without root access on most distros (you do need TUN write access).

    • That’s interesting. Tailscale userspace mode does not require tun write access as (I believe) it is implemented within the process that runs the gvisor stack (tailscaled). I am wondering if httptap could use the same approach?

  • Yeah mitmproxy is great. The main difference with httptap is that it's an HTTP proxy server, so you have to configure your program to use a proxy server. When I wrote httptap I wanted to be able to run `httptap <command>` and see the HTTP traces right there in standard output. There is an absolute ton of cool things that mitmproxy can do that httptap is not even close to, like interactively modifying HTTP requests and such. Very cool project.

Sadly, certificate pinning/certificate transparency makes this not so useful for apps that want to enforce security.

It would be very interesting to get something that can actually hook into the most common ssl libraries and/or decryption functions, and tries to dump things on the fly. Sure it'll still be blocked if there's tampering detection, but at least it could give some real transparent insight on calls done by some apps at times.
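
To see why pinning defeats an on-the-fly CA, here is a minimal Python sketch of a pin check. It assumes the app pins the SHA-256 fingerprint of the whole certificate; many real apps pin the public key instead. The byte strings are placeholders rather than real certificates, and all names are hypothetical.

```python
import hashlib
import hmac
from typing import Iterable


def fingerprint(cert_der: bytes) -> str:
    """SHA-256 fingerprint of a DER-encoded certificate, as lowercase hex."""
    return hashlib.sha256(cert_der).hexdigest()


def pin_matches(cert_der: bytes, pinned_fingerprints: Iterable[str]) -> bool:
    """Return True only if the presented certificate matches one of the pins."""
    presented = fingerprint(cert_der)
    return any(hmac.compare_digest(presented, pin) for pin in pinned_fingerprints)


# Placeholder byte strings standing in for real DER certificates.
REAL_SERVER_CERT = b"real server certificate bytes"
INTERCEPTOR_CERT = b"certificate minted by a local CA"

# The app ships with the fingerprint of the certificate it expects.
PINS = {fingerprint(REAL_SERVER_CERT)}

if __name__ == "__main__":
    print(pin_matches(REAL_SERVER_CERT, PINS))
    print(pin_matches(INTERCEPTOR_CERT, PINS))
```

In a real client the DER bytes would come from something like `ssl.SSLSocket.getpeercert(binary_form=True)`. The check never consults the trust store, so adding a CA to it changes nothing.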

  • In a few years there will be someone, probably much smarter than me, that figures out how to automatically bypass the checks for the certificate pins and solve 99% of the cases that don't work today.

This is something I’ve needed more than a handful of times - and each time slowly figured out a cobbled together solution with wireshark / pcap

Fun reading about the authors too.

Thank you for sharing!

  • Same here actually -- have wanted this from time to time for years. Finally sat down and worked it out.

This is really cool, thank you for sharing! We've built a similar feature for mitmproxy lately, but with different tradeoffs. Our approach does require root and we don't have automated certificate install (yet), but we don't require apps to run in a dedicated namespace (so you can capture already-running processes). Super awesome to see this now, excited to dive into the code and see how you do TCP reassembly etc. :)

  • Thank you! mitmproxy is fantastic - thanks for all the work that's gone into that project. Maybe we can get in touch and chat about all this stuff.

Whoa, great!

Questions:

- What's the performance impact?

- Does it allow payload/headers inspection?

  • Thanks!

    Haven't measured performance at all. However when I decided on the approach I looked at the performance benchmarks from tun2socks, which uses the same gVisor code that httptap uses, and it seems that pretty great performance is possible with that code. Still need to do the work of actually achieving comparable performance.

    Payloads and headers can be dumped with --body and --header respectively. There is an example in the readme of doing this (just search for --body) and I'll work more on documenting this in the coming days.

    • I see it supports .har, but .warc support would be amazing. It's the ISO standard for web archives, which would give you the ability to replay archived websites via multiple tools.

This is clever! I've been playing around with netns and TUN devices lately for a work project, and this idea is just so simple and clean.

  • Thanks! Yeah linux network namespaces are a powerhouse that we're only just starting to fully utilize (outside of containerization).

Really appreciate the "How it works" section in the README. In general I think it's great when projects give a high-level overview of the architecture and techniques involved, it provides an easy way for a newcomer to quickly grasp the fundamental workings of the project.

That's a great DX! I wonder if an alternative way is to just hook functions like read, and write, and other functions linked to the SSL libs. It is true that you should be aware of the SSL libs in place but OpenSSL is the most popular.

So if I have a Java program using the AWS libraries and I run it under this thing, it can decode the HTTPS AWS payloads going to Amazon?

How does that work with the AWS certs? How does the program not reject whatever this tool is doing to pull it off?

  • 1. Yes. The following commit taught httptap how to configure Java processes to use its CA cert:

    https://github.com/monasticacademy/httptap/commit/4288a89504...

    2. How it works is explained in the last two paragraphs of the "How It Works" section of the readme:

    > When a client makes an HTTPS request, it asks the server for evidence that it is who it says it is. If the server has a certificate signed by a certificate authority, it can use that certificate to prove that it is who it says it is. The client will only accept such a certificate if it trusts the certificate authority that signed the certificate. Operating systems, web browsers, and many other pieces of software come with a list of a few hundred certificate authorities that they trust. Many of these pieces of software have ways for users to add additional certificate authorities to this list. We make use of this.

    > When httptap starts, it creates a certificate authority (actually a private key plus a corresponding x509 certificate), writes it to a file on the filesystem visible only to the subprocess, and sets a few environment variables -- again only visible to the subprocess being run -- that add this certificate authority to the list of trusted certificate authorities. Since the subprocess trusts this certificate authority, and httptap holds the private key for the certificate authority, it can prove to the subprocess that it is the server with which the subprocess was trying to communicate. In this way we can read the plaintext HTTP requests.

Does this work with larger more complicated software like web browsers, skype, or discord?

I know I'd have to run firefox with --no-remote.

Very cool idea though, love tools with this sort of UX. I look forward to a V1 release in the future.

  • Thanks!

    I did try this with firefox but it doesn't work right now due to (I think) the user namespace messing with user IDs. I think I should be able to fix this, though. I will have to try it with other desktop apps soon too...

How can I run this as non-root? This is not obvious to me.

  • Based on how it works it cannot run as non-root even in principle. https://github.com/monasticacademy/httptap?tab=readme-ov-fil...

    Correction: the readme claims it will work without requiring root, but it does need to manage network namespaces, which afaik may only be available to root users depending on system configuration.

    > To run httptap you do not need to be the root user. ... It makes use of linux-specific system calls -- in particular network namespaces ...

Linux gets wireshark???

2025 will now definitely be the year of the Linux desktop :-)

Which privileges are required? CAP_NET_ADMIN? Or nothing at all?

  • Nothing at all!

    You do need write access to /dev/net/tun. This is standard for all users for the distros that I've looked into, but it is ultimately a distro-specific thing.

    • I'm curious because in a Kubernetes environment, the privileges can be minimal, i.e. read only filesystem, running as nobody, empty filesystem, etc.

does not seem to support SOCKS proxies which I rely on

  • Interesting. Care to share any info about your setup? Would it be a matter of httptap reading a certain environment variable and then forwarding traffic to a SOCKS proxy?

    FWIW there is also the excellent tun2socks (https://github.com/xjasonlyu/tun2socks), which was a significant inspiration for this project, and is specifically designed to forward traffic from a TUN device to a SOCKS proxy.

Is this implementing TCP in userspace?

  • Yep. This is the first time I've mentioned this but there are actually two implementations of this in the codebase -- one uses gVisor and one is an incredibly bare-bones TCP implementation that I wrote myself in 550 lines of Go code (tcp.go). The home-grown one isn't used by default and it doesn't support much of TCP proper, but it actually works pretty well. You can use it with `--stack=homegrown`.