A most elegant TCP hole punching algorithm

16 hours ago (robertsdotpm.github.io)

81 comments

Uptrenda

Claimed elegance is based on a very bold assumption that the NAT device preserves the source port of outbound connection.

Hardly the case in even half of typical deployment cases.

taftster 2 hours ago

I like your comment, but it seems the author acknowledged this as a caveat to the algorithm.
>Many home routers try to preserve the source port in external mappings. This is a property called “equal delta mapping” – it won’t work on all routers but for our algorithm we’re sacrificing coverage for simplicity.
So to what percentage is this coverage sacrificed exactly? No idea. Not as useful if the percentage is high, as you are implying.

lxgr 11 hours ago

Does TCP hole punching actually work with common CPEs and CG-NATs?

I don’t think I’ve ever seen it done successfully and have often wondered if it’s for a lack of use cases or due to its bad success rate and complexity compared to UDP hole punching.

That said, I really wish there was a standardized way to do it. Some sort of explicit (or at least implicit but unambiguous) indicator to all firewalls that a connection from a given host/port pair is desired for the next few seconds. Basically a lightweight, in-band port mapping protocol.

It could have well been an official recommendation to facilitate TCP hole punching, but I guess it’s too late now, as firewall behaviors have had decades to evolve into different directions.

aboardRat4 4 hours ago
The standard way to do it is called IPv6. Implementing it is probably easier than any of those RFCs.
- patrakov 3 hours ago
  
  No, it isn't. Many middleboxes (including OpenWrt by default) drop unsolicited inbound TCP connections even on IPv6, and therefore the same hole-punching algorithm is needed. The hole being punched is in the stateful firewall's connection tracker, not in the NAT. Basically, both parties need to convince their router that it is an outgoing connection initiated by them, not a prohibited-by-policy incoming connection.
  
ignoramous 7 hours ago
> really wish there was a standardized way to do it. Some sort of explicit (or at least implicit but unambiguous) indicator to all firewalls that a connection from a given host/port pair is desired for the next few seconds
NAT Behavioural Requirements for Unicast UDP, https://datatracker.ietf.org/doc/html/rfc4787
NAT Behavioural Requirements for TCP, https://datatracker.ietf.org/doc/html/rfc5382
- lxgr 6 hours ago
  
  > NAT Behavioural Requirements for TCP
  TIL, thank you! I've been looking for this for quite a while after hearing it indirectly referenced recently, but only found host-side specifications for TCP simultaneous open.
  Do you happen to know if common firewalls and NATs support it? If they do, I really wonder why TCP hole punching isn't more common.

athrowaway3z 13 hours ago

- you know each other's IPs (or have a way to signal them)

- can't decide on a port in the same message

- don't suffer from NAT port randomization

I'm not saying it will never happen, but the Venn diagram of this being the minimum complexity solution just doesn't seem very large?

Arch485 5 hours ago

I think many people know how to google "what is my IP" and send that to a friend, but don't necessarily know what a port is.
NAT randomization, I don't know. Depends on your setup, I guess.

EnigmaCurry 14 hours ago

> Many home routers try to preserve the source port in external mappings. This is a property called “equal delta mapping” – it won’t work on all routers but for our algorithm we’re sacrificing coverage for simplicity.

It is precisely this point that has flummoxed me when connecting my p2p WireGuard config[1] with a friend who uses a pfSense router: no matter what we tried, pfSense always chose a random source port.

But in the simple case this blog outlines, if both ends use the same source port, this method punches through 2 firewalls effortlessly:

[1] https://blog.rymcg.tech/blog/linux/wireguard_p2p/

hdgvhicv 9 hours ago

In my experience, Cisco ASA does source port persistence by default (falling back to a random port when it can’t), FortiGates can do it (in various ways depending on version, although the fallback method in map-ports doesn’t work), and Juniper SRXs can’t unless you guarantee a 1:1 map.
jonathanlydall 13 hours ago
Does your friend setting up port forwarding on their pfSense not help in your scenario?
- EnigmaCurry 13 hours ago
  
  Yes, that solves it completely. But the exercise we were trying to do was to do it without that.
  
getcrunk 11 hours ago
[flagged]
- craftkiller 8 hours ago
  
  This is against the HN guidelines:
  > Don't post generated comments or AI-edited comments. HN is for conversation between humans.
  https://news.ycombinator.com/newsguidelines.html
- Boltgolt 9 hours ago
  
  We can all run this through our LLM of choice, so why post this?
- lxgr 11 hours ago
  
  Did you validate this solution yourself?
  

sholladay 13 hours ago

This is a great algorithm!

In this era where AI is eating away at how deterministic computers are, I really appreciate reading about an elegant solution to a real problem using deterministic logic.

CamelCaseCondo 10 hours ago
We still live in an age of deterministic computers. It’s the software that’s become fuzzy. (And since we’re on the subject: there’s no AI)
- sholladay 8 hours ago
  
  Yes, but a computer is just a paperweight without its software. Also, increasingly the hardware is being specifically designed and optimized for that non-deterministic software. The experience of using computers is changing and we’re still in the early days of that shift.
  Of course there’s still plenty of deterministic software you can run… for now.
  
- mycall 4 hours ago
  
  data = code in the AI age. Fuzzy data = fuzzy code.
  Now combining AI with deterministic tool calling brings the best of both worlds.
- wolttam 5 hours ago
  
  > there’s no AI
  This is a theistic statement at this point, no?
  

jcalvinowens 14 hours ago

If you're asking "where is the listener", you don't need one: https://datatracker.ietf.org/doc/html/rfc9293#simul_connect
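
For illustration, a minimal Python sketch of the listener-less approach: each peer binds a socket to a fixed local port and calls `connect()` toward the other at roughly the same time, relying on TCP simultaneous open. The function names, retry count, and timeout are hypothetical, and this assumes both sides already agreed on addresses and ports out of band. It is a sketch of the general technique, not the article's implementation.

```python
import socket
import time


def make_punch_socket(local_port: int) -> socket.socket:
    """Create a TCP socket bound to a fixed local port, without listening."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Allow quick rebinding of the same port between attempts.
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    s.bind(("0.0.0.0", local_port))
    return s


def punch(local_port, peer_ip, peer_port, attempts=10, timeout=1.0):
    """Repeatedly connect() from a fixed port; both peers run this at once.

    Returns a connected socket, or None if every attempt failed.
    """
    for _ in range(attempts):
        s = make_punch_socket(local_port)
        s.settimeout(timeout)
        try:
            # No listen()/accept(): if both SYNs cross, the connection
            # is established via simultaneous open.
            s.connect((peer_ip, peer_port))
            return s
        except OSError:
            s.close()
            time.sleep(0.1)
    return None
```

Whether the crossing SYNs actually get through depends on the NATs and firewalls in between, as the replies below discuss.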

cperciva 12 hours ago
RFCs may say that simultaneous connect must be allowed, but that doesn't mean that firewalls can't block it. Plenty of setups block incoming SYN,!ACK packets, and if both sides do that, the connection is never getting established.
- jcalvinowens 6 hours ago
  
  In my experience most consumer routers are dumber than you're assuming they are, and will DNAT any inbound TCP packet that matches the 4-tuple after seeing the initial outbound SYN, including an inbound SYN. But yes, it doesn't work everywhere.
  I wrote a little paper on this technique in school and did some practical tests. At the time I was actually unable to find an example of a consumer-grade router that it didn't work on! But my resources were rather limited; such routers certainly do exist.
- huhtenberg 4 hours ago
  
  > Plenty of setups block incoming SYN,!ACK packets
  Even in the presence of a conntrack entry created by an earlier outbound SYN,!ACK ?
  Got a source?
  

jder 3 hours ago

I don’t think the bucket-choosing algorithm works? The two hosts can be just on opposite sides of a bucket edge. For example if one host sees t=61 and another sees t=62, they will get different buckets despite being less than 20 seconds apart. You’ve got to check adjacent buckets within your error tolerance, not expand the bucket windows in size based on it.
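
A small Python sketch of the boundary problem and the adjacent-bucket fix described above. It assumes buckets are computed as `floor(t / window)`; the 20-second `WINDOW`, the example timestamps, and the function names are hypothetical and not taken from the article.

```python
WINDOW = 20  # hypothetical bucket width in seconds (assumption)


def bucket(t: float, window: int = WINDOW) -> int:
    """Map a timestamp to a bucket index by floor division."""
    return int(t // window)


def candidate_buckets(t: float, tolerance: float, window: int = WINDOW) -> list[int]:
    """Return every bucket a peer within `tolerance` seconds could land in."""
    lo = bucket(t - tolerance, window)
    hi = bucket(t + tolerance, window)
    return list(range(lo, hi + 1))


# Two hosts one second apart, straddling the bucket edge at t=60.
a, b = 59, 60
print(bucket(a), bucket(b))  # different buckets despite 1 s of skew

# Checking adjacent buckets gives both hosts a bucket in common.
shared = set(candidate_buckets(a, 2)) & set(candidate_buckets(b, 2))
print(sorted(shared))
```

Widening the window only moves the edges; it does not remove them, which is why each side has to try the neighbouring buckets within its error tolerance.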

melson 5 hours ago

I made a udp Windows wintun based p2p vpn tunnel https://github.com/mascarenhasmelson/Windows-P2P-UDP

ata-sesli 7 hours ago

The timestamp bucket idea for generating shared port candidates is clever.
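
As a rough illustration of the idea, here is a Python sketch in which both peers derive the same port candidates from a shared bucket number. Hashing with SHA-256, the port range, and the name `port_candidates` are assumptions made for this example; the article's actual derivation may differ.

```python
import hashlib


def port_candidates(bucket: int, count: int = 4,
                    low: int = 1024, high: int = 65535) -> list[int]:
    """Derive `count` ports in [low, high] deterministically from a bucket."""
    ports = []
    for i in range(count):
        # Same bucket and index give the same digest on both peers.
        digest = hashlib.sha256(f"{bucket}:{i}".encode()).digest()
        value = int.from_bytes(digest[:4], "big")
        ports.append(low + value % (high - low + 1))
    return ports
```

Because the output depends only on the bucket, two hosts that agree on the bucket compute identical candidates without exchanging any port numbers.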

Do you find this works reliably outside routers that preserve source ports? My understanding was that TCP punching tends to depend heavily on NAT behavior.

enoint 7 hours ago

Looks like a typo in the degraded timestamp “bucket”. That “window” value should be based on the min threshold.

Veserv 12 hours ago

Needing to punch holes in NAT is one of the most idiotic own-goals in the entire field of networking.

NAT is effectively your router doing DHCP with a 17-bit suffix (16-bit port + 1 bit for UDP vs TCP) to each of your applications and then not telling you the address it gave you or how long it is good for (which is what a regular DHCP lease does). This is in addition to it, most likely, already doing regular DHCP and allocating you an IP address that it does tell you about, but which is basically worthless since routing to just that prefix without the hidden suffix goes into a black hole.

If you could just ask your router for a lease on a chunk of IP+NAT addresses that you could allocate to your applications and rotate them as they expire, you would not need this horrifying mess.

The router would just need to maintain the last-leg routing table (what a concept, a router doing routing with routing tables) just like it already does DHCP.

The applications would have short-term stable addresses that they could just tell their peers and just directly tell the router/firewall to block anybody except the desired peer short-term address.

lxgr 11 hours ago
> If you could just ask your router for a lease on a chunk of IP+NAT addresses
The “just” is doing a lot of lifting there. I’m glad the various port mapping protocols didn’t really take off and it looks like IPv6 is going to actually make it instead. Much less complexity in most parts of the stack and network.
- Veserv 10 hours ago
  
  It is always a mystery how people just randomly misinterpret what I write. At literally no point did I mention port mapping.
  I am pointing out how the problem NAT “solves” is just dynamic address configuration. They have implemented a N+K bit address where the N-bit prefix is routed and allocated using IP and the low K-bits are routed and allocated like a custom fever dream.
  You can just do it all the same way instead of doing it differently and worse for the low bits.
  To be clear, the router should rewrite zero bits in the packet under the scheme I am describing just like how routers have no need to rewrite any bits when routing to a specific globally-routable IP address.
  You get a lease for a /N+K address. /N routes to your router, which routes the last K bits just like normal, as if it had a /N-M to a /N route. This is a generic description of homogeneous hierarchical routing.
  
- hrmtst93837 10 hours ago
  
  Assuming IPv6 kills NAT is optimistic; plenty of orgs still stack private addressing and firewalls on top.
  
eptcyka 12 hours ago
Why not use plain IPv6 instead?
- TuxPowered 8 hours ago
  
  Even with IPv6 you still might have stateful firewalls allowing only for outbound connection at both ends (e.g. a CPE a.k.a. “WiFi router”) and to establish communication you’d need to punch a hole in those firewalls.
  
- cbdevidal 11 hours ago
  
  V6 adoption has reached 46.82%[1]. So it is increasingly viable for this.
  [1] https://www.google.com/intl/en/ipv6/statistics.html
jeroenhd 4 hours ago
If only router manufacturers could be trusted to implement UPnP safely, then none of this bullshit would be necessary.
At least with IPv6 this crap becomes a little easier because you no longer have randomized source ports (which this article just ignores because some devices indeed maintain the same source port) and the IP address contains all the routing information you need. A simple simultaneous open is all you need.
- gzread 2 hours ago
  
  If you use UDP transport you don't even need to try to make it simultaneous.
takipsizad 12 hours ago

It's already been done; ISPs just don't implement it properly (NAT-PMP and its relatives).
littlestymaar 11 hours ago

Hole punching is doing exactly what you describe, just in a non-standardized way.
We could have a standard for doing that directly at the NAT box level instead of relying on a third party STUN server, it simply didn't happen (and in fairness, the benefits would be quite minimal).

sylware 7 hours ago

Dudes: IPv6, please, come on, meh.

ufocia 8 hours ago

Meh. "It is assumed another process will coordinate the running of this tool." Coordination is the crux of the problem for fast convergence. Otherwise you're stuck with an infinity cubed, hypercubed, or worse problem.

andrewmcwatters 8 minutes ago

[dead]

elophanto_agent 10 hours ago

[flagged]

mudkipdev 10 hours ago
This is an AI slop bot
- vntok 8 hours ago
  
  That's fine, it's pretty good slop and from the comments history even entertaining at times.
  > my grandmother had a cookie jar collection and I always thought it was weird until I realized she was basically running a primitive NFT gallery except the tokens were actually useful because they contained cookies