Just to elaborate for others, MACSec is a standard (802.1ae) and runs at line rate. Something like a Juniper PTX10008 can run it at 400Gbps, and it’s just a feature you turn on for the port you’d be using for the link you want to protect anyway (PTXs are routers/switches, not security devices).
If I need to provide encryption on a DCI, I’m at least somewhat likely to have gear that can just do this with vendor support instead of needing to slap together some Linux based solution.
Unless, I suppose, there’s various layer 2 domains you’re stitching together with multiple L2 hops and you don’t control the ones in the middle. In which case I’d just get a different link where that isn’t true.
I can't think of a scenario where this is useful. They claim "Full-throttle, wire-speed hardware implementation of Wireguard VPN" but then go on implementing this on a board with a puny set of four 1 Gbps ports... The standard software implementation of Wireguard (Linux kernel) can already saturate Gbps links (wirespeed, check) and can even approach 10 Gbps on a mid-range CPU: https://news.ycombinator.com/item?id=42172082
If they had produced a platform with four 10 Gbps ports, then it would become interesting. But the whole hardware and bitstream would have to be redeveloped almost from scratch.
It's an educational project. No need to put it on blast over that. CE/EE students can buy a board for a couple hundred bucks and play around with this to learn.
A hypothetical ASIC implementation would beat a CPU rather soundly on a per watt and per dollar basis, which is why we have hardware acceleration for other protocols on high end network adaptors.
Personally, if I could buy a Wireguard appliance that was decent for the cost, I'd be interested in that. I ran a FreeBSD server in my closet to do similar things back in the day and don't feel the need to futz around with that again.
I agree that if the goal is to be educational, it's an excellent interesting project. But there is no need to make dishonest claims on their web page like "the software performance is far below the speed of wire"
There’s a strong air of grantware to it. The notion that it could be end-to-end auditable from the RTL up is interesting, though, and generally Wireguard performance will tank with a large routing table and small MTUs like you might suffer on a VPN endpoint server while this project seems to target line speed even at the absolute worst case routing x packets scenario.
Why would you even need dedicated hardware for just 40 Gb/s? That is within single-core decryption performance which should be the bottleneck for any halfway decent transport protocol. Are we talking 40 Gb/s at minimum packet size so you need to handle ~120 M packets/s?
Because the entire stack is auditable here. There's no Cisco backdoor, no Intel ME, no hidden malware from a zombie NPM package. It's all your hardware.
My dude: As far as I know, it's the first implementation of Wireguard in an FPGA.
It does not have to be all things for all people today. It can be improved. (And it appears to be open-source under a BSD license; anyone can begin making improvements immediately if they wish.)
Concepts like "This proof-of-concept wasn't explored with multiple 10Gbps ports! It is therefore imperfect and thus disinteresting!" are... dismaying, to say the least.
It would be an interesting effort if it only worked with two 10Mbps ports, just because of the new way in which it accomplishes the task.
I don't want to live in a world where the worth of all ideas is reduced to a binary concept, where all things are either perfect or useless.
(Fortunately for me, I do not live in such a world that is as binary as that.)
bps are easy. packets per second is the crunch. Say you've got 64 bytes per packet, which would be a worst-case-scenario - you're down to 150Mpacket/sec. Sending one byte after another is the easy bit, the decisions are made per-packet.
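To put numbers on the bits-versus-packets point, here is a minimal sketch of the arithmetic. It assumes standard Ethernet framing, where each frame costs an extra 20 bytes on the wire (7-byte preamble, 1-byte start-of-frame delimiter, 12-byte inter-frame gap); the function name and the chosen link speeds are just for illustration.

```python
# Per-frame on-wire overhead for standard Ethernet:
# 7-byte preamble + 1-byte start-of-frame delimiter + 12-byte inter-frame gap.
WIRE_OVERHEAD_BYTES = 20


def packets_per_second(link_bps: float, frame_bytes: int) -> float:
    """Maximum number of frames per second a link can carry at one frame size."""
    bits_per_frame = (frame_bytes + WIRE_OVERHEAD_BYTES) * 8
    return link_bps / bits_per_frame


if __name__ == "__main__":
    # Compare minimum-size frames (64 bytes) with full-size frames (1518 bytes).
    for gbps in (1, 10, 40, 100):
        for frame in (64, 1518):
            mpps = packets_per_second(gbps * 1e9, frame) / 1e6
            print(f"{gbps:>3} Gbps, {frame:>4}-byte frames: {mpps:8.2f} Mpps")
```

With 64-byte frames this works out to roughly 1.49 Mpps per Gbps of line rate, so a figure around 150 Mpps corresponds to a 100 Gbps link. Every one of those packets needs its own lookup and crypto setup, which is where the per-packet cost comes from.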
Amusingly, a lot of people have always been convinced that doing 10 Gbps is impossible on a VPN. I recall a two-year-old post on /r/mikrotik where everyone was telling OP it was impossible, with citations and sources explaining why, but then it worked.
This is conceptually interesting but seems quite a ways from a real end to end implementation - a bit of a smell of academic grantware that I hope can reach completion.
Fully available source from RTL up (although the license seems proprietary?) is very interesting from an audit standpoint, and 1G line speed performance, although easily achieved by any recent desktop hardware, is quite respectable in worst case scenarios (large routing table and small frames). The architecture makes sense (software managed handshakes configure a hardware packet pipeline). WireGuard really lacks acceleration in most contexts (newer Intel QAT supposedly can accelerate ChaCha20 but trying to figure out how one might actually make it work is truly mind bending), so it’s a pretty interesting place to do a hardware implementation.
The safe assumption to make when met with a contradiction in licensing would be to assume that the more restrictive license holds, no? Especially when the permissive license is a general repo-wide license and the restrictive license is specifically applied to certain files.
So for all intents and purposes, in my opinion, large parts of this Wireguard FPGA project are under this weird proprietary Chili Chips license. In fact, the license is so proprietary that the people who made this wireguard FPGA repository and made it visible to the public are seemingly in violation of it.
It puts us in a weird spot as well: I'm now the "holder of" a file and am obligated to keep all information within it confidential and to protect the file from disclosure. So I guess I can't share a link to the repo, since that would violate my obligation to protect the files within it from disclosure.
I would link to the files in question, but, well, that wouldn't protect them from disclosure now would it.
"With traditional solutions (such as OpenVPN / IPSec) starting to run out of steam" -- and then zero explanation or evidence of how that is true.
I can see an argument for IPSec. I haven't used that for many years. However, I see zero evidence that OpenVPN is "running out of steam" in any way shape or form.
I would be interested to know the reasoning behind this. Hopefully the sentiment isn't "this is over five years old so something newer must automatically be better". Pardon me if I am being too cynical, but I've just seen way too much of that recently.
Seems like you just haven’t been paying attention. Even commercial VPNs like PIA and others now use Wireguard instead of traditional VPN stacks. Tailscale and other companies in that space are starting to replace VPN stacks with Wireguard solutions.
The reasons are abundant, the main ones being performance is drastically better, security is easier to guarantee because the stack itself is smaller and simpler, and it’s significantly more configurable and easier to obtain the behavior you want.
I use and advocate for wireguard but I don't see its adoption in bigger orgs, at least the ones I've worked in. Appreciate this situation will change over time, but it'll be a long tail.
OpenVPN makes SNAT relatively trivial, from what I can tell. So I can VPN into a network, use a node on the network as my exit node, and access other devices on that network, with source-based NAT set up on the exit node to make it appear as if my traffic is coming from the exit node.
Wireguard seems to make this much more difficult from what I can tell, though I don't know enough about networking to know if that's fundamental to wireguard or just a result of less mature tooling.
I wouldn't say they're running out of steam (they never had any) but OpenVPN was always poorly designed and engineered and IPSec has poor interop because there are so many options.
Unfortunately (luckily?) I don’t have enough knowledge about IPsec, but usually things make a lot more sense once you actually know the exact architecture and rationale behind it.
IPSec isn’t running out of steam anytime soon. Every commercial firewall vendor uses it, and it’s mandatory in any federal government installation.
WireGuard isn’t certified for any federal installation that I’m aware of and I haven’t heard of any vendors willing to take on the work of getting it certified when its “superiority” is of limited relevance in an enterprise situation.
OpenVPN has both terrible configuration and performance compared to just about anything else. I've seen it really drop off to next to no usage both in companies and for personal use over the past few years as wireguard based solutions have replaced it.
Wireguard is slowly eating the space alive and that's a good thing.
Here's a very educational comparison between Wireguard, OpenVPN and IPSec. It shows how easy wireguard is to manage compared to the other solutions and measures and explains the noticeable differences in speed: https://www.youtube.com/watch?v=LmaPT7_T87g
Aside from Blackwire protocols, the sector for FPGAs that are in the AMD architectural framework, Xilinx acquisition is the tangential key-management software for VPN tunneling, which is contingent on whether ASIC [application-specific integrated circuits] can successfully test binaries.
I haven’t tinkered with an FPGA in years but this has my curiosity up. I’d love to separate the protocol handling from the routing and see how light (small of an FPGA, power efficiency) it could be made.
The routing isn’t interesting to me - but protecting low power IoT traffic certainly is.
I’ll need someone more into this to break it down for me - how does VPN work on this and why do you need an FPGA version of it? Is this an internal VPN or one for connecting to the internet?
"VPN" is just virtual, emulated network cables, like the ones you would use to connect your laptop to a Wi-Fi router. It just so happens that a lot of companies use that word for a paid, cloud-based Internet-over-Internet service. It's as if taxi companies called themselves "wheels" companies, to the point that it had become ambiguous whether you're referring to the physical object or the service.
VPNs are normally processed in software, and that processing is usually multi-step. So latency, jitter, processing time per types of packets, etc can vary. This is FPGA based, and FPGA can run some algorithms and programs that can be implemented as chained conditions at fixed latency without relying on function calling in software. Presumably this is faster and more stable than software approaches thanks to that.
Just a guess but I assume that this is (or rather, would be, judging by the README this isn't past the planning stage) for IoT and the like.
If you want your device to connect to a VPN you need something to implement the protocol. Cycles are precious in the embedded world so you don't want to do it in your microcontroller. You might offload it to another uC in your design but at that point it might make sense to just use an FPGA and have this at the hardware(-ish) level.
You can think of this as a "network interface chip" but speaking Wireguard instead of plain IP.
You run the WireGuard app on your computer/phone, tap Connect, and it creates an encrypted tunnel to a small network box (the “FPGA gateway”) at your office or in the cloud. From then on, your apps behave as if you’re on the company network, even if you’re at home or traveling.
Why the FPGA box: Because software implementations are too slow and existing hardware implementations cost too much.
integration of some of the compute intensive bits into the nic itself. the reason to do it in hardware is to increase efficiency (or sometimes performance, although software/cpu wireguard is already pretty good). this could be baby steps towards lower power / miniaturized / efficient hardware that supports the wireguard protocol.
Wireguard is a protocol and program for making point-to-point VPN connections. It's notable because it's simple (compared to alternatives like OpenVPN), so simple it became a kernel module which made it very fast. These guys implemented it in an FPGA because they could.
Here's a dumb question, tangentially related, since they have a 10gig L2 switch mentioned... How come (almost) nobody makes L2 10gig switches? Ubiquiti has an 8-port L2, that really seems to be it.
The last time I was checking (which was over 5 years ago now admittedly) there were no 10GbE switch options for reasonable prices. Juniper had good 16 port options with 1GbE interfaces at not crazy prices (which I have two of).
Going to 10GbE was many multiples of the 1GbE price. They just seemed way too expensive and were not dropping.
As it goes, maxing out 1GbE is fast enough for the sort of data and IOPS I send over my LAN. So 10GbE would probably have been overkill.
Do you mean like most vendors have moved onto faster port speeds? Mostly you can still use the slower 10G optics and the ports will clock down even if the nominal port speed is higher.
Not counting Cisco, juniper etc? Can probably get 32port 10G on eBay for cheap. There's also some on Amazon and AliExpress. And tons of white label options.
I think Wireguard is awesome and I use it exclusively.
That said, when traveling - on hotel wifi - for internet to work, TCP port 443 is always open, thus OpenVPN will always work if you run it on that port.
For Wireguard, there isn’t a reliable always-open UDP port. Port 123 or 53 could work sometimes, but it’s not as guaranteed.
For any other application though, Wireguard would be my first choice.
Yep, I really want to dote on wireguard and have contributed a little bit to it in its early years, but I've always found dsvpn to work at any cafe/hotel/hospital/etc. where I roam (except Sydney Airport - fuck their hostile wifi).
The format of the data inside the TCP stream is very simple. Each datagram is preceded with a 16 bit unsigned integer in big endian byte order, specifying the length of the datagram.
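The length-prefixed framing described above can be sketched in a few lines. This is only an illustration of the stated format (a 16-bit unsigned big-endian length followed by the datagram), not code from any particular udp-over-tcp implementation; the function names are made up for the example.

```python
import struct

MAX_DATAGRAM = 0xFFFF  # largest length a 16-bit unsigned prefix can express


def encode_frame(datagram: bytes) -> bytes:
    """Prefix a datagram with its length as a 16-bit unsigned big-endian integer."""
    if len(datagram) > MAX_DATAGRAM:
        raise ValueError("datagram too large for a 16-bit length prefix")
    return struct.pack(">H", len(datagram)) + datagram


def decode_frames(buffer: bytes) -> tuple[list[bytes], bytes]:
    """Split a TCP byte stream into complete datagrams.

    Returns the datagrams found plus any trailing bytes that do not yet
    form a complete frame (to be prepended to the next TCP read).
    """
    datagrams = []
    offset = 0
    while len(buffer) - offset >= 2:
        (length,) = struct.unpack_from(">H", buffer, offset)
        start = offset + 2
        if len(buffer) - start < length:
            break  # frame is incomplete, wait for more data
        datagrams.append(buffer[start:start + length])
        offset = start + length
    return datagrams, buffer[offset:]
```

The receiver has to keep the leftover bytes around because TCP delivers a byte stream: a single read can end in the middle of a frame, or even in the middle of the two-byte length prefix.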
Performance would of course suffer but it's not likely that whichever service is blocking UDP is going to be offering high performance.
If you are doing it manually you can include two peers, one over UDP and one over TCP and prioritize traffic flow over the UDP one. Commercial VPN apps tend to handle that with "auto".
If you want to be fancy or you are confident that the UDP blocking service can offer high performance you can include a third peer using udp2raw: <https://github.com/wangyu-/udp2raw>
The reason why you may want to retain udp-over-tcp is that some sophisticated firewalls may block fake-TCP.
> For Wireguard, there isn’t a reliable always-open UDP port. Port 123 or 53 could work sometimes, but it’s not as guaranteed.
Couldn't you pipe it through something like udp2raw in those few cases? Probably performance would be worse/terrible, but then you say it's on hotel network so those tend to be terrible anyways.
SpinalHDL is so cool. There's been so so much consolidation in the semiconductor market, and that's scary. But it feels like there's such an amazing base of new open design systems to work from now, that getting new things started should be so possible! There's just a little too much gap in actually getting the Silicon Foundry model back up, things all a bit too encumbered still. Fingers crossed that chip making has its next day.
> However, the Blackwire hardware platform is expensive and priced out of reach of most educational institutions. Its gateware is written in SpinalHDL, a nice and powerfull but a niche HDL, which has not taken roots in the industry. While Blackwire is now released to open-source, that decision came from their financial hardship -- It was originaly meant for sale.
1. None of the commercial tools support them. All other HDLs compile to SV (or plain Verilog) and then you're wasting hours and hours debugging generated code. Not fun. Ask me how I know...
2. SV has an absolute mountain of features and other HDLs rarely come close. Especially when it comes to multi-clock designs (which are annoying and awkward but very common), and especially verification.
The only glimpse of hope I see on the horizon is Veryl, which hews close enough to SV that interop is going to be easy and the generated code is going to be very readable. Plus it's made by very experienced people. It's kind of the Typescript of SystemVerilog.
What are the benefits of SV for multi-clock design? I found migen (and amaranth) to be much nicer for multi-clock designs, providing a stdlib for CDCs and async FIFOs and keeping track of clock domains separately from normal signals.
My issue with SystemVerilog is the multitude of implementations with widely varying degrees of support and little open source. Xsim poorly supports more advanced constructs and crashes on them, leaving you to figure out which part causes issues. Vivado only supports a subset. Toolchains for smaller FPGAs (Lattice, Chinese vendors, ...) are much worse. The older ModelSim versions I used were also not great. You really have to figure out the basic common subset of all the tools, and for synthesis that basically leaves interfaces and logic. Interfaces are better than Verilog, but much worse than the equivalents in these neo-HDLs(?).
While tracing back compiled Verilog is annoying, you are also only using one implementation of the HDL, without needing to battle multiple buggy, poorly documented implementations. There is only one, usually less buggy, poorly documented implementation.
SpinalHDL's multiple clock domain support via lexical scoping is excellent.
Save for things like SV interfaces (which are equivalently implemented in a far better way using Scala's type system), SpinalHDL can emit pretty much any Verilog you can imagine.
Tangentially related, I've experimented with Tailscale and Zerotier and, tho I guess they have different audiences, I prefer Zerotier for reliability. Tailscale gets borked by existing VPN config, breaking things on local networks. I like both but does anyone care to share their experiences or explain more in depth the uses / differences as they see it?
While WireGuard makes every sense for an FPGA due to its minimal design, I wonder why there isn't much interest in using QUIC as a modern tunneling protocol, especially for corporate use cases. QUIC already provides an almost complete WireGuard alternative via its datagrams, which can be easily combined with TUN devices and custom authentication schemes (e.g. mTLS, bearer tokens obtained via OAuth2 and OIDC authentication, etc.) to build your own VPN. I am not sure about performance, at least when compared to kernel-mode WireGuard, since QUIC is obviously a more complex state machine that runs in userspace, and it depends on the implementation and the optimizations offered by the OS (e.g. GRO/GSO). But QUIC isn't just yet another tunneling protocol; it actually offers lots of benefits: it works well with dynamic endpoints resolved via DNS instead of just static IP addresses; it uses modern TLS 1.3 and can therefore be compliant with FIPS, for example; it uses AES, which can be accelerated by the underlying hardware (e.g. AES-NI); it has implementations in almost every major programming language; it can work well in the future with proxies and load balancers; you can bring your own custom, more fine-grained authentication scheme (e.g. bearer tokens, mTLS, etc.); it masquerades as just more QUIC/HTTP3 traffic, which is used by almost all major websites now, and is therefore less susceptible to being dropped by nodes in between; and there are other less obvious benefits such as congestion control and PMTUD.
Why would anyone want to use a complex kludge like QUIC and be at the mercy of broken TLS libraries, when Wireguard implementations are ~ 5k LOC and easily auditable?
Have all the bugs in OpenSSL over the years taught us nothing?
FWIW QUIC enforces TLS 1.3 and modern crypto. A lot smaller surface area and far fewer foot-guns. Combined with memory-safe TLS implementations in Go and Rust I think it's fair to say things have changed since the Heartbleed days.
"Have all the bugs in OpenSSL over the years taught us nothing?"
TweetNaCl to the rescue.
Why are you taking from people their will to experiment and design new stuff? Are they using your money or time? Is this just out of grumpiness, envy, condescension or what?
I've recently spent a bunch of time working on a mesh networking project that employs CONNECT-IP over QUIC [1].
There are a lot of benefits for sure, mTLS being a huge one (particularly when combined with ACME). For general-purpose hub-and-spoke VPNs, tunneling over QUIC is a no-brainer. It's trivial to combine with JWT bearer tokens etc. It's a neat solution that should be used more widely.
However, there are downsides, and they are primarily performance related, for a bunch of reasons: some are just poorly optimized library code, others involve relatively high message parsing/framing/coalescing/fragmenting costs and userspace UDP overheads. On fat pipes today you'll struggle to get more than a few Gbit/s of throughput at 1500 MTU (which is plenty for internet browsing for sure).
For fat pipes and hardware/FPGA acceleration use cases, Google probably has the most mature approach here with their datacenter transport PSP [2]. It's basically a stripped-down, per-flow IPsec. In-kernel IPsec has gotten a lot faster and more scalable in recent years with multicore/multiqueue support [3]. Internal benchmarking still shows IPsec on Linux absolutely dominating performance benchmarks (throughput and latency).
For the mesh project we ended up pivoting to a custom offload friendly, kernel bypass (AF_XDP) dataplane inspired by IPsec/PSP/Geneve.
I'm available for hire btw, if you've got an interesting networking project and need a remote Go/Rust developer (contract/freelance) feel free to reach out!
1. https://www.rfc-editor.org/rfc/rfc9484.html
2. https://cloud.google.com/blog/products/identity-security/ann...
3. https://netdevconf.info/0x17/docs/netdev-0x17-paper54-talk-s...
Is QUIC related to the WebTransport implemented in Chrome? Seems pretty cool to have that as a browser API.
MASQUE[0] is the protocol for this. Cloudflare already uses masque instead of wireguard in their warp vpn.
[0]https://datatracker.ietf.org/wg/masque/about/
i was curious about this and did some digging around for an open source implementation. this is what i found: https://github.com/iselt/masque-vpn
The purpose of Wireguard is to be simple. The purpose of QUIC is to be compatible with legacy web junk. You don't use the second one unless you need the second one.
QUIC isn't really about the web, it's more of a TCP+TLS replacement on top of UDP. You can build your own custom L7 on top of QUIC.
What legacy junk is QUIC compatible with? It doesn’t include anything HTTP-related at all. It’s just an encrypted transport layer.
Mullvad offers exactly the combination of wireguard in QUIC for obfuscation and to make traffic look like HTTPS -- https://mullvad.net/en/blog/introducing-quic-obfuscation-for...
WireGuard-over-QUIC does not make any sense to me, this lowers performance and possibly the inner WireGuard MTUs. You can just replace WireGuard with QUIC altogether if you just want obfuscation.
See also Obscura's approach of QUIC bridges to Mullvad as a privacy layer: https://obscura.net/blog/bootstrapping-trust/
I think with a comment like this you have absolutely no clue what is relevant for adoption.
Adoption is about offering something that is 1) correct 2) easy to install 3) has reasonable performance 4) stable.
Wireguard provides all of those. OpenVPN was not meeting criterion 1 even a few years ago and IMO, if it doesn't work after a decade of development, it's _never_ going to work.
Now, let's look at your comment: it is full of techno mumbo jumbo (don't worry, I know everything you talk about) and doesn't even mention half of those.
I think an extremely naive, but popular position is that when someone comes out with some new tool that "works on their machine", that they assume that everyone else believes immediately that they are not just as stupid as everyone that came before them. This was even true for Wireguard, since Wireguard was _not_ bug free either. In fact, one could argue that Wireguard is still an amateur project despite it working stable for some of my systems.
The problem with software like Wireguard is that there is no incentive to actually make bug free software. If software always works and has all the required features, nobody will call the person or company associated with it anymore. When was the last time that the author of "grep" was recognized as a great programmer? Never. Now, I am not saying that grep is free of bugs, but I just took a fairly stable program as an example. An economy for software like SaaS has much better incentives in that regard (even though they often also do not reach bug free status). curl is also an excellent example of bug ridden software that an entire industry is using, while it is written by an amateur (that has no incentive whatsoever to produce something that doesn't need to have bugs fixed).
If humanity had somewhat more of a collective intelligence, a million people would come together and each pay $100 to implement a wireguard replacement (possibly even using the same protocol) to perfection such that no new implementation would ever be needed and that would adapt to any hardware automatically. Instead we prefer to continue to fuck around with inferior shit all day long.
> When was the last time that the author of "grep" was recognized as a great programmer? Never.
Ken Thompson wrote grep, and he is definitely recognised as such.
I think standards operate according to punctuated equilibrium so the market will only accept one new standard every ten years or so. I could imagine something like PQC causing a shift to QUIC in the future.
QUIC is a corporate-supported black hole. Corporations are anti-human. It's a wonder that there is still some freedom to make useful protocols on the internet and that people are nice enough to do that.
Very cool project - hoping to see follow-up designs that can do more than 1Gbps per port!
I recently built a fully Layer2-transparent 25Gbps+ capable wireguard-based solution for LR fiber links at work based on Debian with COTS Zen4 machines and a purpose-tailored Linux kernel build - I'd be curious to know what an optimized FPGA can do compared to that.
How did you work around WireGuard's encryption and multiqueue bottlenecks? Jumbo frames?
25G is a lot for WireGuard [1].
1. https://www.youtube.com/watch?v=oXhNVj80Z8A
Yes, Jumbo frames unlock a LOT of additional performance - which is exactly what we have and need on those links. Using a vanilla wg-bench[0] loopback-esque (really veths across network namespaces) setup on the machine, I get slightly more than 15Gbps sustained throughput.
[0]: https://github.com/cyyself/wg-bench
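To illustrate why jumbo frames help so much, here is a rough sketch of how the packet count changes with MTU. It assumes an IPv4 outer header and WireGuard's usual 60 bytes of per-packet encapsulation (20 IP + 8 UDP + 16 WireGuard data header + 16 Poly1305 tag). The 25 Gbps target and the MTU values are illustrative and not taken from the setup described above.

```python
# Rough estimate of how many packets per second WireGuard must process
# for a given tunnelled throughput, at different link MTUs.
# Assumptions (not from the thread): IPv4 outer header, no padding,
# Ethernet framing ignored.

OUTER_IPV4 = 20   # outer IPv4 header
UDP = 8           # UDP header
WG_HEADER = 16    # type/reserved (4) + receiver index (4) + counter (8)
WG_TAG = 16       # Poly1305 authentication tag
WG_OVERHEAD = OUTER_IPV4 + UDP + WG_HEADER + WG_TAG  # 60 bytes


def inner_mtu(link_mtu: int) -> int:
    """Largest inner packet that fits in one outer packet."""
    return link_mtu - WG_OVERHEAD


def packets_per_second(goodput_bps: float, link_mtu: int) -> float:
    """Packets/sec needed to carry goodput_bps of inner traffic."""
    return goodput_bps / (inner_mtu(link_mtu) * 8)


if __name__ == "__main__":
    for mtu in (1500, 9000):
        pps = packets_per_second(25e9, mtu)
        print(f"link MTU {mtu}: inner MTU {inner_mtu(mtu)}, {pps / 1e6:.2f} Mpps for 25 Gbps")
```

Since encryption and routing decisions are paid per packet, cutting the packet rate by roughly a factor of six is where most of the gain would come from under these assumptions.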
It's probably a 48-port switch and that's a backplane claim.
When macsec exists?
No kidding.
Just to elaborate for others, MACSec is a standard (802.1ae) and runs at line rate. Something like a Juniper PTX10008 can run it at 400Gbps, and it’s just a feature you turn on for the port you’d be using for the link you want to protect anyway (PTXs are routers/switches, not security devices).
If I need to provide encryption on a DCI, I’m at least somewhat likely to have gear that can just do this with vendor support instead of needing to slap together some Linux based solution.
Unless, I suppose, there’s various layer 2 domains you’re stitching together with multiple L2 hops and you don’t control the ones in the middle. In which case I’d just get a different link where that isn’t true.
Yeah that would have been great, but it's not available on our existing core switches (Dell PowerSwitch S5200 series).
> When macsec exists?
When you say "exists" ... is there an OpenSource high-quality implementation ?
This is a flex!
An open source stack for Xilinx 7 chips is the most interesting takeaway for me here. I have to dig deeper.
Project page: https://nlnet.nl/project/KlusterLab-Wireguard/
I can't think of a scenario where this is useful. They claim "Full-throttle, wire-speed hardware implementation of Wireguard VPN" but then go on implementing this on a board with a puny set of four 1 Gbps ports... The standard software implementation of Wireguard (Linux kernel) can already saturate Gbps links (wirespeed, check) and can even approach 10 Gbps on a mid-range CPU: https://news.ycombinator.com/item?id=42172082
If they had produced a platform with four 10 Gbps ports, then it would become interesting. But the whole hardware and bitstream would have to be redeveloped almost from scratch.
It's an educational project. No need to put it on blast over that. CE/EE students can buy a board for a couple hundred bucks and play around with this to learn.
A hypothetical ASIC implementation would beat a CPU rather soundly on a per watt and per dollar basis, which is why we have hardware acceleration for other protocols on high end network adaptors.
Personally, if I could buy a Wireguard appliance that was decent for the cost, I'd be interested in that. I ran a FreeBSD server in my closet to do similar things back in the day and don't feel the need to futz around with that again.
I agree that if the goal is to be educational, it's an excellent interesting project. But there is no need to make dishonest claims on their web page like "the software performance is far below the speed of wire"
There’s a strong air of grantware to it. The notion that it could be end-to-end auditable from the RTL up is interesting, though, and generally Wireguard performance will tank with a large routing table and small MTUs like you might suffer on a VPN endpoint server while this project seems to target line speed even at the absolute worst case routing x packets scenario.
what do you mean by grantware?
I can see this as a hardened VPN in a mission-critical deployment, which could not be as easily compromised as a software stack.
Why would you even need dedicated hardware for just 40 Gb/s? That is within single-core decryption performance which should be the bottleneck for any halfway decent transport protocol. Are we talking 40 Gb/s at minimum packet size so you need to handle ~120 M packets/s?
Because the entire stack is auditable here. There's no Cisco backdoor, no Intel ME, no hidden malware from a zombie NPM package. It's all your hardware.
My dude: As far as I know, it's the first implementation of Wireguard in an FPGA.
It does not have to be all things for all people today. It can be improved. (And it appears to be open-source under a BSD license; anyone can begin making improvements immediately if they wish.)
Concepts like "This proof-of-concept wasn't explored with multiple 10Gbps ports! It is therefore imperfect and thus disinteresting!" are... dismaying, to say the least.
It would be an interesting effort if it only worked with two 10Mbps ports, just because of the new way in which it accomplishes the task.
I don't want to live in a world where the worth of all ideas is reduced to a binary concept, where all things are either perfect or useless.
(Fortunately for me, I do not live in such a world that is as binary as that.)
IMO it would be cool if they added Wireguard to Corundum but it would be expensive enough that they wouldn't get any hobbyist cred.
If a PC can do 10Gbps, are there any cycles left for other stuff?
bps are easy; packets per second is the crunch. Say you've got 64 bytes per packet, which would be the worst-case scenario: you're looking at roughly 15 Mpackets/sec at 10 Gbps (about 150 Mpackets/sec at 100 Gbps). Sending one byte after another is the easy bit; the decisions are made per-packet.
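As a rough check on the packet-rate arithmetic, here is a minimal sketch. It assumes standard Ethernet framing, where every 64-byte frame also occupies 20 bytes on the wire for preamble, start delimiter and inter-frame gap. The 3 GHz clock is a hypothetical single core, used only to show how small the per-packet budget gets.

```python
# Back-of-the-envelope packet rates for minimum-size Ethernet frames.
# Assumption: each frame costs an extra 20 bytes on the wire
# (7-byte preamble + 1-byte start delimiter + 12-byte inter-frame gap).

MIN_FRAME_BYTES = 64
WIRE_OVERHEAD_BYTES = 20


def packets_per_second(link_bps: float, frame_bytes: int = MIN_FRAME_BYTES) -> float:
    """Maximum frames per second a link can carry at the given frame size."""
    bits_on_wire = (frame_bytes + WIRE_OVERHEAD_BYTES) * 8
    return link_bps / bits_on_wire


def cycles_per_packet(link_bps: float, cpu_hz: float) -> float:
    """CPU cycles available per packet on one core running at cpu_hz."""
    return cpu_hz / packets_per_second(link_bps)


if __name__ == "__main__":
    for gbps in (1, 10, 100):
        pps = packets_per_second(gbps * 1e9)
        budget = cycles_per_packet(gbps * 1e9, cpu_hz=3e9)
        print(f"{gbps:>3} Gbps: {pps / 1e6:7.2f} Mpps, {budget:7.1f} cycles/packet at 3 GHz")
```

By this estimate a 10 Gbps link tops out near 14.88 Mpps with minimum-size frames, and a rate around 150 Mpps corresponds to a 100 Gbps link.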
Amusingly, a lot of people have always been convinced that doing 10 Gbps is impossible on VPN. I recall a two-year old post on /r/mikrotik where everyone was telling OP it was impossible with citations and sources of why but then it worked
https://old.reddit.com/r/mikrotik/comments/112mo4v/is_there_...
Mikrotik's hardware often can't even do linespeed beyond basic switching, not to mention VPN, so yeah.
They're discussing mikrotik hardware specifically? Enterprise stuff or a powerful server can easily do it.
It's going to depend heavily on the hardware in use.
This is conceptually interesting but seems quite a ways from a real end to end implementation - a bit of a smell of academic grantware that I hope can reach completion.
Fully available source from RTL up (although the license seems proprietary?) is very interesting from an audit standpoint, and 1G line speed performance, although easily achieved by any recent desktop hardware, is quite respectable in worst case scenarios (large routing table and small frames). The architecture makes sense (software managed handshakes configure a hardware packet pipeline). WireGuard really lacks acceleration in most contexts (newer Intel QAT supposedly can accelerate ChaCha20 but trying to figure out how one might actually make it work is truly mind bending), so it’s a pretty interesting place to do a hardware implementation.
> (although the license seems proprietary?)
Hm, does the "BSD 3-Clause License" really seem proprietary to you?
But you are right: does the personal license in many (most?) Verilog files[1] overrule the LICENSE file[2] of the repo?
[1] https://github.com/chili-chips-ba/wireguard-fpga/blob/main/1...
[2] https://github.com/chili-chips-ba/wireguard-fpga/blob/main/L...
The safe assumption to make when met with a contradiction in licensing would be to assume that the more restrictive license holds, no? Especially when the permissive license is a general repo-wide license and the restrictive license is specifically applied to certain files.
So for all intents and purposes, in my opinion, large parts of this Wireguard FPGA project are under this weird proprietary Chili Chips license. In fact, the license is so proprietary that the people who made this wireguard FPGA repository and made it visible to the public are seemingly in violation of it.
It puts us in a weird spot as well: I'm now the "holder of" a file and am obligated to keep all information within it confidential and to protect the file from disclosure. So I guess I can't share a link to the repo, since that would violate my obligation to protect the files within it from disclosure.
I would link to the files in question, but, well, that wouldn't protect them from disclosure now would it.
"With traditional solutions (such as OpenVPN / IPSec) starting to run out of steam" -- and then zero explanation or evidence of how that is true.
I can see an argument for IPSec. I haven't used that for many years. However, I see zero evidence that OpenVPN is "running out of steam" in any way shape or form.
I would be interested to know the reasoning behind this. Hopefully the sentiment isn't "this is over five years old so something newer must automatically be better". Pardon me if I am being too cynical, but I've just seen way too much of that recently.
Seems like you just haven’t been paying attention. Even commercial VPNs like PIA and others now use Wireguard instead of traditional VPN stacks. Tailscale and other companies in that space are starting to replace VPN stacks with Wireguard solutions.
The reasons are abundant, the main ones being performance is drastically better, security is easier to guarantee because the stack itself is smaller and simpler, and it’s significantly more configurable and easier to obtain the behavior you want.
I use and advocate for wireguard but I don't see its adoption in bigger orgs, at least the ones I've worked in. Appreciate this situation will change over time, but it'll be a long tail.
OpenVPN makes SNAT relatively trivial, from what I can tell. So I can VPN into a network, use a node on the network as my exit node, and access other devices on that network, with source-based NAT set up on the exit node to make it appear as if my traffic is coming from the exit node.
Wireguard seems to make this much more difficult from what I can tell, though I don't know enough about networking to know if that's fundamental to wireguard or just a result of less mature tooling.
I wouldn't say they're running out of steam (they never had any) but OpenVPN was always poorly designed and engineered and IPSec has poor interop because there are so many options.
Unfortunately (luckily?) I don’t have enough knowledge about IPsec, but usually things make a lot more sense once you actually know the exact architecture and rationale behind it.
IPSec isn’t running out of steam anytime soon. Every commercial firewall vendor uses it, and it’s mandatory in any federal government installation.
WireGuard isn’t certified for any federal installation that I’m aware of and I haven’t heard of any vendors willing to take on the work of getting it certified when its “superiority” is of limited relevance in an enterprise situation.
Interestingly, I tried it out just now on one of my devices, and Wireguard VPN speed was 5x faster than OpenVPN on the same configuration.
OpenVPN has both terrible configuration and performance compared to just about anything else. I've seen it really drop off to next to no usage both in companies and for personal use over the past few years as wireguard based solutions have replaced it.
Same here. With openvpn my somewhat modern cpu takes out a whole core @100% at like 200 megabits/s.
With WireGuard I instead max out the internet bandwidth (400 megabits/s) with like 20% cpu usage if that.
I really don’t understand why. We have AES acceleration. AES-NI can easily do more bps… why is openvpn so slow?
Wireguard is slowly eating the space alive and that's a good thing.
Here's a very educational comparison between Wireguard, OpenVPN and IPSec. It shows how easy wireguard is to manage compared to the other solutions and measures and explains the noticeable differences in speed: https://www.youtube.com/watch?v=LmaPT7_T87g
Very recommended!
Aside from Blackwire protocols, the sector for FPGAs that are in the AMD architectural framework, Xilinx acquisition is the tangential key-management software for VPN tunneling, which is contingent on whether ASIC [application-specific integrated circuits] can successfully test binaries.
I haven’t tinkered with an FPGA in years but this has my curiosity up. I’d love to separate the protocol handling from the routing and see how light (small of an FPGA, power efficiency) it could be made.
The routing isn’t interesting to me - but protecting low power IoT traffic certainly is.
Wow, it’s crazy how much thought goes into these VPN designs.
This is a very cool project! I had never heard of SystemVerilog until today.
I’ll need someone more into this to break it down for me - how does VPN work on this and why do you need an FPGA version of it? Is this an internal VPN or one for connecting to the internet?
This part of the README answers the “why” pretty well:
> Both software and hardware implementations of Wireguard already exist. However, the software performance is far below the speed of wire.
> Existing hardware approaches are both prohibitively expensive and based on proprietary, closed-source IP blocks and tools.
> The intent of this project is to bridge these gaps with an FPGA open-source implementation of Wireguard, written in SystemVerilog HDL.
So having it on an FPGA gives you the best of both worlds, speed of a hardware implementation without the concerns of a proprietary black box.
"VPN" is just virtual, emulated network cables, like the ones you would use to connect your laptops to Wi-Fi routers. It just so happens that a lot of companies use that word for a paid, cloud-based Internet-over-Internet service. It's as if taxi companies called themselves "wheels" companies, so that it became ambiguous whether you're referring to the physical object or the service.
VPNs are normally processed in software, and that processing is usually multi-step. So latency, jitter, processing time per types of packets, etc can vary. This is FPGA based, and FPGA can run some algorithms and programs that can be implemented as chained conditions at fixed latency without relying on function calling in software. Presumably this is faster and more stable than software approaches thanks to that.
Just a guess but I assume that this is (or rather, would be, judging by the README this isn't past the planning stage) for IoT and the like.
If you want your device to connect to a VPN you need something to implement the protocol. Cycles are precious in the embedded world so you don't want to do it in your microcontroller. You might offload it to another uC in your design but at that point it might make sense to just use an FPGA and have this at the hardware(-ish) level.
You can think of this as a "network interface chip" but speaking Wireguard instead of plain IP.
Not a member of the project but here is my take:
You run the WireGuard app on your computer/phone, tap Connect, and it creates an encrypted tunnel to a small network box (the “FPGA gateway”) at your office or in the cloud. From then on, your apps behave as if you’re on the company network, even if you’re at home or traveling.
Why the FPGA box: Because software implementations are too slow and existing hardware implementations cost too much.
Internal or Internet: Both.
integration of some of the compute intensive bits into the nic itself. the reason to do it in hardware is to increase efficiency (or sometimes performance, although software/cpu wireguard is already pretty good). this could be baby steps towards lower power / miniaturized / efficient hardware that supports the wireguard protocol.
also just a fun project for the authors. :)
Wireguard is a protocol and program for making point-to-point VPN connections. It's notable because it's simple (compared to alternatives like OpenVPN), so simple it became a kernel module which made it very fast. These guys implemented it in an FPGA because they could.
Here's a dumb question, tangentially related, since they have a 10gig L2 switch mentioned... How come (almost) nobody makes L2 10gig switches? Ubiquiti has an 8-port L2, that really seems to be it.
Do you mean specifically as consumer products?
There are loads of 10GbE switches from Cisco/Juniper/Arista/et al.
I'd guess so.
The last time I was checking (which was over 5 years ago now admittedly) there were no 10GbE switch options for reasonable prices. Juniper had good 16 port options with 1GbE interfaces at not crazy prices (which I have two of).
Going to 10GbE was many multiples of the 1GbE price. They just seemed way too expensive and were not dropping.
As it goes, maxing out 1GbE is fast enough for the sort of data and IOPS I send over my LAN. So 10GbE would probably have been overkill.
Mikrotik has quite a few, I've been happily using CRS306 and CRS312 for some years now.
Do you mean like most vendors have moved onto faster port speeds? Mostly you can still use the slower 10G optics and the ports will clock down even if the nominal port speed is higher.
Not counting Cisco, juniper etc? Can probably get 32port 10G on eBay for cheap. There's also some on Amazon and AliExpress. And tons of white label options.
I think Wireguard is awesome and I use it exclusively.
That said, when traveling - on hotel wifi - for internet to work, TCP port 443 is always open, thus OpenVPN will always work if you run it on that port.
For Wireguard, there isn’t a reliable always-open UDP port. Port 123 or 53 could work sometimes, but it’s not as guaranteed.
For any other application though, Wireguard would be my first choice.
Yep, I really want to dote on wireguard and have contributed a little bit to it in its early years, but I've always found dsvpn to work at any cafe/hotel/hospital/etc. where I roam (except Sydney Airport - fuck their hostile wifi).
[dsvpn]: https://github.com/jedisct1/dsvpn
Some VPN applications provide the means by which to tunnel WG over TCP. Some provide those as standalone tools: <https://github.com/mullvad/udp-over-tcp>
The one above has a very simple protocol:
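A minimal sketch of that kind of framing, assuming each UDP datagram is written to the TCP stream as a 16-bit big-endian length followed by the datagram bytes. That layout is an assumption here, and the names `encode` and `Decoder` are hypothetical, so check the project's README for the authoritative description.

```python
import struct
from typing import List

# Assumed wire format: [u16 length, big-endian][datagram bytes], repeated.
HEADER = struct.Struct("!H")


def encode(datagram: bytes) -> bytes:
    """Prefix one datagram with its length so it can travel over a TCP stream."""
    if len(datagram) > 0xFFFF:
        raise ValueError("datagram too large for a 16-bit length prefix")
    return HEADER.pack(len(datagram)) + datagram


class Decoder:
    """Reassembles datagrams from TCP reads, which may split or merge frames."""

    def __init__(self) -> None:
        self._buf = bytearray()

    def feed(self, chunk: bytes) -> List[bytes]:
        """Add received bytes and return every datagram that is now complete."""
        self._buf += chunk
        out = []
        while len(self._buf) >= HEADER.size:
            (length,) = HEADER.unpack_from(self._buf)
            end = HEADER.size + length
            if len(self._buf) < end:
                break  # wait for the rest of this datagram
            out.append(bytes(self._buf[HEADER.size:end]))
            del self._buf[:end]
        return out
```

The receiver has to buffer because TCP does not preserve message boundaries, which is the whole reason a length prefix is needed.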
Performance would of course suffer but it's not likely that whichever service is blocking UDP is going to be offering high performance.
If you are doing it manually you can include two peers, one over UDP and one over TCP and prioritize traffic flow over the UDP one. Commercial VPN apps tend to handle that with "auto".
If you want to be fancy or you are confident that the UDP blocking service can offer high performance you can include a third peer using udp2raw: <https://github.com/wangyu-/udp2raw>
The reason why you may want to retain udp-over-tcp is that some sophisticated firewalls may block fake-TCP.
QUIC will hopefully help with this.
> For Wireguard, there isn’t a reliable always-open UDP port. Port 123 or 53 could work sometimes, but it’s not as guaranteed.
Couldn't you pipe it through something like udp2raw in those few cases? Probably performance would be worse/terrible, but then you say it's on hotel network so those tend to be terrible anyways.
SpinalHDL is so cool. There's been so so much consolidation in the semiconductor market, and that's scary. But it feels like there's such an amazing base of new open design systems to work from now, that getting new things started should be so possible! There's just a little too much gap in actually getting the Silicon Foundry model back up, things all a bit too encumbered still. Fingers crossed that chip making has its next day.
> However, the Blackwire hardware platform is expensive and priced out of reach of most educational institutions. Its gateware is written in SpinalHDL, a nice and powerfull but a niche HDL, which has not taken roots in the industry. While Blackwire is now released to open-source, that decision came from their financial hardship -- It was originaly meant for sale.
Here's some kind of link for the old BlackWire 100GbE WireGuard project mentioned: https://github.com/FPGA-House-AG/BlackwireSpinal
Amusingly, after the commentaries about niche HDLs, the authors seem to have turned to PipelineC in this project.
The problems with all not-SV HDLs are:
1. None of the commercial tools support them. All other HDLs compile to SV (or plain Verilog) and then you're wasting hours and hours debugging generated code. Not fun. Ask me how I know...
2. SV has an absolute mountain of features and other HDLs rarely come close. Especially when it comes to multi-clock designs (which are annoying and awkward but very common), and especially verification.
The only glimpse of hope I see on the horizon is Veryl, which hews close enough to SV that interop is going to be easy and the generated code is going to be very readable. Plus it's made by very experienced people. It's kind of the Typescript of SystemVerilog.
What are the benefits of SV for multi-clock design? I found migen (and amaranth) to be much nicer for multi-clock designs, providing a stdlib for CDCs and async FIFOs and keeping track of clock domains separately from normal signals.
My issue with systemverilog is the multitude of implementations with widely varying degrees of support and little open source. Xsim poorly supports more advanced constructs and crashes with them, leaving you to figure out which part causes issues. Vivado only supports a subset. Toolchains for smaller FPGAs (Lattice, Chinese, ...) are much worse. The older Modelsim versions I used were also not great. You really have to figure out the basic common subset of all the tools and for synthesis, that basically leaves interfaces and logic. Interfaces are better than verilog, but much worse than equivalents in these neo-HDLs(?).
While tracing back compiled verilog is annoying, you are also only using one implementation of the HDL, without needing to battle multiple buggy, poorly documented implementations. There is only one, usually less buggy, poorly documented implementation.
SpinalHDL's multiple clock domain support via lexical scoping is excellent.
Save for things like SV interfaces (which are equivalently implemented in a far better way using Scala's type system), SpinalHDL can emit pretty much any Verilog you can imagine.
Tangentially related, I've experimented with Tailscale and Zerotier and, tho I guess they have different audiences, I prefer Zerotier for reliability. Tailscale gets borked by existing VPN config, breaking things on local networks. I like both but does anyone care to share their experiences or explain more in depth the uses / differences as they see it?