Some time ago, while playing around with my toy project (RaspChat), I noticed that creating two channels and a goroutine for every incoming WebSocket connection is not the answer. I was designing RaspChat to work on a 512 MB Raspberry Pi, and I was bottlenecked by GC and memory consumption at around 3–4K connections. After loads of optimization I got it to around 5K. Digging deeper, I found that I would have to maintain a pool of goroutines (like a thread pool) and write an event loop myself. I was instantly pulling my hair out. I was sacrificing so much of the simplicity and flexibility of Node.js just because I was trying to avoid an event loop and wanted to use channels (I had done too much Erlang in the months before starting the project and couldn't think in anything other than processes and messages). I got backlash on my release (https://github.com/maxpert/raspchat/releases/tag/v1.0.0-alph...) from the Go community telling me how I was misusing deserializers, leaving loopholes in file upload, and didn't know shit about the language.
Around that time I found uws (https://github.com/uNetworking/uWebSockets.js), which easily got me to 10K, and my reaction was "I would rather bet on a community investing in an efficient WebSocket event loop than write my own sh*t". Don't get me wrong; I love Go! Seriously, I love it so much I have been pushing my company to use it. I just don't want to glorify the language as a silver bullet (which its fanboys usually do). I would never use it to implement complicated business logic that involves many moving pieces. When my business requires dealing with the shape of an object and mixing and matching things to pass data around, I would rather choose a language that lets me deal with the shapes of objects. Go has its specific use cases and strengths; people advertising it as "move it to Go and it will be faster than Java/C#/Node.js" have either not done it or not dealt with the complexity of maintaining it.
The overhead of goroutines is well known. It's often advertised as being only 4 KB, but as your case shows, sometimes even that is too much.
You got bitten by that but that's not the fault of Go.
The OP was bitten as well and describes a solution in Go. You've solved it by using Node.
Still, your post is quite destructive. Just get over it.
The way these guys did it is quite interesting:
http://marcio.io/2015/07/handling-1-million-requests-per-min...
I have experimented with something similar, ie a pool of goroutines to which work is dispatched (in my case, invoking anonymous functions passed via the input channels)
> When my business requires dealing with shape of an object and mixing matching things to pass data around; I would rather choose a language that lets me deal with shapes of object.
Could you elaborate on this a little?
Since he mentioned Erlang, I bet he's talking about pattern matching.
Pardon the obligatory throwing in of Rust, but it sounds like you were okay switching languages anyway - have you considered Rust as an option? It doesn't have GC and has a very healthy ecosystem (recently with async primitives officially supported by the syntax). It also has the pattern matching you seem to mean. Perhaps it would help you solve your optimization needs? Otherwise, I'd love to hear why it's not a good use case for it since I'm still exploring the language myself.
This kind of promotion creates the tense atmosphere around Rust in the community.
I wonder if anyone has read the linked article?
The overhead of goroutines are well known. The article describes the problem and a solution.
Now someone who got bitten by the overhead of goroutines complains with a (understandable) little bitter tone. He has a good explanation for the issue and why he didn't use Rust but Node.
Citation:
>> I started exploring various options ranging from Rust, Elixir, Crystal, and Node.js. Rust was my second choice, but it doesn't have a good, stable, production ready WebSocket server library yet. Crystal was dropped due to conservative nature of Boehm GC, and Elixir also used more memory than I expected. Node.js surprisingly gave me a nice balance of memory usage and speed.
Then someone who doesn't seem to have read any of that comes along and smartly suggests using "the awesome Rust".
Even as a Rust user myself I get annoyed.
At that point (3 years back) Rust had no good async I/O library. All the recent progress in Rust and Tokio makes it an interesting choice now.
This is still super interesting two years later, but does anyone have an update?
Susheel Aroskar, a Netflix engineer, did a talk about push notifications https://www.infoq.com/presentations/neflix-push-messaging-sc... (2018)
https://lwn.net/Articles/775238/
Dave Doyle and Dylan O'Mahony did something pretty amazing along similar lines with WebSockets for Bose.
It's surprising to me that you apparently have to fight for memory usage for these cases when using Go.
A while ago I ran a (quite naively written) nodejs application that maxed out at ~700k WebSocket connections per server - using only 4GB of RAM. Here CPU became the bottleneck.
Go's concurrency design trades off memory usage for productivity; instead of red-blue functions where you have to explicitly design for function interrupt/yield points with the async keyword, you can just write sequential code and the runtime will handle the rest. The downside to this approach is that often the stack will have to be copied during the switching process vs the stackless approach preferred by Node.js, Rust, C# etc.
See the excellent Fibers Under a Magnifying Glass paper by Microsoft Research: http://www.open-std.org/JTC1/SC22/WG21/docs/papers/2018/p136...
That's not how Go "works" overall; you're talking about the size of the goroutine stack, which is 4 KB by default. In a scenario with a lot of connections that does add up if you use a 1:1 connection-to-goroutine mapping, but outside of that, Go uses less memory than Node / C# / Java / Python etc.
So I wouldn't say "Go trades off memory usage for productivity", since Go is widely used precisely for its low memory footprint.
That's the same reason Go makes sense in systems like Kubernetes, where each pod is in the double-digit MB range; that wouldn't be possible with the languages mentioned above.
Edit: In your edit context it makes more sense :)
Productivity in this case being in the eye of the beholder? I’d argue that people experienced in how node works, wouldn’t have to think too much, since async is the default. I agree with your general sentiment though.
You don't have to fight for memory; it's the overhead of goroutines / default HTTP connections in a scenario with a lot of connections. By default, Go uses far less memory than Node.
And in Node, the only way to get semi-decent performance is to use ultra-optimized external C/C++ libraries.
CPU was probably the bottleneck because of GC. You could probably tune the GC to get better results.
There's a brief mention of the load balancer (nginx) in front of the Go servers; I'm curious if there's anything interesting happening there. I'd imagine that if you lose a server, all of its clients will try to reconnect and the traffic will be spread across the remaining servers. That's all fine and good, but presumably when you bring up a new server to replace the failed one, it'll be seriously underutilized. Is there some easy solution here in nginx-land?
For websocket? Yes (https://github.com/SocketCluster/loadbalancer), but you would have to introduce another layer (AFAIK) that would detect failure and reconnect to a healthy target without informing the client.
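Within nginx itself, the closest thing to an easy win is `least_conn` balancing plus the standard WebSocket upgrade headers. A sketch (upstream names and addresses are placeholders); note this only helps as new connections arrive, since nginx won't migrate already-established sockets to the fresh server:

```nginx
upstream ws_backend {
    # Send each new connection to the server with the fewest active
    # ones, so a freshly replaced server catches up over time instead
    # of staying underutilized under plain round-robin.
    least_conn;
    server 10.0.0.1:8080;
    server 10.0.0.2:8080;
}

server {
    listen 80;
    location /ws {
        proxy_pass http://ws_backend;
        # Required for the WebSocket Upgrade handshake to pass through.
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        proxy_read_timeout 3600s;  # don't reap idle long-lived sockets
    }
}
```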
For mail.ru, I was expecting [1] you would use tarantool for this task
[1] https://hackernoon.com/tarantool-when-it-takes-500-lines-of-...
nothing beats this https://github.com/uNetworking