Comment by hintymad

4 years ago

It sounds trivial to "wrap open source software", but surprisingly it is a big value-add to thousands of companies. We can't just look at successful companies like Netflix to downplay the challenges of operating a service. Not every company knows how to operate complex systems at a manageable cost. How many companies can really manage a Kafka cluster, let alone scale it, for instance? Indeed, even companies that people deem powerful may screw up if they don't get their culture or process right. Take Uber for example: for five god damn years, they still couldn't offer a service like EC2, let alone support persistent volumes. They still couldn't make their database provisioning on-demand via an API. Their MySQL-based NoSQL solution was still based on FriendFeed's architecture and the APIs were hard to use. Yet they spent millions building a k8s replacement, building a GPU database, switching from MySQL to Postgres and back to MySQL, etc. So, yes, cloud companies like AWS build mere control planes to wrap open-source software, yet such a seemingly mundane offering does bring value to many customers.

A key reason for Netflix to have an easy-to-operate infrastructure is that Netflix prioritizes productivity and scalability. They specifically did three things:

1. No fixed deadline, with a few exceptions of course, for platform-related projects.

2. Promotion/salary negotiation was not tied directly to release of external features.

3. A single engineer could be responsible for more than one service for the entire company, with 24x7 oncall.

With Netflix establishing such incentives, engineers naturally focus on getting infrastructure right, to the point that oncall 24x7 is a non-issue.

So, yeah, culture matters, big time.

Edit: another incentive was that a service was measured by its adoption. The more people praised it, the more successful the service would be. Requiring meetings to get buy-in for a new service was considered a sign of potential failure. As a result, every single team focused on making the value proposition of their services obvious. Path of least resistance was a given instead of a debated topic.

This is the most important point, IMO. Amazon's value add is not the software itself; it's the operation of the software. That includes a LOT of stuff, not just making sure it's running. It's security modeling and patching, compliance, DDoS protection, etc. Amazon's product is an army of ops engineers working 24/7 to keep your stuff secure and online.

With that in mind, their behavior here makes a lot more sense. Comparing it with companies that have dramatically different products, like Facebook and Google, takes a lot of effort: you have to understand the differences and what impact they have.

  • You can rent a car, but nobody would suggest the car manufacturer be paid nothing for the privilege just because the rental company cracked car maintenance and how to fill the tank.

    The AWS "value add" is only value add in the context of being locked into AWS in the first place.

    • Elasticsearch is not a car manufacturer. They are an aftermarket manufacturer for a car which they got without paying for it themselves. Too many people completely ignore that the heart of Elasticsearch is Lucene. Elasticsearch adds things on top of it, but without Lucene it would be useless. Elasticsearch is the same as Amazon here, but mentioning that doesn't make for such a nice narrative.

    • I'm not sure if I like your comparison, to be honest. Not only is it not _just_ maintenance and filling the tank; if we stick with your picture, there are also things like buying the car, paying for it when no one uses it, insuring the car, repairing the car, general logistics of moving it around when someone has a one-way rental, etc. - basically making it convenient for consumers to rent a car.

      Looking at what "as a service" providers of open source software do, though, they take it a step further, since they wrap the software in a layer that _smooths_ out changes for the user. Going back to the rental company, that would equate to the car manufacturer deciding the indicator needs to be on the right side of the steering wheel now, and the rental company installing an adaptor so that it remains on the left for you, keeping the look and feel the same for the user.

      Not a perfect comparison, but they never are :)

90% of companies do not know how to manage software. They have weird dogmas, no KPIs, and no ability to measure performance or debug problems. This is why they hire external consultants and cloud vendors. What is really funny is how they think internally about these issues. If Netflix and Amazon were publishing efficiency numbers and we were able to compare them with the bottom 95% of tech users, people would be shocked. The difference between the numbers I am aware of (number of computers per engineer, for example) is 100x.