Comment by brabel
21 days ago
> by 2012 we were drowning in tech debt and scaling challenges.
> the greatest engineering team I've ever seen
How do these two things reconcile, in your opinion? In my view, doing something quickly is the easy part; good engineering is needed exactly when you want things to be maintainable and scalable. So the assertions above don’t really make much sense to me.
It is hard to explain the impact of such massive growth over a 2-3 year period. New features were coming online while old ones were buckling under heavy use. For instance, we launched PostgreSQL in the cloud, something we take for granted today. Not only that, but we offered an insane feature set around "follow" and "forking" that made working with databases seem futuristic.
I remember that when we launched that product, we went to that year's PGCon, and there were people in the crowd who were angry and dismissive that we would treat data that way. It was actually pretty confrontational. Products like that were being built while we were also migrating away from the initial implementation of the "free tier" (internally called Shen). It took me and a few others months to replace it while ensuring we didn't lose data and making it maintainable. The resulting tool, lovingly named "yobuko", remained in use for years afterward (largely due to stagnation and turnover).
Anyway, that was just a slice of it. The decisions you make today are not always the ones you will wish you had made tomorrow. Day 0 is great; day 100 comes with more knowledge and regret. :D
In general, my impression has been that you don't want to architect your solution for massive scale from the start, because:
* You probably aren't going to need it, so putting the effort into scaling means slowing down your delivery of the very features that would make customers want your solution.
* It typically hurts the performance of individual features.
* It definitely significantly increases the complexity of your solution (and probably of the user-facing tooling as well).
* It is difficult to achieve until you have the live traffic to test your approach.
Yeah I think there is a lot of truth here. You can't solve all the problems and in Heroku's case we focused on user experience (internally and externally). Great ideas like "git push heroku main" are game changers, but what happens once that git server is receiving 1000 pushes a minute? Totally different thought process.
Perhaps the thing I would add is that, even with the tech debt and scaling problems, we still had over a million applications deployed and ready for requests to hit them.
An organization without “tech debt” is not good at prioritizing.
Yeah, it seems like not much thought was put into scalability.