Comment by ndneighbor
1 day ago
Yea, I mean, that's the whole MO of our platform and we failed at that. So yea, that's disappointing and more so for our customers.
I can provide an explanation about the GCP dependency. Yes, we have hosted workloads off GCP, and we have been able to build a good business by performing a cloud exit. However, we were worried that we would have a circular dependency on our own cloud. I don't think we expected to get auto-modded out of our own account, hence we left our DB on CloudSQL.
It was never our intent to deceive people about how much we own our own destiny with our business. After the last GCP issue (when we got auto-ratelimited, which was bad but survivable), we were assured that this scenario wouldn't happen, but it seems like we have further work to do. Apologies.
Why CloudSQL? Why not AlloyDB for stability?
I’m very sympathetic and understand that decisions are easy to criticize in hindsight, but leaving your database in GCP while moving everything else to your own data centres seems so backwards that I can’t even begin to imagine how that could happen. Was this really an intentional design decision?
I have exactly the same architecture. You can easily administer Postgres/MySQL on your own infrastructure, but it's also the one thing where backup and availability requirements are super strict. I can easily support multi-region in Google Cloud or AWS, which is way harder to do on-prem, and it's also hard to handle the replication story as safely as with Google Cloud. The hope is that GCP et al. give you safety and availability for the control plane stuff and you can run your data plane on-prem.
At $2m/mo spend, this kind of thing is insane. GCP has never been the most reliable of clouds but this is pretty awful. I would never have expected this.
I have kind of the same architecture. I host multiple dedicated servers and VPS instances in the Hetzner "cloud", but all of these connect to a few hosted databases provided through Hetzner's web hosting packages for like 20 bucks a month. It sounds insane, but the one thing that absolutely needs to stay online is the database, so not hosting this myself makes sense. And since Hetzner has apparently tuned their dirt cheap databases pretty well, we can hammer them pretty hard without any problems.
> decisions are easy to criticize in hindsight
I mean, the pain we have caused our customers ultimately proves you correct. That said, we made our decisions with the information and constraints that we knew at that moment in time. Railway has hosts in AWS, GCP, and co-los, so coordinating those workloads in a fully distributed manner would be ideal, but at the end of the day, we didn't foresee that we would just have our project deleted like that.
(Even though we did get assurances from them in 2024 that it wouldn't happen again, although we only got auto-rate-limited that time.)
Thanks for getting things back up (genuinely mean that, btw). Upon logging back in I was prompted to promise I'm not deploying naughty things (I'm not). Was this in response to GCP detecting illegal (prohibited) behavior from something deployed via Railway?
Could you clarify: did an automated process by Google delete a GCP project/account/resource(s)? Like, what exactly were you seeing when trying to get access or see what happened?
This is easily explained by "database migrations are incredibly difficult and very risky".