
Comment by maayank

3 months ago

What kind of issues did you have that they could fix immediately? I assume this wasn't about configuration issues on your part, but maybe I’m mistaken.

I was one of the first users of the AWS Elastic File System because I had an ideal use-case for it exactly when it was first introduced. Everything worked just fine for 30 days, and then the web site basically locked up. It turned out that EFS had an initial "grace period" during which IOPS were unlimited, after which they became proportional to the GB of stored data. We had just a few hundred megabytes, so it worked out to something like 0.4 IOPS. Slower than a floppy drive! Support immediately reset the grace period for us and flipped some internal flag to make it not expire. A few months later the product itself was fixed to have a reasonable minimum IOPS per account irrespective of the data volume. At the time there were zero mentions of any of this on Google, so I must have been one of the first few people to hit this issue after general availability.
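To make the arithmetic concrete, here is a rough sketch of a size-proportional IOPS limit with an optional floor. The function name, the rate of 1 IOPS per GiB, and the floor of 100 are made-up numbers for illustration, not actual EFS parameters; the rate is only chosen so that roughly 0.4 GiB of data lands near the 0.4 IOPS I remember.

```python
def provisioned_iops(stored_gib: float,
                     iops_per_gib: float = 1.0,   # hypothetical rate, not an AWS figure
                     floor_iops: float = 0.0) -> float:
    """Return an IOPS limit that scales with stored data, never below floor_iops."""
    if stored_gib < 0:
        raise ValueError("stored_gib must be non-negative")
    return max(floor_iops, stored_gib * iops_per_gib)


# A few hundred megabytes with no floor: the limit is effectively unusable.
no_floor = provisioned_iops(0.4)

# The same data with a minimum per account (hypothetical value): usable again.
with_floor = provisioned_iops(0.4, floor_iops=100.0)
```

The point is only that a purely proportional limit punishes small data sets, and a floor removes that failure mode without changing anything for large ones.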

A direct comparison is a nearly identical issue with Azure SQL Server Managed Instance. It too had IOPS proportional to the size of each file of a database. We migrated a database that used many small partitions (per month I think?), each with its own small file. Its performance was horrendous, easily 100x slower than on-prem. The support team could barely speak English, got repeatedly confused about the product (Azure SQL Database != SQL Managed Instance), couldn't understand the problem, and even insisted that totally broken performance "was a feature" and we should redesign "our database". Sure buddy, I'll go tell the third-party vendor that. Meanwhile, Microsoft themselves had insisted we should migrate all of our legacy databases to this garbage. We did, it took months, cost a ton of money, and now it basically doesn't work! We abandoned the product, as have many other people. At the time, this had been an issue for many years, with Microsoft engineering basically whistling at the ceiling as they cheerfully ignored it. More than a year later they fixed it, but you've got to wonder what else is still wrong with it that support can't help with.

There are more examples, but that pair stuck in my mind because they had the same root cause but wildly different outcomes.