Comment by Nacraile

9 years ago

> Secretly DynamoDB is just a bunch of SQL databases with floating masters, or so we surmise. If you iterate across things in native order without randomization and at a very high speed then you will overload individual shards. You can end up at 10X write provisioning and still get rate limit responses. Randomizing the traversal of the keyspace fixes this.

It's not based on SQL, but the fact that the table is sharded and has the throughput characteristics you describe is well documented (i.e. not at all secret :p) http://docs.aws.amazon.com/amazondynamodb/latest/developergu...
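To make the quoted effect concrete, here is a toy simulation, not DynamoDB itself. The shard layout, the window size, and all function names are made up for illustration. It only assumes that keys walked in native order arrive grouped by shard, and compares the worst per-shard burst against a shuffled traversal.

```python
import random
from collections import Counter


def peak_shard_load(shard_sequence, window):
    """Largest number of requests any one shard receives inside a single
    non-overlapping window of `window` consecutive requests."""
    peak = 0
    for start in range(0, len(shard_sequence), window):
        counts = Counter(shard_sequence[start:start + window])
        peak = max(peak, max(counts.values()))
    return peak


def native_order(num_shards, keys_per_shard):
    """Hypothetical native traversal: every key of shard 0, then shard 1, ...
    Each element is the shard id that the request would land on."""
    return [shard for shard in range(num_shards) for _ in range(keys_per_shard)]


def randomized_order(shard_sequence, seed=0):
    """Same requests, visited in shuffled order (seeded for repeatability)."""
    shuffled = list(shard_sequence)
    random.Random(seed).shuffle(shuffled)
    return shuffled


if __name__ == "__main__":
    native = native_order(num_shards=10, keys_per_shard=1000)
    print("native peak per window:  ", peak_shard_load(native, window=100))
    print("shuffled peak per window:", peak_shard_load(randomized_order(native), window=100))
```

In native order every window lands entirely on one shard, so that shard absorbs the whole request rate while the others sit idle. Shuffling spreads each window across all shards.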

Yes, it's documented that, to quote the page you linked, "a single partition can support a maximum of 3,000 read capacity units or 1,000 write capacity units."
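A rough sketch of why being over-provisioned doesn't help a sequential writer. This assumes the table's provisioned write capacity is split evenly across its partitions, which is my assumption here and not something stated in the quote; the function name is also made up. The only figure taken from the documentation is the 1,000 WCU per-partition maximum.

```python
PARTITION_MAX_WCU = 1000  # per-partition write maximum quoted from the docs


def single_partition_write_ceiling(provisioned_wcu, partitions):
    """Write rate available when all traffic lands on one partition.

    ASSUMPTION: provisioned capacity is divided evenly among partitions.
    """
    even_share = provisioned_wcu / partitions
    return min(even_share, PARTITION_MAX_WCU)


if __name__ == "__main__":
    # Hypothetical table: 2,000 WCU provisioned, spread over 10 partitions.
    print(single_partition_write_ceiling(2000, 10))
```

Under that assumption, a writer hammering one partition at a time on this hypothetical table is throttled at a tenth of the table's provisioned rate.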

The documentation also explains that scanning a table runs the risk of saturating the capacity of a partition (1):

> As a table or index grows, the Scan operation slows. The Scan operation examines every item for the requested values, and can use up the provisioned throughput for a large table or index in a single operation. (...)

> The larger the table or index being scanned, the more time the Scan will take to complete. In addition, a sequential Scan might not always be able to fully utilize the provisioned read throughput capacity: Even though DynamoDB distributes a large table's data across multiple physical partitions, a Scan operation can only read one partition at a time. For this reason, the throughput of a Scan is constrained by the maximum throughput of a single partition. (emphasis mine)
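One way around the single-partition ceiling described in that passage is a parallel scan, using the Scan API's `Segment` and `TotalSegments` parameters. Below is a minimal sketch, not a tuned implementation. It assumes `client` is something like `boto3.client("dynamodb")`, which is passed in and not created here; the helper names and the default segment count are my own choices.

```python
from concurrent.futures import ThreadPoolExecutor


def scan_segment(client, table_name, segment, total_segments):
    """Read every item in one logical segment, following pagination."""
    items = []
    kwargs = {
        "TableName": table_name,
        "Segment": segment,
        "TotalSegments": total_segments,
    }
    while True:
        page = client.scan(**kwargs)
        items.extend(page.get("Items", []))
        last_key = page.get("LastEvaluatedKey")
        if last_key is None:
            return items  # no more pages in this segment
        kwargs["ExclusiveStartKey"] = last_key


def parallel_scan(client, table_name, total_segments=4):
    """Scan all segments concurrently, one worker thread per segment."""
    with ThreadPoolExecutor(max_workers=total_segments) as pool:
        futures = [
            pool.submit(scan_segment, client, table_name, seg, total_segments)
            for seg in range(total_segments)
        ]
        return [item for future in futures for item in future.result()]
```

Note that this still consumes read capacity, just spread over more of the table at once, so the segment count needs to be chosen with the table's provisioned throughput in mind.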

Whenever I take an important dependency on a product, I make it a habit to read or skim virtually all of the product's documentation from beginning to end. Documentation for complex technologies is something to study. It's served me very well and I'd recommend the practice to others. With this approach you'll find that you just "know" (or can quickly look up) things that tend to surprise other people. Even if not all of the knowledge is in your working memory, you'll have a vague recollection of reading "something about that" and will be able to come back quickly to what you read.

(1) http://docs.aws.amazon.com/amazondynamodb/latest/developergu...

  • That's fantastic, but let's take a look at the documentation as it was a few years ago, when I discovered this:

    https://web.archive.org/web/20130102210613/http://docs.aws.a...

    Huh, worded slightly differently, isn't it? They DO allude to this perhaps being the case in https://web.archive.org/web/20121221003912/http://docs.aws.a... but it's not made clear at any point that scans will yield up keys neatly bundled by shard.

    Even now, it is not the scan consuming your capacity that causes this. It happens because of the way shards enumerate their keys: the shards are not read simultaneously, and the results are not mixed for you before being sent. You can redirect write traffic to another table and still often see a rate limit effect, even though the scan isn't consuming writes for that table.

    That, I think, is still quite surprising.

    You can take for granted how great the docs are now, although I still submit that this aspect of the system is quite poorly documented. AWS in general is fantastic at conveying API endpoints and very poor at offering a new developer a narrative on how to use the product.

    The reason the docs are as good as they are now: people like me have been around for years yelling at Amazon to improve their documentation, and telling tech reps to document things better. I hope you, in your capacity, do the same. And I will continue to offer insights like this on forums like this precisely because there are lots of relatively new platform engineers here. It's one of the few things I _like_ doing on Hacker News.

  • Pyxl101, I find it surprising that your comment was downvoted. Typically people downvote for one of two reasons: either they disagree with your tone or they disagree with your facts. In your case your tone seems reasonable and your facts seem accurate, so the downvotes do seem unreasonable.

    • Is "rtfm" really such a valuable sentiment in reply to my post?

      Personally, I find it to often be a complete non-answer wrapped up with a dismissive and insulting attitude.
