Comment by Yokohiii

10 hours ago

Every serious database that is designed to handle moderate to high traffic will expect you to have enough RAM to fit all data and indices. Relational DBs do a solid job if that's not the case, but that also sabotages the efficiency you could get from them. It will work for some time. If that's enough for you, that's fine.

I am not experienced with MongoDB, so I don't know whether the problems reported in previous comments were the users' fault or MongoDB's. But one thing is clear to me: complaining that it uses too much RAM without knowing the reasons for it is a user problem. A common mistake is to set up a DB and expect it to just magically work. DBs are complicated beasts; you have to know how to deal with them.

You certainly don't need to hold all data in RAM to serve "moderate" traffic. A modern hard drive can seek about 80 times per second, an optimized RAID array even more, and an SSD tens of thousands. To me a light load means up to about a request every second, a moderate load means maybe 20 requests per second, and a heavy load means hundreds or thousands of requests per second. Pessimistically, each (read) request takes 5-10 random reads to service, and almost every system is read-mostly.
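To make the arithmetic concrete, here is a rough back-of-the-envelope sketch. It uses the 80 seeks per second and 10 seeks per request mentioned above; the 20,000 figure for the SSD is just an assumed stand-in for "tens of thousands", and the function name is made up for illustration. It also assumes nothing is served from cache, which is the worst case.

```python
def max_requests_per_second(seeks_per_second: float, seeks_per_request: float) -> float:
    """Upper bound on requests/s if every request has to hit the disk."""
    return seeks_per_second / seeks_per_request


# Seek rates: the hard drive figure is from the comment above,
# the SSD figure is an assumed value for "tens of thousands".
devices = {"hard drive": 80, "SSD": 20_000}

# Pessimistic case from the comment: 10 seeks to service one request.
SEEKS_PER_REQUEST = 10

for name, seeks in devices.items():
    rate = max_requests_per_second(seeks, SEEKS_PER_REQUEST)
    print(f"{name}: about {rate:.0f} requests/s with no caching at all")
```

Any caching only moves these numbers up, since cached reads don't cost a seek.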

I think these are realistic expectations for most apps. Obviously the likes of Netflix and Uber get orders of magnitude more, but 99.9% of apps aren't a Netflix or an Uber, and you don't have to optimize for scaling until your app is on a trajectory to become one. Putting your database on an SSD already lets you handle several thousand concurrent users with ease.

  • RDBMS are typically pretty good at keeping the frequently requested data in RAM. This disguises the latency of disk access, and performance will heavily depend on access patterns. If you serve 1TB of data from a DB with 8GB of RAM and that is sufficient for your use cases, I won't stop you. If you expect low, predictable latency (<1ms) even on a 98/2 r/w system, then it's not worth the headache.

    Of course everything depends on use case and constraints. I highlight the extremes here; the initial confusion was why DBs require so much RAM. Traditional DBs are optimized around RAM; that's where they perform best. You can abuse that, but it's not the best they can be in terms of latency, predictability and stability.
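As a rough illustration of why access patterns matter so much for latency, here is a minimal sketch of average read latency as a function of cache hit ratio. All numbers are assumptions for illustration: 12.5 ms per disk read (one seek on a drive doing about 80 seeks per second), 0.01 ms for a read served from RAM, and hypothetical hit ratios. The function name is made up too.

```python
def expected_read_latency_ms(hit_ratio: float, ram_ms: float, disk_ms: float) -> float:
    """Average latency of one read: hits come from RAM, misses go to disk."""
    return hit_ratio * ram_ms + (1.0 - hit_ratio) * disk_ms


RAM_MS = 0.01   # assumed cost of a read served from memory
DISK_MS = 12.5  # assumed cost of one seek (1000 ms / 80 seeks per second)

# Hypothetical hit ratios; the real one depends entirely on the workload.
for hit_ratio in (0.90, 0.99, 1.00):
    latency = expected_read_latency_ms(hit_ratio, RAM_MS, DISK_MS)
    print(f"hit ratio {hit_ratio:.2f}: about {latency:.3f} ms per read on average")
```

This is only the average. A single miss still costs the full disk read, which is why predictable sub-millisecond latency is hard to get once the working set no longer fits in RAM.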