
Comment by giancarlostoro

20 hours ago

Reminds me of when the ELK stack was still called just ELK (I don't even know what it is now). We had a server we put it on, and after building the additional dashboards my manager wanted, we learned the limits of ES / ELK: it needs a ridiculous amount of memory, because it will shove everything into memory. Same thing when I learned that MongoDB indexing puts every item in memory as well, which is a yikes, because why would you not want to index?

I bet there's money to be made for building a drop-in to either of those two that requires less memory, would save companies a bundle, and make other companies a bundle as well.

There's no high-performance database that won't take all of your memory (at least up to the size of your data) if you let it.

That's because it's much, MUCH faster to do it that way. Though if you can accept certain latency trade-offs in exchange for throughput, something like turbopuffer can do wonders for your costs.

  • MySQL doesn't eat up all 8GB of my system when I need to query a table with indexed values; MongoDB seems to eat it all up.

    • If the data is smaller than RAM and you end up reading it off disk again anyway, that's the slowest it can possibly be. There's a reason most databases implement a buffer cache (which actually makes writes insanely faster as well). But yeah, of all the databases I've tinkered with, MySQL is generally not a very good operational one.
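The buffer-cache idea mentioned above can be sketched as a toy LRU cache. This is an illustration only (all names here are made up; no real engine is this simple): hot pages stay in RAM, and only misses pay the slow disk-read cost.

```python
from collections import OrderedDict

class BufferCache:
    """Toy LRU buffer cache: keep recently used pages in RAM."""

    def __init__(self, capacity, read_from_disk):
        self.capacity = capacity
        self.read_from_disk = read_from_disk  # the slow path
        self.pages = OrderedDict()            # page_id -> bytes, in LRU order
        self.hits = 0
        self.misses = 0

    def get(self, page_id):
        if page_id in self.pages:
            self.pages.move_to_end(page_id)   # mark as most recently used
            self.hits += 1
            return self.pages[page_id]
        self.misses += 1
        data = self.read_from_disk(page_id)   # "slowest it can possibly be"
        self.pages[page_id] = data
        if len(self.pages) > self.capacity:
            self.pages.popitem(last=False)    # evict least recently used page
        return data

# Fake "disk": 100 pages, of which only 5 are hot.
disk = {i: f"page-{i}".encode() for i in range(100)}
cache = BufferCache(capacity=10, read_from_disk=lambda i: disk[i])

for _ in range(3):
    for i in range(5):            # hot working set fits entirely in cache
        cache.get(i)
print(cache.hits, cache.misses)   # → 10 5 (only the first pass misses)
```

The same logic is why a working set that fits in RAM feels fast and one that doesn't falls off a cliff: eviction starts thrashing the slow path.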

Production grade multi tenant databases want to *solely* run on RAM.

> why would you not want to index?

Because if you don't need an index it wastes RAM, as you've learned. Maintaining indices also has a cost. Index only what you need.
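The trade-off above can be shown with a toy in-memory example (this is an illustration of the general principle, not MongoDB internals): an index buys O(1) lookups, but the index structure itself costs RAM and must be maintained on every write.

```python
import sys

# Made-up sample data for illustration.
rows = [{"id": i, "email": f"user{i}@example.com"} for i in range(10_000)]

# Without an index: every lookup scans all rows, but there is no extra memory.
def find_by_email_scan(email):
    return next(r for r in rows if r["email"] == email)

# With an index: constant-time lookups, but the dict itself occupies RAM,
# and every insert/update/delete must also update it.
email_index = {r["email"]: r for r in rows}

print(find_by_email_scan("user42@example.com")["id"])   # → 42
print(email_index["user42@example.com"]["id"])          # → 42
print(sys.getsizeof(email_index))  # bytes for the index table alone
```

If `email` is never queried, that index is pure overhead, which is the point: index only what you actually search on.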

In the sense of the blog post: A senior with decent DB experience would have told you. ;)

  • You mean NoSQL, which is slightly different and nuanced. In a shop that was mostly SQL, I was the exception: the one junior developer using MongoDB and Elastic. Mind you, we got a lot of things done, and I learned a lot more about Mongo than I would like.

    In all fairness, this was my first job as a developer, a few years ago. I deep-dove into MongoDB, but I was also one of the only devs using it at this place.

    My previous experience with MongoDB had been in college and more limited.

  • Everything "wants to" run solely in RAM, but we don't have infinite RAM, so a "production grade" database should also be able to fetch data from disk unless this is an explicit tradeoff. MariaDB and PostgreSQL do not require all indices to be stored in RAM. Obviously they can be accessed more quickly if they are in RAM but they are designed under the assumption they will often be stored on disk. It sounds like MongoDB is not, and given the reputation of MongoDB, this is as likely to be incompetence as it is to be a willing tradeoff.

    • Every serious database designed to handle moderate to high traffic will expect you to have enough RAM to fit all data and indices. Relational DBs do a solid job when that's not the case, but that also sacrifices the efficiency you could get from them. It will work for some time, and if that's enough for you, that's fine.

      I am not experienced with MongoDB, so I don't know whether what the previous comment reports was the user's fault or MongoDB's. But one thing is clear to me: complaining that it uses too much RAM without knowing the reasons for it is a user problem. A common mistake is to set up a DB and expect it to just magically work. DBs are complicated beasts; you have to know how to deal with them.


For anything Lucene-based (Elasticsearch, Solr) this was a problem: some of the indexed data had to be transformed for another type of query to be efficient, and the transformed data was put into the Java heap and never released. I think the indexed data used for searching was read straight from disk and was fine, but analysis queries needed the transformed version?

At some point they added the per-field docValues configuration option to do the transformation at indexing time and store the result on disk instead, so none of it has to live in the heap. Instead you're supposed to rely on the OS disk cache, which handles eviction automatically. You can then run with significantly less memory, and adding memory improves performance without any further configuration changes.
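In Elasticsearch terms, this is the `doc_values` mapping parameter: it is enabled by default for keyword/numeric/date fields (stored on disk in columnar form and served via the OS page cache) and can be disabled for fields you never sort or aggregate on. A sketch of such a mapping, with made-up index and field names:

```python
# Hypothetical mapping for illustration; field names are invented.
mapping = {
    "mappings": {
        "properties": {
            # Sorted and aggregated on: keep doc_values (the default).
            "timestamp": {"type": "date"},
            # Full-text search only: "text" fields use the inverted index;
            # doc_values don't apply to them.
            "body": {"type": "text"},
            # Looked up by exact match but never sorted/aggregated on:
            # disabling doc_values saves disk (and page-cache pressure).
            "session_token": {"type": "keyword", "doc_values": False},
        }
    }
}

# With the official Python client this would be passed as, e.g.:
#   es.indices.create(index="logs", body=mapping)
print(mapping["mappings"]["properties"]["session_token"]["doc_values"])  # → False
```

Note the asymmetry: disabling doc_values on a field makes sorting/aggregating on it fail (or fall back to heap fielddata), so it's only safe for fields used purely for filtering or retrieval.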