Comment by hirako2000
19 hours ago
A sort of survivor bias. A VP ordered to use elastic search, because it worked well at his company before. Turned out it worked well for us. Listen to the VP to make technical decisions. And use elastic search.
19 hours ago
A sort of survivor bias. A VP ordered to use elastic search, because it worked well at his company before. Turned out it worked well for us. Listen to the VP to make technical decisions. And use elastic search.
Reminds me when the ELK stack was called just ELK (idek what it is now) we had a server we put it on, and after making the additional dashboards my manager wanted, we learned the limits of ES / ELK. It needs a ridiculous amount of memory, because it will shove everything in memory. Same thing when I learned that MongoDB indexing puts every item in memory as well, which is a yikes, why would you not want to index?
I bet there's money to be made for building a drop-in to either of those two that requires less memory, would save companies a bundle, and make other companies a bundle as well.
There's no high performance database that wont take all of your memory (at least for size of data) if you let it.
That's because it's much, MUCH faster to do it that way, though if you can deal with certain type of latency trade offs for throughput something like turbopuffer can do wonders for your costs.
MySQL doesnt eat up all 8GB of my system when I need to query a table with indexed values, MongoDB seems to eat it all up.
7 replies →
Production grade multi tenant databases want to *solely* run on RAM.
> why would you not want to index?
Because if you don't need an index it wastes RAM, as you've learned. Maintaining indices also has a cost. Index only what you need.
In the sense of the blog post: A senior with decent DB experience would have told you. ;)
Everything "wants to" run solely in RAM, but we don't have infinite RAM, so a "production grade" database should also be able to fetch data from disk unless this is an explicit tradeoff. MariaDB and PostgreSQL do not require all indices to be stored in RAM. Obviously they can be accessed more quickly if they are in RAM but they are designed under the assumption they will often be stored on disk. It sounds like MongoDB is not, and given the reputation of MongoDB, this is as likely to be incompetence as it is to be a willing tradeoff.
3 replies →
For anything Lucene-based (Elasticsearch, Solr) this was a problem where some of the indexed data had to be transformed for another type of query to be efficient, and it put the transformed data into the Java heap then never released it. I think it was indexed data for searching was read straight from disk and was fine, but analysis queries needed the transformed version?
At some point they added the docValues configuration option per-field to do the transformation during indexing and store it to disk instead, so none of it had to be stored in the heap. Instead what you're supposed to do is rely on the OS disk cache, which handles eviction automatically, so you can run with significantly less memory but get performance improvements by adding memory without having to change any configuration further.
Pick the right use case. It is super awkward, horrible UI for things like log analysis. Use Scalyr instead.