← Back to context

Comment by Izkata

14 hours ago

For anything Lucene-based (Elasticsearch, Solr) this was a problem where some of the indexed data had to be transformed for another type of query to be efficient, and it put the transformed data into the Java heap then never released it. I think it was indexed data for searching was read straight from disk and was fine, but analysis queries needed the transformed version?

At some point they added the docValues configuration option per-field to do the transformation during indexing and store it to disk instead, so none of it had to be stored in the heap. Instead what you're supposed to do is rely on the OS disk cache, which handles eviction automatically, so you can run with significantly less memory but get performance improvements by adding memory without having to change any configuration further.