
Comment by bigtones

6 days ago

We use Meilisearch in production with a 7 million article corpus - it works really well.

My understanding of Meilisearch is that you need enough RAM to keep everything in memory... but you're (probably) not keeping the full text in memory for millions of articles.

Is it just searching metadata, or do you have a setup that's beefy enough to support that level of memory?

Or am I just wrong? :D

  • Just as a data point...

    I'm running a Meilisearch instance on an AX52 @ Hetzner (64GB DDR5 memory / NVMe / Ryzen 7 7700) dedicated to just meilisearch

    - 191,698 MB in size - 13 indexes - ~80,000,000 documents

    The primarily searched indexes have 5, 6 and 10 million records each. The index with 10 million records has 4 searchable attributes, 10 filterable attributes and 7 sortable attributes.
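
    Roughly, that kind of attribute setup translates to a single settings call. Here's a sketch with the official Python client - the connection details and field names are made-up placeholders, not the actual schema:

      import meilisearch  # official client: pip install meilisearch

      # Hypothetical connection details - point these at your own instance.
      client = meilisearch.Client("http://localhost:7700", "MASTER_KEY")
      index = client.index("articles")

      # Stand-ins for the 4 searchable / 10 filterable / 7 sortable attributes.
      index.update_settings({
          "searchableAttributes": ["title", "body", "author", "tags"],
          "filterableAttributes": ["category", "language", "published_at"],
          "sortableAttributes": ["published_at", "popularity"],
      })

      # Filterable/sortable attributes are what make a query like this legal:
      results = index.search(
          "ryzen",
          {"filter": "category = 'hardware'", "sort": ["published_at:desc"]},
      )
      print(results["hits"][:3])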

    I don't have any idea what kind of search volume there is - the search form is public, the website it's on displays content related to those 5, 6 and 10 million records (each having its own page), and the AI bots are having a field day crawling the site. I don't cache the search results, nor is Cloudflare caching the resulting pages, since the site is dynamic and records are _constantly_ being added.

    So with all that said - here's the current top output:

      top - 06:33:47 up 257 days, 12:10,  1 user,  load average: 1.05, 1.18, 1.24
      Tasks: 274 total,   1 running, 273 sleeping,   0 stopped,   0 zombie
      %Cpu(s):  5.8 us,  0.1 sy,  0.0 ni, 93.7 id,  0.3 wa,  0.0 hi,  0.0 si,  0.0 st
      MiB Mem :  63439.1 total,    403.6 free,  16698.8 used,  47065.0 buff/cache
      MiB Swap:  32751.0 total,      2.2 free,  32748.8 used.  46740.3 avail Mem

      2747823 meilise+  20  0  24.1t  52.0g  36.2g  S  94.7  84.0  5w+5d  /usr/local/bin/meilisearch --config-file-path /etc/meilisearch.toml

    It's bored. Searches are fast, and it doesn't need memory equal to the size of the index for that to be the case.

    The only hiccups I ran into were back before they introduced batch indexing. Things were fine in testing/development, but when I started _really_ loading the documents in production it was clear it wasn't going to keep up with the indexing - and then it just _never stopped_ indexing, with CPU usage very high. I jumped into their Discord, connected with someone on the team, gave them access to the server, and they made a few adjustments - that didn't fix it, but it helped. The next update basically solved the high CPU use. I still had issues when loading a lot of documents, but found a Laravel package for batch indexing Laravel Scout-based indexes and that solved things for me. Then they released batch indexing, I stopped using the Laravel-specific batch indexer, and it's been smooth sailing.
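
    For what it's worth, the generic, framework-free version of that kind of client-side batch loading is just chunking the uploads yourself. A rough Python sketch, with made-up names and sizes (this is plain client-side chunking, not the server's batched task queue or the Laravel package):

      import meilisearch  # pip install meilisearch

      client = meilisearch.Client("http://localhost:7700", "MASTER_KEY")
      index = client.index("articles")

      def add_in_chunks(docs, chunk_size=10_000):
          # Queue one indexing task per chunk instead of one giant payload;
          # Meilisearch indexes asynchronously, so each call just enqueues a task.
          for start in range(0, len(docs), chunk_size):
              task = index.add_documents(docs[start:start + chunk_size], primary_key="id")
              print("queued indexing task:", task)

      docs = [{"id": i, "title": f"article {i}", "body": "..."} for i in range(50_000)]
      add_in_chunks(docs)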

    I'll be testing/playing with their vector stuff here shortly - I have about 10 million of the 60 million vectors generated and a new server with a bunch more memory to throw at it.

    Would recommend Meilisearch.

  • We needed a 16GB machine to import all the data into Meilisearch, as the batch indexing is quite memory intensive, but once it's all indexed we scaled it back to half that RAM and it works great - very performant.

  • Meilisearch keeps all the data on disk. It uses memory-mapping to optimize performance: by default everything is safely persisted on disk, and the OS caches the most-needed pages in memory.

    So it works on any machine, really. 2GiB is usually enough for most workloads, but the bigger the dataset, the faster it will be if you give it more memory!
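
    As a toy illustration of why memory-mapping keeps the resident footprint small - this shows the general mechanism only, not Meilisearch's actual code, and the file name is hypothetical:

      import mmap

      # Hypothetical large file standing in for an on-disk index.
      with open("big_index_file.bin", "rb") as f:
          mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
          # Only the pages actually touched are faulted into the OS page cache,
          # so resident memory stays far below the file's total size.
          header = mm[:4096]
          deep_slice = mm[10_000_000:10_000_000 + 4096]
          mm.close()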