
Comment by yencabulator

23 days ago

It's not like every read would make a separate trip all the way to RAM: caches are a thing, and SIMD pipelines and parallelizes comparisons within a hash bucket quite well. Hash map lookups should amortize to something like 5-20 ns per lookup these days. Abseil's Swiss Tables for C++ and Rust's Hashbrown should both reach that.
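For a rough sense of where the 5-20 ns figure comes from, a micro-benchmark over ~1M 64-bit keys might look like the sketch below. The key count, access pattern, and use of std's `HashMap` (hashbrown underneath) are assumptions; note std's default SipHash hasher is slower than what Abseil/hashbrown benchmarks typically use, so real numbers will vary.

```rust
use std::collections::HashMap; // std's HashMap is hashbrown (SwissTable) underneath
use std::time::Instant;

fn main() {
    const N: u64 = 1_000_000;

    // ~1M 64-bit keys: roughly 8 MB of key data, much of it cache-resident.
    let map: HashMap<u64, u64> = (0..N).map(|k| (k, k.wrapping_mul(31))).collect();

    // Probe in a pseudo-random order so the pattern isn't trivially prefetchable.
    let lookups: u64 = 10_000_000;
    let mut sum = 0u64;
    let mut k = 0u64;
    let start = Instant::now();
    for _ in 0..lookups {
        k = k.wrapping_mul(6364136223846793005).wrapping_add(1); // cheap LCG step
        if let Some(v) = map.get(&(k % N)) {
            sum = sum.wrapping_add(*v);
        }
    }
    let elapsed = start.elapsed();
    println!(
        "{} lookups in {:?} (~{:.1} ns/lookup), checksum {}",
        lookups,
        elapsed,
        elapsed.as_nanos() as f64 / lookups as f64,
        sum
    );
}
```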

If you're looking up values from a 3 GB DB, most lookups would have to hit RAM. Lookups from a hash map can be fast, but SQLite does quite a bit more than just a hash map lookup, and it would usually hit RAM, not L3 cache.

  • Parent comment said "with a million or so rows". I looked up numbers for benchmarks with ~1M entries in the hashmap.

    1M 64-bit integers is only 8 MB; that's still a small keyspace.

    • Perhaps relevant: probably only ~25% of the IDs passed to the select actually have values, and the benchmark uses a single thread. It's too convoluted to share, but with the in-memory database on a lower-spec laptop I currently still get up to 20-30M reads per second, pretty close to the 40M on the big box.
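For illustration only, a single-threaded point-lookup benchmark against an in-memory SQLite database could be structured roughly like this. The `kv` schema, the use of the rusqlite crate, and the way ~25% hit rate is modeled are all assumptions, not the actual (unshared) benchmark.

```rust
// Cargo.toml (assumed): rusqlite = { version = "0.31", features = ["bundled"] }
use rusqlite::{params, Connection};
use std::time::Instant;

fn main() -> rusqlite::Result<()> {
    let mut conn = Connection::open_in_memory()?;
    conn.execute("CREATE TABLE kv (id INTEGER PRIMARY KEY, val INTEGER)", params![])?;

    // Populate ~1M rows, only every 4th id, so ~25% of random probes hit.
    const N: i64 = 1_000_000;
    {
        let tx = conn.transaction()?;
        {
            let mut ins = tx.prepare("INSERT INTO kv (id, val) VALUES (?1, ?2)")?;
            for id in (0..N * 4).step_by(4) {
                ins.execute(params![id, id * 31])?;
            }
        }
        tx.commit()?;
    }

    // Single-threaded point lookups through one reused prepared statement.
    let mut stmt = conn.prepare("SELECT val FROM kv WHERE id = ?1")?;
    let lookups: i64 = 1_000_000;
    let mut hits = 0u64;
    let mut k: i64 = 12345;
    let start = Instant::now();
    for _ in 0..lookups {
        k = k.wrapping_mul(6364136223846793005).wrapping_add(1); // cheap LCG step
        let id = k.rem_euclid(N * 4);
        if stmt
            .query_row(params![id], |row| row.get::<_, i64>(0))
            .is_ok()
        {
            hits += 1;
        }
    }
    let elapsed = start.elapsed();
    println!(
        "{} lookups ({} hits) in {:?} (~{:.0} ns/lookup)",
        lookups,
        hits,
        elapsed,
        elapsed.as_nanos() as f64 / lookups as f64
    );
    Ok(())
}
```

Even with the statement prepared once, each lookup still goes through SQLite's VDBE and a B-tree probe, which is the extra work (beyond a bare hash map lookup) referred to above.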