Comment by altdataseller

3 years ago

What if your product simply stores a lot of data (e.g., a search engine)? How is that weird?

That's fair - I added "are working on a specific problem which needs a more complicated setup" to my original comment as a nicer way of referring to edge cases like search engines. I still believe that 99% of applications would function perfectly fine with a single primary DB.

Depends what you mean by a database I guess. I take it to mean an RDBMS.

RDBMSs provide guarantees that web searching doesn't need. For web search you can afford to lose a few pieces of data or return not-quite-perfect results. That would simply be wrong for an RDBMS.

  • What if you are using the database as a system of record to index into a real search engine like Elasticsearch, for a product where you have tons of data to search through (e.g., text from web pages)?

    • With regard to Elasticsearch, you basically opt in to whichever behavior you want or need. You end up in the same place: potentially losing some data points or introducing some "fuzziness" to the results in exchange for speed. When you ask Elasticsearch to behave in a guaranteed atomic manner across all records, locking data as it goes, you end up with constraints similar to those of an RDBMS.

      Elasticsearch is for search.

      If you're asking about "what if you use an RDBMS as a pointer to Elasticsearch" then I guess I would ask: why would you do this? Elasticsearch can be used as a system of record. You could put an RDBMS on top of Elasticsearch without configuring Elasticsearch as a system of record, but then you would be lying when you refer to your RDBMS as a "system of record." It's not a "system of record" for your actual data, just a record of where pointers to the actual data were at one point in time.

      I feel like I must be missing what you're suggesting here.

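For anyone trying to picture the setup being argued about in this subthread, here is a minimal sketch of one reading of it: the full text lives in a relational database, and the search index is a disposable structure derived from it. Everything here is an assumption for illustration. SQLite stands in for the RDBMS, a plain in-memory inverted index stands in for the search engine (this is not the Elasticsearch API), and the `pages` table, `build_index`, and `search` are hypothetical names.

```python
import sqlite3
from collections import defaultdict


def build_index(conn):
    """Rebuild a throwaway inverted index (term -> doc ids) from the database rows."""
    index = defaultdict(set)
    for doc_id, body in conn.execute("SELECT id, body FROM pages"):
        for term in body.lower().split():
            index[term].add(doc_id)
    return index


def search(index, term):
    """Return the sorted ids of documents containing the term."""
    return sorted(index.get(term.lower(), set()))


# Hypothetical system of record: the database holds the actual page text.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pages (id INTEGER PRIMARY KEY, body TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO pages (id, body) VALUES (?, ?)",
    [
        (1, "single primary database"),
        (2, "search engine text index"),
        (3, "database search"),
    ],
)
conn.commit()

index = build_index(conn)
print(search(index, "database"))  # [1, 3]

# Losing the index loses no data in this arrangement: it can be rebuilt from the rows.
rebuilt = build_index(conn)
```

In this reading the database stores the data itself rather than pointers into the index, which is the distinction the reply above draws.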

This is not typically going to be stored in an ACID-compliant RDBMS, which is where the most common scaling problem occurs. Search engines, document stores, adtech, eventing, etc. are likely going to have a different storage mechanism where consistency isn't as important.

A search engine won't need joins; it needs other things (e.g., text indexing) that can be split up relatively easily.
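A rough sketch of why text indexing splits easily, under stated assumptions: documents are assigned to shards by id, each shard keeps its own independent inverted index, and a query fans out to every shard and merges the hits. The `ShardedIndex` class and the modulo routing are hypothetical choices for illustration, not how any particular search engine does it.

```python
from collections import defaultdict


class ShardedIndex:
    """Toy inverted index split across independent shards (illustrative only)."""

    def __init__(self, num_shards):
        # Each shard maps term -> set of doc ids and shares nothing with the others.
        self.shards = [defaultdict(set) for _ in range(num_shards)]

    def add(self, doc_id, text):
        # Route the whole document to one shard; no cross-shard coordination needed.
        shard = self.shards[doc_id % len(self.shards)]
        for term in text.lower().split():
            shard[term].add(doc_id)

    def search(self, term):
        # Scatter the query to every shard, then gather and merge the results.
        term = term.lower()
        hits = set()
        for shard in self.shards:
            hits |= shard.get(term, set())
        return sorted(hits)


idx = ShardedIndex(num_shards=3)
idx.add(1, "single primary database")
idx.add(2, "search engine text index")
idx.add(3, "database search")
idx.add(4, "text search at scale")
print(idx.search("search"))  # [2, 3, 4]
```

Because no document spans shards and no query needs to combine rows from different shards the way a join does, each shard could sit on a separate machine.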