Comment by crazygringo
1 year ago
Performance. A million times performance.
If I need to join a million rows to another table and then calculate an aggregate value, and do it all in a single query, it's fast. It might only take 0.01 seconds.
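A minimal sketch of that single-query approach, using Python's built-in sqlite3 as a stand-in for a networked database; the orders and users tables and their columns here are hypothetical:

    import sqlite3

    conn = sqlite3.connect("example.db")  # stand-in for a connection to a remote database

    # One round-trip: the join and the aggregate both run inside the database,
    # and only the small final result crosses the connection.
    rows = conn.execute(
        """
        SELECT u.country, SUM(o.total) AS revenue
        FROM orders AS o
        JOIN users AS u ON u.id = o.user_id
        GROUP BY u.country
        """
    ).fetchall()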
Whereas if I make separate queries to retrieve a million rows from one table and a million rows from another, it's incredibly slow just due to the data transfer, no matter how much conceptually easier another language may be. It might take 30 seconds of repeated serialization, network transfer, and deserialization.
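For contrast, a sketch of that slow pattern with the same hypothetical tables: pull both row sets across the connection and do the join and the aggregate in application code:

    import sqlite3
    from collections import defaultdict

    conn = sqlite3.connect("example.db")

    # Two huge transfers: every row of both tables is serialized by the server,
    # shipped over the wire, and deserialized by the client.
    orders = conn.execute("SELECT user_id, total FROM orders").fetchall()
    users = dict(conn.execute("SELECT id, country FROM users").fetchall())

    # The join and the aggregate now happen in Python, after paying for the transfer.
    revenue = defaultdict(float)
    for user_id, total in orders:
        revenue[users[user_id]] += total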
And even if you're looking up individual rows (not a million), lots of joins done as separate queries means lots of round-trips to the database, which multiplies latency and load -- so even if you still get your final result quickly (e.g. 0.01 seconds rather than 0.001 seconds), the load you can handle drops by the same multiple (e.g. you can now serve only 10% as many users).
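The per-row version of the same problem (often called the N+1 query pattern), again with hypothetical tables: each extra round-trip is cheap on its own, but the latency and load multiply:

    import sqlite3

    conn = sqlite3.connect("example.db")

    # Two round-trips where one would do: fine for a single lookup,
    # but latency and database load scale with the number of round-trips.
    order = conn.execute("SELECT user_id, total FROM orders WHERE id = ?", (42,)).fetchone()
    country = conn.execute("SELECT country FROM users WHERE id = ?", (order[0],)).fetchone()

    # The same result in a single round-trip via a join.
    row = conn.execute(
        """
        SELECT u.country, o.total
        FROM orders AS o JOIN users AS u ON u.id = o.user_id
        WHERE o.id = ?
        """,
        (42,),
    ).fetchone()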
The general rule of thumb with databases is that they can be super-fast when everything is done on the database machine in a single query and your final result is a small amount of data (assuming everything is indexed properly and the query is written properly). But they become super-slow when you're doing a bunch of queries in a row, where those intermediate queries return massive amounts of data to the application, or (even worse) where you then need to send massive amounts of data back to the database.