Comment by crazygringo

3 months ago

Yes, I'm talking about end user queries. Not reports that take 2 hours to run.

But even with BigQuery, you've still got to worry about partitioning and clustering, and yes, they've even added indexes now.
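
For example, here's roughly what that looks like in BigQuery DDL (a hypothetical events table, column names made up):

    CREATE TABLE mydataset.events (
      event_ts TIMESTAMP,
      user_id  STRING,
      payload  JSON
    )
    PARTITION BY DATE(event_ts)  -- prunes scans to just the dates you query
    CLUSTER BY user_id;          -- colocates rows sharing a user_id

    -- And the newer search indexes, for point lookups in strings/JSON:
    CREATE SEARCH INDEX events_search_idx
    ON mydataset.events (user_id);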

The only time you really get to just think in sets is when performance doesn't matter at all and you don't mind if your query takes hours. Which maybe is your case.

But also -- the issue generally isn't CPU, but rather communication/bandwidth. If you're joining 10 million rows to 10 million rows, the two biggest things that matter are whether those rows live on the same machine and whether you're joining on an index. The problem isn't CPU-bound, so more CPUs won't help much.
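
As a sketch of the "joining on an index" point (generic SQL; index DDL varies by engine, and these table names are made up):

    -- Without an index on the join key, both sides get scanned and hashed/sorted.
    CREATE INDEX idx_orders_user_id ON orders (user_id);

    SELECT u.user_id, COUNT(*) AS order_count
    FROM users u
    JOIN orders o ON o.user_id = u.user_id  -- can now be an index/merge join
    GROUP BY u.user_id;

    -- In a distributed engine, the analogous win is sharding both tables on
    -- user_id, so the join happens without shuffling rows between machines.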

Of course there are optimizations to be made, such as not joining on the raw data (filter or aggregate first), saving the ORDER BY for last, and avoiding outer joins between two large partitioned tables.
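
Something like this, say (hypothetical tables again):

    -- Shrink each side before the join; sort only the final, aggregated output.
    WITH recent_orders AS (
      SELECT user_id, amount
      FROM orders
      WHERE order_date >= DATE '2024-01-01'  -- filter before joining
    )
    SELECT u.name, SUM(r.amount) AS total_spent
    FROM users u
    JOIN recent_orders r ON r.user_id = u.user_id
    GROUP BY u.name
    ORDER BY total_spent DESC;  -- the sort runs last, on the small result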

But to me those optimizations are not imperative in nature.

(And BQ will probably eat the 10-million-to-10-million join for breakfast...)