← Back to context

Comment by geophile

4 years ago

Sort of related: thinking in relational algebra, or SQL. It appears to be "natural" to think about computing one atomic value at a time, in loops, or slightly less intuitively, recursive functions. (That latter choice may follow from whether your first language was pure Lisp.)

I was fortunate to have a teacher whose database course drilled relational algebra into us. This was in the 70s, shortly after Codd's paper, and well before SQL was invented, much less established. Now I think about much computation algebraically (and often functionally). But I do see that this is "unnatural" for many, including students, having taught databases for several years.

SQL reflects this. I often see students writing nested subqueries, because that is more procedural, where joins would be a cleaner choice. A colleague of mine wrote a paper many years ago, pointing out that thinking procedurally is more "natural" for many: https://dl.acm.org/doi/10.1145/319628.319656. But thinking set-at-a-time instead of one-at-a-time is a valuable skill, not that far off from thinking functionally.

It's easier not to mess up table based filters using explicit semi-join operators (eg. in, not in, exists) instead of using regular joins because joins can introduce duplicates.

Give me 'any join' operation - ie. just select the first value instead of all, and I'll happily use joins more. They are actually more intuitive.

It's not that relational algebra is untintuitive. It's because standard SQL sucks.

Having taught it, do you have any recommendations for folks who are looking to improve their thinking in relations/sets/SQL skills?

  • I teach, in order, more or less (there is some interleaving):

    - Data modeling - Relational algebra - SQL - DBMS architecture (buffering, btrees, query processing, index selection, ...) - Query optimization - Transactions

    A few assignments have students write an in-memory relational algebra implementation, and then use it to write queries. A typical query is a deeply nested set of function calls (as a "one-liner"). And only then to we get to SQL. The hope is that RA is so ingrained, that the connections to SQL are easier to see. And this background is also really useful in understanding query optimization.

    All of this material, including assignments, is available online (this is my own webserver): http://geophile.com/115.

I think this is related to "wholemeal" programming as some haskellers/FP-ists do, thinking in sets/tree/graphs operations.

> SQL reflects this. I often see students writing nested subqueries, because that is more procedural, where joins would be a cleaner choice.

In my experience, in the non-ad-hoc use-case, views can often be substituted for the procedural approach, forming the equivalent of a POSIX pipe.

*> A colleague of mine wrote a paper many years ago, pointing out that thinking procedurally is more "natural" for many: https://dl.acm.org/doi/10.1145/319628.319656. But thinking set-at-a-time instead of one-at-a-time is a valuable skill, not that far off from thinking functionally.

Hmm. Given the proliferation of tabular data tools (especially spreadsheets) over the intervening 40 years, I wonder if those results would remain the same today (and whether there would be any difference among Excel power users that use pivot tables, etc.)