← Back to context

Comment by SoftTalker

3 months ago

Over my career I've certainly written more SQL than any other type of code. Not so much in the last five years so I'm sure I've lost some of it, but I used to really enjoy it. Once you stop thinking iteratively and start thinking in set operations it becomes quite natural and powerful.

Over the years I have been pushing more and more responsibilities into the RDBMS. I now see things mostly in terms of ETL, SQL and schema. Virtually every conversation I've ever had about the application of technology to the business could be expressed in these terms. Business logic defined as SQL queries can be quite intuitive when the schema is structured well and aligned to the business stakeholders' perspectives.

Code, frameworks, ORMs, "best practices", patterns, et. al. are ultimately a distraction. There are a million ways to get the data in & out of the database. Moving the bits around is not valuable. There are many overblown software solutions out there that could have been a simple merge statement or CSV import job.

I think that a lot of the misconceptions and bad moods about SQL come out of being forced to work with nasty schemas. The language itself is really domain specific. Consider that one wouldn't complain as much about a super fucked up nested query (and resulting SQL syntax woes) if it wasn't necessary to write such a query in the first place. Aligning tuples and relations to the way the business typically talks about them means you will be less likely to be fighting these things over time. Often, it isn't possible to refactor the schema from zero, but you can put replicas/views around a "bad" schema and target it with your new development & refactors.

  • > Consider that one wouldn't complain as much about a super fucked up nested query (and resulting SQL syntax woes) if it wasn't necessary to write such a query in the first place.

    And in "modern" SQL this is solved with CTEs. Use them to unwrap the schema the way you want it first, before implementing the business logic.

    • this is poor man's SQL to unwrap for business logic. SQL is such more more about everything else and so litte about the trouble with mapping business logic into storage.

      6 replies →

I second all of that!

I wish more people would see the beauty. After a session of SQL, when I take a step back and think.

"Hold on. What I have been doing lately is just pure logic. No library dependency resolution, no concurrency problems (even though massive concurrency is certainly under the hood). No mutability issues. Just logic."

SQL obviously has its warts, some of them serious, like testability. But at the end of the day, I wish all programming was like that. Let the computer decide how to do stuff under the hood. And let the human focus on the logic.

I have somewhat half-assed tried to read up on Prolog for trying to take it to the next level, but failed sofar unfortunately. (It was also a goal to try to unlearn some SQL to avoid getting stuck in some local optimum). Maybe somewhere between SQL and Prolog is the future of programming.

  • > Maybe somewhere between SQL and Prolog is the future of programming.

    Must be Datalog then ;)

  • Prolog is very powerful, if you see what professionals can do with it it's eye opening. Unfortunately, it takes a complete relearning of programming to achieve that proficiency level. And after you reach it, you probably cannot use it in your day job...

    • My problem with Prolog is every time I want to start using it I feel like I'm populating a database then doing queries on it. So it feels like SQL with another syntax and less power.

      I'm sure I'm wrong and missing something but that's where I stop.

      2 replies →

  • > Maybe somewhere between SQL and Prolog is the future of programming.

    it was, it most probably is

> Once you stop thinking iteratively and start thinking in set operations it becomes quite natural and powerful.

I dunno... I've written a tremendous amount of SQL, and I still have to think imperatively (iteratively) in order to write queries that are actually performant, and to know which indexes need to exist.

It would be wonderful if I could just think in terms of set operations, but that tends to result in queries that take 5 minutes to execute rather than 5 milliseconds.

My entire thought process is basically -- what table do I start with, what rows in what order, joining to what, under what conditions, aggregating how, rinse and repeat... It's entirely a mental model of loops and aggregation, never of set operations.

  • It may be true, until you do your ETL in an index-less database such as BigQuery or Trino. Postgres will always be faster for optimized, end user serving, queries.

    But BigQuery allows you to scale it to 100s of CPUs without having to worry about indexes.

    • This sounds awful.

      I would do almost any amount of iteration and index tuning to keep the query on a single machine rather than deal with a networked distributed system.

      When you get slow queries the real problem is algorithmic complexity and linear workers only can do so much,

      2 replies →

    • Yes, I'm talking about end user queries. Not reports that take 2 hours to run.

      But even with BigQuery, you've still got to worry about partioning and clustering, and yes they've even added indexes now.

      The only time you really just get to think in sets, is when performance doesn't matter at all and you don't mind if your query takes hours. Which maybe is your case.

      But also -- the issue isn't generally CPU, but rather communications/bandwidth. If you're joining 10 million rows to 10 million rows, the two biggest things that matter are whether those 10 million rows are on the same machine, and whether you're joining on an index. The problem isn't CPU-bound, and more CPU's isn't going to help much.

      1 reply →

Being able to master the theoretical, practical, and skill-based components of designing a good database schema is the absolute truest test of understanding any systems design.

People skip ahead to all kinds of nonsense; but most of software engineering is putting the right data into the right format, and moving it around reliably.

I just did a major refactor of a complex distributed code base. I pretty much only count the work I did on schema re-design as the actual “job”, the rest was many hours of coding, but that’s really just implementation.

There are other ways to define schema than SQL of course, but it’s really the perfect way to learn true systems engineering.

  • Very true. My manager at one of my first jobs liked to say "get the data model right and everything else will be easy" and that has largely been proven true in my experience (and it even applies if you're not using an RDBMS).

Yeah, I also kind of like coding in SQL, with PL/SQL being my favourite extension language, which is kind of heresy in HN, but whatever.