Comment by nchagnet
20 hours ago
I was quite psyched when I read this so maybe I can tell you why it's interesting to me, although I agree the announcement could have done a better job at it.
In my experience, the only thing the data fields (analysts, scientists and engineers) share is SQL. As you said, you could do the same in R, but your project may not be written in R or Python, whereas it likely uses an SQL database and some engine to access the data.
Also, I've been using marimo notebooks a lot for analysis, where it's so easy to write SQL cells using the background duckdb that plotting directly from SQL would be great.
And finally, I have found Python APIs for plotting to be really difficult to remember/get used to. The amount of boilerplate for a simple scatterplot in matplotlib is ridiculous, even with an LLM. So a unified grammar within the unified query language would be pretty cool.
I share your pain. You might enjoy plotnine for Python; it helps ease the pain. The only bad thing about ggplot is that once you learn it you start to hate every other plotting system. Iteration is so fast, and it is so easy to go from a scrappy EDA plot to publication-quality plotting, that it just blows everything else out of the water.
But isn't this then just another tool that you're including in your project? I don't get why I would want to add this as a visualization tool to a project, if it's already using R, or Python, etc...
I mean, is it to avoid loading the full data into a dataframe/table in memory?
I just don't see what pain point this solves. ggplot solves quite a lot of this already, so I don't doubt that the authors know the domain well. I just don't see the why of this.
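To make the in-memory question concrete, here is a rough sketch of what pushing the work into SQL can buy you. It uses Python's standard-library `sqlite3`, not the tool under discussion, and the `trips` table and its values are made up for illustration. The database does the aggregation, so only one row per bar ever reaches the plotting side:

```python
import sqlite3

# Hypothetical in-memory table standing in for a large table in a real database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trips (city TEXT, fare REAL)")
conn.executemany(
    "INSERT INTO trips VALUES (?, ?)",
    [("Oslo", 10.0), ("Oslo", 14.0), ("Lima", 6.0), ("Lima", 8.0), ("Lima", 10.0)],
)

# The aggregation runs inside the database; Python only receives
# the plot-ready summary (one row per city), never the raw rows.
bars = conn.execute(
    "SELECT city, COUNT(*) AS n, AVG(fare) AS mean_fare "
    "FROM trips GROUP BY city ORDER BY city"
).fetchall()

for city, n, mean_fare in bars:
    print(f"{city}: n={n}, mean fare={mean_fare:.2f}")
```

Whether the tool in the announcement actually works this way is an assumption on my part; the sketch only shows why plotting from a query could avoid materialising a full dataframe.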
Well there's always going to be a dependency anyway: loading the data, making it a dataframe, visualizing it, this might be 3 libraries already.
In a sense I really get your complaint. It's the xkcd standard thing all over, we now have a new competing standard.
I think for me it's not so much the ggplot connection, or the fact that I won't need a dataframe library.
It's that this might be the first piece of a standard way of plotting: no matter which backend (matplotlib, vega, ggplot), no matter how you are getting your data (dataframes, database), no matter where you're doing this (Jupyter or marimo notebook, Python script, R, heck, Looker Studio?). You could have just one way of defining a plot. That's something I've genuinely dreamt about.
And what makes this different from yet another library api to me is that it's integrated within SQL. SQL has already won the query standardisation battle, so this is a very promising idea for the visualization standardisation.
I see, that's insightful. At first sight I thought of it as a kind of novelty, extending SQL with a visual grammar to integrate with a specific plotting library. But from your comments I can now imagine it has potential as a general solution for that space between data - wherever it comes from, it can typically be queried by SQL - and its visualization.
Thinking further, though, there might be value in extracting the specs of this "grammar of graphics" from the SQL syntax and generalizing them, so other languages can implement the same interface.
Anything to standardise some of the horrifying crap that data scientists write to visualise something.