Comment by getnormality
1 day ago
I skimmed the article for an explanation of why this is needed and what problem it solves, and didn't find one I could follow. Is the point that we want to be able to ask for visualizations directly against tables in remote SQL databases, instead of having to first pull the data into R data frames so we can run ggplot on it? But then why create a new SQL-like language? We already have a package, dbplyr, that translates between R and SQL. Wouldn't it be more direct to extend ggplot to support dbplyr tbl objects and have ggplot generate the SQL?
Or is the idea that SQL is such a great language to write in that a lot of people will be thrilled to do their ggplots in this SQL-like language?
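The alternative proposed above (let the plotting layer generate SQL from a lazy table reference, the way dbplyr does for R) can be sketched in Python. This is only an illustration of the lazy-evaluation idea: `LazyTable` and its methods are hypothetical, not dbplyr's or ggplot's API, and Python's built-in `sqlite3` stands in for a remote database.

```python
import sqlite3


class LazyTable:
    """Hypothetical lazy table reference: records operations, emits SQL on demand.

    Illustrates the idea behind dbplyr's lazy tbl objects; this is NOT dbplyr's API.
    Conditions and expressions are raw SQL strings, so only pass trusted input.
    """

    def __init__(self, conn, table, columns="*", where=None, group_by=None):
        self.conn = conn
        self.table = table
        self.columns = columns
        self.where = where
        self.group_by = group_by

    def filter(self, condition):
        # Returns a new LazyTable; nothing is sent to the database yet.
        # Assumes filter() is called before summarise(), so WHERE applies to raw rows.
        where = condition if self.where is None else f"({self.where}) AND ({condition})"
        return LazyTable(self.conn, self.table, self.columns, where, self.group_by)

    def summarise(self, group_col, expr, alias):
        # One grouping column and one aggregate: enough for a bar chart.
        columns = f"{group_col}, {expr} AS {alias}"
        return LazyTable(self.conn, self.table, columns, self.where, group_col)

    def to_sql(self):
        sql = f"SELECT {self.columns} FROM {self.table}"
        if self.where is not None:
            sql += f" WHERE {self.where}"
        if self.group_by is not None:
            sql += f" GROUP BY {self.group_by}"
        return sql

    def collect(self):
        # Only here does data leave the database, and only the aggregated rows.
        return self.conn.execute(self.to_sql()).fetchall()


# Usage: a plotting layer could build this query and fetch just the bar heights.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("north", 10.0), ("north", 5.0), ("south", 7.0), ("south", -2.0)],
)
query = LazyTable(conn, "sales").filter("amount > 0").summarise("region", "SUM(amount)", "total")
print(query.to_sql())
print(sorted(query.collect()))
```

The design point is that the plot never sees raw rows: the query is assembled lazily and only the aggregated result crosses the wire.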
EDIT: OK, after looking at almost all of the documentation, I think I've finally figured it out. It's a standalone visualization app with a SQL-like API that currently has backends for DuckDB and SQLite and renders plots with Vega-Lite. They plan to support more backends and renderers in the future. As a commenter below said, it's meant to help SQL specialists who don't know Python or R make visualizations.
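The pipeline described in the EDIT (a query runs on an embedded database, and the result is rendered with Vega-Lite) can be sketched roughly as below. This is an assumption-laden illustration, not ggsql's code or syntax: Python's built-in `sqlite3` stands in for the DuckDB/SQLite backends, and `query_to_vegalite` is a hypothetical helper that maps result columns to Vega-Lite's `x`/`y` encoding channels.

```python
import json
import sqlite3

VEGA_LITE_SCHEMA = "https://vega.github.io/schema/vega-lite/v5.json"


def query_to_vegalite(conn, sql, x, y, mark="point"):
    """Run a SQL query and wrap the result in a minimal Vega-Lite spec.

    Hypothetical helper: shows the query -> encoding -> spec pipeline,
    not how ggsql is actually implemented.
    """
    cursor = conn.execute(sql)
    names = [col[0] for col in cursor.description]
    # Vega-Lite accepts inline data as a list of row objects.
    values = [dict(zip(names, row)) for row in cursor.fetchall()]
    return {
        "$schema": VEGA_LITE_SCHEMA,
        "data": {"values": values},
        "mark": mark,
        # Map result columns to visual channels (the "aesthetics" in GoG terms).
        "encoding": {
            "x": {"field": x, "type": "quantitative"},
            "y": {"field": y, "type": "quantitative"},
        },
    }


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cars (hp REAL, mpg REAL)")
conn.executemany(
    "INSERT INTO cars VALUES (?, ?)",
    [(110.0, 21.0), (93.0, 22.8), (175.0, 18.7)],
)
spec = query_to_vegalite(conn, "SELECT hp, mpg FROM cars WHERE hp > 100", x="hp", y="mpg")
print(json.dumps(spec, indent=2))  # JSON that a Vega-Lite renderer can display
```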
I was quite psyched when I read this, so maybe I can tell you why it's interesting to me, although I agree the announcement could have done a better job of explaining it.
In my experience, the only thing the different data roles (analysts, scientists and engineers) all share is SQL. As you said, you could do the same in R, but your project may not be written in R or Python, whereas it likely does use an SQL database and some engine to access the data.
Also, I've been using marimo notebooks a lot for analysis, and it's so easy to write SQL cells against the DuckDB engine running in the background that plotting directly from SQL would be great.
And finally, I have found Python plotting APIs really difficult to remember and get used to. The amount of boilerplate for a simple scatterplot in matplotlib is ridiculous, even with an LLM. So a unified plotting grammar within the one query language everyone already shares would be pretty cool.
I share your pain. You might enjoy Plotnine for Python; it helps ease the pain. The only bad thing about ggplot is that once you learn it, you start to hate every other plotting system. Iteration is so fast, and it is so easy to go from a scrappy EDA plot to publication-quality figures, that it just blows everything else out of the water.
But isn't this then just another tool that you're adding to your project? I don't get why I would want to add this as a visualization tool to a project that already uses R, Python, etc.
I mean, is it to avoid loading the full data into a dataframe/table in memory?
I just don't see what pain point this solves. ggplot already covers quite a lot of this, and I don't doubt that the authors know the domain well; I just don't see the why of this one.
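On the in-memory question: with any SQL backend, the heavy part of a plot such as a histogram can be computed by the database, so only the summary rows are loaded. A minimal sketch, assuming Python's built-in `sqlite3` as the database; `histogram_in_db` is a hypothetical helper, not part of ggsql or any plotting library.

```python
import sqlite3


def histogram_in_db(conn, table, column, width):
    """Bin a numeric column inside the database and return (bin_start, count) rows.

    Hypothetical helper. Assumes non-negative values: CAST(... AS INTEGER)
    truncates toward zero, so negative values would land in the wrong bin.
    """
    # Identifiers can't be bound as parameters, so table/column are interpolated;
    # only pass trusted names. The bin width is bound as a parameter.
    sql = (
        f"SELECT CAST({column} / ? AS INTEGER) * ? AS bin_start, COUNT(*) AS n "
        f"FROM {table} GROUP BY bin_start ORDER BY bin_start"
    )
    return conn.execute(sql, (width, width)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE measurements (reading REAL)")
conn.executemany(
    "INSERT INTO measurements VALUES (?)", ((float(i),) for i in range(10_000))
)
bins = histogram_in_db(conn, "measurements", "reading", 1000)
print(bins)  # 10 summary rows reach Python instead of 10,000 raw rows
```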
Well, there's always going to be a dependency anyway: loading the data, turning it into a dataframe and visualizing it might already be three libraries.
In a sense I really get your complaint. It's the xkcd "Standards" comic all over again: now we have one more competing standard.
I think for me it's not so much the ggplot connection, or the fact that I won't need a dataframe library.
It's that this might be the first piece of a standard way of plotting: no matter which backend (matplotlib, Vega, ggplot), no matter how you are getting your data (dataframes, a database), and no matter where you're doing it (a Jupyter or marimo notebook, a Python script, R, heck, Looker Studio?), you could have just one way of defining a plot. That's something I've genuinely dreamt about.
And what makes this different from yet another library API, to me, is that it's integrated into SQL. SQL has already won the query standardisation battle, so this is a very promising idea for standardising visualization too.
Anything to standardise some of the horrifying crap that data scientists write to visualise something.
This isn't about ggplot (or any particular library) per se, it's about using a flavour of SQL with a grammar of graphics: https://en.wikipedia.org/wiki/Wilkinson%27s_Grammar_of_Graph...
What makes it interesting is the interface (SQL) coupled with the formalism (GoG). The actual visualization or runtime is an implementation detail (albeit an important one).
It seems to be for SQL users who don’t know Python or R.
I would even add that it fits into a more general trend where operations are done within SQL instead of in a script/program that uses SQL to load data. Examples of this are DuckDB in general, and BigQuery with all its LLM and ML functions.
There’s certainly some benefit in a declarative language for creating charts from SQL. Obviously this doesn’t do anything that you can’t also do easily in R or Python / matplotlib using about the same number of lines of code. But safely sandboxing those against malicious input is difficult. Whereas with a declarative language like this you could host something where an untrusted user enters the ggsql and you give them the chart.
So it’s something. But for most uses just prompting your favorite LLM to generate the matplotlib code is much easier.
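The sandboxing argument can be made concrete: a declarative spec is data to be checked, not code to be run, so a server can whitelist exactly what it accepts. The JSON format, the whitelists and `validate_chart_spec` below are all hypothetical (this is not ggsql's grammar), sketched in Python with only the standard library.

```python
import json

# Hypothetical whitelist for a hypothetical JSON chart spec (not ggsql syntax).
ALLOWED_MARKS = {"point", "line", "bar"}
ALLOWED_CHANNELS = {"x", "y", "color"}


class SpecError(ValueError):
    """Raised when an untrusted chart spec is rejected."""


def validate_chart_spec(raw, known_columns):
    """Parse and validate an untrusted chart spec without executing any of it."""
    spec = json.loads(raw)  # parsing data only; invalid JSON raises ValueError
    if not isinstance(spec, dict):
        raise SpecError("spec must be a JSON object")
    mark = spec.get("mark")
    if not isinstance(mark, str) or mark not in ALLOWED_MARKS:
        raise SpecError(f"unsupported mark: {mark!r}")
    encoding = spec.get("encoding")
    if not isinstance(encoding, dict) or not encoding:
        raise SpecError("encoding must be a non-empty object")
    clean = {}
    for channel, field in encoding.items():
        if channel not in ALLOWED_CHANNELS:
            raise SpecError(f"unsupported channel: {channel!r}")
        if not isinstance(field, str) or field not in known_columns:
            raise SpecError(f"unknown column: {field!r}")
        clean[channel] = {"field": field}
    # Rebuild the spec from whitelisted parts only; unknown keys are dropped.
    return {"mark": mark, "encoding": clean}


columns = {"hp", "mpg"}
print(validate_chart_spec('{"mark": "point", "encoding": {"x": "hp", "y": "mpg"}}', columns))
try:
    validate_chart_spec('{"mark": "point", "encoding": {"x": "__import__(\'os\')"}}', columns)
except SpecError as err:
    print("rejected:", err)
```

Rebuilding the output from whitelisted parts, rather than passing the parsed object through, means unexpected keys never reach the renderer.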