Comment by tkejser
6 days ago
Thank you and yes!
By making the entire architecture of the database visible via system objects - you allow the user to form a mental model of how the database itself works. Instead of it being just a magic box that runs queries - it becomes a fully instrumented data model of itself.
Now, you could say: "The database should just work" and perhaps claim that it is design error when it doesn't. Why do I need instrumentation at this level?
To that I can say: Every database ever made makes query planning mistakes or has places where it misbehaves. That's just the way this field works - because data is fiendishly complicated - particularly at high concurrency of when there is a lot of it. The solution isn't (just) to keep improving and fixing edge cases - it is to make those edge cases easy to detect for all users.
This sounds conceptually similar to performance_schema [1] in MySQL or MariaDB, which is a built-in feature originally introduced in MySQL 5.5 (2010). Or perhaps the easier-to-use sys schema [2], which wraps performance_schema among other things, introduced in MySQL 5.7 (2015).
It's great to have that observability functionality, but I don't really understand the purpose of writing a new DBMS from scratch to add this though. Why not get something merged into Postgres core?
[1] https://dev.mysql.com/doc/refman/8.4/en/performance-schema.h...
[2] https://dev.mysql.com/doc/refman/8.4/en/sys-schema.html
Merging to PostgreSQL core for something that needs to run on top of a Petabytes of data in the cloud, on Iceberg, with an advanced query planner and a high speed SIMD engine.... AND trying to squeeze into the "pg_" naming mess?
I don't think so...
And yes, its conceptually similar to MySQL and also conceptually similar to SQL Servers implementation from 1997. That's by the design. Obviously, we are not writing a new DBMS from scratch just to add system objects.
Have a look at some of the other blogs on that site to see what we are up to. Basically, we want to give you an experience that resemblers that instrumentation you got used to from the on-premise databases, but one that can run on top of Iceberg in the cloud.
> something that needs to run on top of a Petabytes of data in the cloud, on Iceberg, with an advanced query planner and a high speed SIMD engine
Part of my confusion was that this blog post makes no mention whatsoever of any of those things!
It gave me the (incorrect) impression that this observability functionality was the purpose of the product. And it is worded in a way which makes no mention of prior art in built-in DBMS observability.
Looking at the other threads here, I don't think I'm the only one who was confused about that. A couple intro paragraphs to the product might help a lot.
1 reply →