Comment by cryptonector

9 months ago

> Feature Requests

> Concerning schema changes: they desire PostgreSQL to record a history of schema change events, such as adding or removing columns and other DDL operations.

You can do this right now today by using `EVENT TRIGGER`s. You can check out things like Aquameta[0] (if I remember correctly) to see how it's done.

[0] https://github.com/aquametalabs/aquameta

14 comments

cryptonector

weitendorf 9 months ago

We are building similar functionality for our own postgres setup right now.

Of course, postgres is very powerful and you can implement anything like this in many different ways. But at the same time, maintaining DDL history and tracking major changes to the database is a very common requirement, and unfortunately many people don't realize that until they learned that lesson the hard way.

Relatedly are not DDL changes per-se, but big/important db operations that you want to also keep a record of so that you can look back and understand why something changed. I am not sure if this is the right term, but basically when we update our pricing model or skus, or set custom pricing for someone, we want those updates to be "auditable".

Actually, I think this is a relatively common use case too: a fully relational model often leaves you with a large number of "static" tables that only change when you're making updates to your application. They support the smaller number of big, important, dynamic tables that your application is regularly modifying. If you had the foresight to recognize that you'd likely need to change those static tables in the future, you probably organized them so you could do so by adding new rows. It is not quite a DDL change but it is a big, risky change to your application logic that only happens rarely, and you basically just want to keep a list of all those big changes in case things get messed up or you find yourself unable to make sense of older data.

cryptonector 9 months ago
There's lots of extensions that automatically create audit tables and record history in them. The trick is to a) create audit tables for all _existing_ tables, b) create an event trigger so you can have audit tables created automatically for any future new tables. Here's an example of how to do it: https://github.com/twosigma/postgresql-contrib/blob/master/a...
Another thing you might do is to go with a schema that follows the event shipping pattern. In this pattern you have the "truth" held in insert-only tables (deletes and updates not allowed), then turn those "event" tables into ones you can query naturally using VIEWs, MATERIALIZED VIEWs, or live tables that you update with triggers on the event tables. Then your event tables _are your history/audit_ tables.
- weitendorf 9 months ago
  
  Yeah, I guess what I mean is, it's good for very common use cases to have fully supported features in the product itself even if there are third party tools to handle it for you or I already know how to implement it myself.
  I have been working on database features and functionality that I felt I moderately to fully understood and just needed to sit down and implement in the next week, for close to 3 weeks now.
  > In this pattern you have the "truth" held in insert-only tables (deletes and updates not allowed), then turn those "event" tables into ones you can query naturally using VIEWs, MATERIALIZED VIEWs, or live tables that you update with triggers on the event tables.
  This is almost exactly what I'm doing, with an additional versioning column (primary key on (id, version_num)). After I did that I realized that it'd be better to correlate changes with a push_id too because if some operations didn't modify all ids then I wouldn't be able to easily tell which versions were updated at the same time. But then I realized most of my "auditable pushes" would be operations on 3-4 related tables and not just an individual table, so push_ids would be performed on all tables. And also, since not every push modifies every value, it makes sense to model pushes as additions + diffs to the existing table. But then after several pushes constructing the MATERIALIZED VIEW of active values becomes rather complex because I have to convert a sparse tree of diffs across multiple tables into a flat table recursively...
  So yeah it would be pretty nice for postgres to have something that mostly just works to audit changes at either the user, function, or table level.
  
  1 reply →

aftbit 9 months ago

    CREATE TABLE public.audit (
        id uuid NOT NULL,
        created_time timestamp without time zone DEFAULT now() NOT NULL,
        schema_name text NOT NULL,
        table_name text NOT NULL,
        record_id uuid NOT NULL,
        user_name text,
        action text NOT NULL,
        old_data jsonb,
        new_data jsonb
    );

    CREATE OR REPLACE FUNCTION audit_if_modified_func() RETURNS TRIGGER AS $body$
    DECLARE
        v_old_data JSONB;
        v_new_data JSONB;
    BEGIN
        IF (TG_OP = 'UPDATE') THEN
            v_old_data := to_jsonb(OLD.*);
            v_new_data := to_jsonb(NEW.*);
            IF (TG_TABLE_NAME::TEXT = 'users') THEN
                v_old_data = v_old_data - 'last_login_time';
                v_new_data = v_new_data - 'last_login_time';
            END IF;
            IF (v_old_data <> v_new_data) THEN
                INSERT INTO audit (id,record_id,schema_name,table_name,user_name,action,old_data,new_data)
                    VALUES (uuid_generate_v4(), NEW.id, TG_TABLE_SCHEMA::TEXT, TG_TABLE_NAME::TEXT,
                            session_user::TEXT, substring(TG_OP,1,1), v_old_data, v_new_data);
            END IF;
            RETURN NEW;
        ELSIF (TG_OP = 'DELETE') THEN
            v_old_data := to_jsonb(OLD.*);
            INSERT INTO audit (id, record_id,schema_name,table_name,user_name,action,old_data)
            VALUES (uuid_generate_v4(), OLD.id, TG_TABLE_SCHEMA::TEXT,TG_TABLE_NAME::TEXT,session_user::TEXT,substring(TG_OP,1,1),v_old_data);
            RETURN OLD;
        ELSIF (TG_OP = 'INSERT') THEN
            v_new_data := to_jsonb(NEW.*);
            INSERT INTO audit (id, record_id,schema_name,table_name,user_name,action,new_data)
            VALUES (uuid_generate_v4(), NEW.id, TG_TABLE_SCHEMA::TEXT,TG_TABLE_NAME::TEXT,session_user::TEXT,substring(TG_OP,1,1),v_new_data);
            RETURN NEW;
        ELSE
            RAISE WARNING '[AUDIT_IF_MODIFIED_FUNC] - Other action occurred: % at %',TG_OP,now();
            RETURN NULL;
        END IF;
    
    EXCEPTION
        WHEN data_exception THEN
            RAISE WARNING '[AUDIT_IF_MODIFIED_FUNC] - UDF ERROR [DATA EXCEPTION] - SQLSTATE: % SQLERRM: %',SQLSTATE,SQLERRM;
            RETURN NULL;
        WHEN unique_violation THEN
            RAISE WARNING '[AUDIT_IF_MODIFIED_FUNC] - UDF ERROR [UNIQUE] - SQLSTATE: % SQLERRM: %',SQLSTATE,SQLERRM;
            RETURN NULL;
        WHEN OTHERS THEN
            RAISE WARNING '[AUDIT_IF_MODIFIED_FUNC] - UDF ERROR [OTHER] - SQLSTATE: % SQLERRM: %',SQLSTATE,SQLERRM;
            RETURN NULL;
    END;
    $body$
    LANGUAGE plpgsql
    SECURITY DEFINER;
    
    CREATE TRIGGER audit_logger_accounts
        AFTER INSERT OR UPDATE OR DELETE ON accounts
        FOR EACH ROW EXECUTE PROCEDURE audit_if_modified_func();

    CREATE TRIGGER audit_logger_users
        AFTER INSERT OR UPDATE OR DELETE ON users
        FOR EACH ROW EXECUTE PROCEDURE audit_if_modified_func();

weitendorf 9 months ago

It's not that maintaining some kind of audit history is an insurmountable problem, it's that it's both a very common need and the "right" way to do it can take significantly longer to set up than a quick naive approach. Once you take into account batching (many if not all of your pushes to auditable tables that are typically static will happen in batches within that table and across multiple other tables, because you are updating your application logic or implementing some new feature) and the need to not blow up your storage with audit logs by recording diffs, and the need to be able to "replay" or "rollback", etc. it's quite a lot to handle
And again, it's not impossible to do any of that. I just think it seems repeatable enough that it could be a first-party feature, and I would prefer to just enable or configure that feature and move on to other problems.
Going from something like your naive approach which I assume an LLM generated for you, to something productionized enough to bet your business on it, is not always trivial.

1 reply →

tudorg 9 months ago

We use event triggers both in pgroll [0] and pgstream [1] to capture DDL changes. It's a feature of pgroll that it captures the schema migration history, regardless if you do the changes via pgroll or with another tool. And pgstream uses it to put the schema changes in the logical replication stream.

One thing to be aware is that on many Postgres DBaaS EVENT TRIGGERS are not allowed, because they generally require superuser. But RDS and Aurora do support them, we (Xata) support them of course, and I think Supabase is working on adding support for them.

[0]: https://github.com/xataio/pgroll

[1]: https://github.com/xataio/pgstream

erichanson 9 months ago

Thanks for remembering Aquameta and stay tuned! Some cool new stuff coming out shortly.

cryptonector 9 months ago
:)
Hey, I see you've added a web server. How does it compare to PostgREST?
- erichanson 9 months ago
  
  Big difference is it's implemented in pl/pgsql so as Aquameta evolves, there's no external deps. There's a function called endpoint.request(http_verb, URL, post vars) and then it does the rest in plpgsql. A thin Go daemon just takes the request and throws it at the function.
  It does a lot of the same stuff PostgREST does. Automatic REST interface to any database, but it also hosts static resources, and dynamic mapping of URL templates to functions.
  Needs a rewrite though. That's probably the next big dev push. Right now we're rewriting the system catalog (meta) and the data VCS.
  
  3 replies →