Comment by whalesalad

4 months ago

This is a really good description of more or less exactly how our current approach works! This is a daily granularity variant we are testing atm, in order to eliminate flip-flops that occur during the length of a business day. The v1 impl was down to the second, this one is daily.

Here is the core of it:

    CREATE TABLE time_travel_daily (
        domain              TEXT        NOT NULL,
        valid_range         tstzrange   NOT NULL,
        valid_from          timestamptz GENERATED ALWAYS AS (lower(valid_range)) STORED,
        valid_to            timestamptz GENERATED ALWAYS AS (upper(valid_range)) STORED,

        tld                     TEXT,
        owned                   BOOLEAN,
        acquired_at             timestamptz,
        released_at             timestamptz,
        registrar               TEXT,
        updated_at              timestamptz,
        accounting_uuid         TEXT,
        offer_received_date     timestamptz,
        payment_received_date   timestamptz,
        sold_at                 timestamptz,
        sold_channel            TEXT,
        last_renewed_at         timestamptz,
        expires_at              timestamptz,
        transfer_started_at     timestamptz,
        transfer_completed_at   timestamptz,
        transfer_eligible_at    timestamptz,

        snapshot_json           JSONB NOT NULL,
        inserted_at             timestamptz DEFAULT NOW() NOT NULL,
        source_data_change_id   INT,

        PRIMARY KEY (domain, valid_range)
    );

    CREATE INDEX ttd_domain_idx ON time_travel_daily(domain);
    CREATE INDEX ttd_gist_valid_range_idx ON time_travel_daily USING gist(valid_range);
    CREATE INDEX ttd_owned_valid_range_idx ON time_travel_daily USING gist(valid_range) WHERE owned = TRUE;
    CREATE INDEX ttd_registrar_idx ON time_travel_daily(registrar) WHERE registrar IS NOT NULL;
    CREATE INDEX ttd_source_data_change_id_idx ON time_travel_daily(source_data_change_id) WHERE source_data_change_id IS NOT NULL;

And then here is a piece of our update trigger which "closes" previous entities and opens an new one:

    UPDATE time_travel_daily
    SET valid_range = tstzrange(lower(valid_range), target_date::timestamptz, '[)')
    WHERE domain IN (
        SELECT DISTINCT dc.domain
        FROM data_changes dc
        WHERE dc.invalidated IS NULL
            AND dc.after IS NOT NULL 
            AND dc.modified_at::date = target_date
    )
    AND upper(valid_range) IS NULL  -- Only close open ranges
    AND lower(valid_range) < target_date::timestamptz;  -- Don't close ranges that started today

A trigger to ‘close’ the old record is a great idea. My stored procedure is also doing some additional validation (validate at every layer = less bugs) so what I’ve got works well enough for me.

But that’s very smart. Never considered it.