Comment by wodenokoto

2 days ago

The point of a data lake is to separate compute and storage. Postgres isn’t a compute layer; it’s an access layer.

Your compute asks Postgres “what is the current data for these keys?” or “what was the current data for these keys as of two weeks ago?” Your compute then reads the Parquet files directly and runs the analytics aggregation itself.
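To make that split concrete, here is a toy sketch in plain Python. It is not pg_lake's or Iceberg's actual API: `SNAPSHOTS`, `STORAGE`, `files_as_of` and `total_for_key` are all made-up names, and in-memory dicts stand in for the Postgres catalog and the object store. The only point is that the access layer answers "which files were current at time T?" and the compute side does the reading and aggregating.

```python
from bisect import bisect_right
from datetime import datetime, timedelta

# Hypothetical stand-in for the catalog Postgres would hold: each snapshot
# records when it became current and which data files it points at.
SNAPSHOTS = [
    (datetime(2025, 1, 1), ["sales-001.parquet"]),
    (datetime(2025, 1, 20), ["sales-001.parquet", "sales-002.parquet"]),
]

# Hypothetical stand-in for object storage: file name -> rows of (key, amount).
STORAGE = {
    "sales-001.parquet": [("a", 10), ("b", 5)],
    "sales-002.parquet": [("a", 7)],
}


def files_as_of(when):
    """Access layer: file list of the latest snapshot at or before `when`."""
    times = [ts for ts, _ in SNAPSHOTS]
    idx = bisect_right(times, when) - 1
    if idx < 0:
        return []  # no snapshot existed yet
    return SNAPSHOTS[idx][1]


def total_for_key(key, when):
    """Compute layer: read the listed files itself and aggregate."""
    return sum(
        amount
        for name in files_as_of(when)
        for k, amount in STORAGE[name]
        if k == key
    )


now = datetime(2025, 2, 1)
current_total = total_for_key("a", now)                    # uses both files
past_total = total_for_key("a", now - timedelta(weeks=2))  # uses the Jan 1 snapshot
```

In a real setup the file list would come from a catalog query and the rows from actual Parquet reads, but the division of labour would be the same.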

2 comments


enether 1 hour ago

But most serious compute engines already speak Iceberg. What do they gain from interfacing with PG now?

My understanding is the opposite - PG cuts it as a compute layer for small amounts of data, and this is where it excels.

I also assume `pg_lake` was built mainly with the intention of creating/writing tables, and the ability to read comes "for free" as an extra, since Iceberg integration is already written.

fifilura 21 hours ago

Sounds more like you need Postgres as a backend than vice versa.