Comment by buremba

2 days ago

All you need is Postgres until you scale into TBs of data. We use PostgreSQL as a durable workflow engine, for vector search, time-series data, and BM25 search, as an OLTP/OLAP engine, and as a queue. It's basically the only dependency we have for https://lobu.ai

The main benefit is centralizing all the data in one place so we don't need to worry about copying data between multiple systems. Once something becomes the bottleneck, you can migrate that piece to a purpose-specific tool to scale out. To be honest, LISTEN/NOTIFY is in my opinion the most fragile part of PG, but it's fine as a start until you scale out.
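
For readers wondering what "Postgres as a queue" can look like in practice, here is a minimal sketch of one common pattern: workers claim rows with `FOR UPDATE SKIP LOCKED` so they don't block each other. This is not necessarily what the commenter runs. The `jobs` table, its columns, and the `claim_next_job` helper are hypothetical names made up for illustration, and `conn` is assumed to be a DB-API connection to Postgres (for example from psycopg).

```python
# Assumed (hypothetical) table:
#   jobs(id bigserial, payload jsonb, status text, created_at timestamptz)
CLAIM_SQL = """
UPDATE jobs
SET status = 'running'
WHERE id = (
    SELECT id
    FROM jobs
    WHERE status = 'pending'
    ORDER BY created_at
    LIMIT 1
    FOR UPDATE SKIP LOCKED  -- skip rows another worker has already locked
)
RETURNING id, payload
"""


def claim_next_job(conn):
    """Claim one pending job and return (id, payload), or None if none is pending.

    `conn` is assumed to be a DB-API connection whose cursors work as
    context managers (as psycopg cursors do).
    """
    with conn.cursor() as cur:
        cur.execute(CLAIM_SQL)
        row = cur.fetchone()
    conn.commit()  # make the status change visible to other workers
    return row
```

A worker would call `claim_next_job(conn)` in a loop and sleep (or wait on LISTEN/NOTIFY) when it returns `None`. Polling alone is enough to start with, which sidesteps the LISTEN/NOTIFY fragility mentioned above.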

28 comments

tibbon 2 days ago

But when you hit that wall, it is hard to stop and convince people to use different patterns and systems. I've seen so many tables go from "it will only be a few thousand rows" to suddenly several TB and then people are looking confused when performance and db admin tasks get really difficult.

I'm working at a scale where almost every day I have to ask people "are you use you need to treat that as relational data? It doesn't seem relational"

alexwennerberg 2 days ago
> But when you hit that wall, it is hard to stop and convince people to use different patterns and systems. I've seen so many tables go from "it will only be a few thousand rows" to suddenly several TB and then people are looking confused when performance and db admin tasks get really difficult.
It's much, much worse in my experience to have to develop for the opposite -- working on a system that was designed for an imagined "infinite" scale that in reality handles something like 100GB and a few transactions a minute.
dieselgate 2 days ago
> are you use you need to treat that as relational data?
Is this intended to be "you sure you need..."?
- turkeyboi 2 days ago
  
  Obviously, yes

sroussey 2 days ago

Use different “databases” besides public at the very start. No joins between them. You will be in a good position to just split the postgres instance by those at a later date. They will have different usage patterns than the merged version you have now, and will be easier to optimize and will buy you some time. And time is all you need.

gjvc 2 days ago

"public" is not a database, it is a schema within a database.
Apropos of bad naming, the PostgreSQL authors are not forgiven for naming all the databases on a single host a "cluster". I mean __really__.

hmaxdml 2 days ago

Listen/notify is poised to become much better in PG 18 and 19

stuartaxelowen 2 days ago
Why’s that?
- TkTech 2 days ago
  
  In pg19 https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit... will land, which significantly improves NOTIFY performance. Right now LISTEN/NOTIFY doesn't scale to very busy instances because a `NOTIFY` within a transaction takes a global lock.
  

ceres 2 days ago

Just an fyi, when I try to sign in with google for your app I get the message: "The app is requesting access to sensitive info in your Google Account. Until the developer (*reka*kc*@gmail.com) verifies this app with Google, you shouldn't use it."

buremba 2 days ago

Ahh, sorry about that. It should be fixed in an hour; it looks like we mixed up the permissions. I just tried and confirmed that the other login methods work, if you would like to try it out.

throwaway7783 2 days ago

I'm in the same camp. Do you use any specific extensions? Especially for OLAP and time series (partitioned tables + related extensions work fine, but curious if you use anything else)
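
As a rough illustration of the "partitioned tables" approach for time series, here is a small sketch that generates the DDL for one monthly range partition using Postgres declarative partitioning. The helper name `monthly_partition_ddl` and the `metrics` parent table are hypothetical, and the parent is assumed to already exist as `PARTITION BY RANGE` on a date or timestamp column.

```python
from datetime import date


def monthly_partition_ddl(parent: str, year: int, month: int) -> str:
    """Build a CREATE TABLE ... PARTITION OF statement for one calendar month.

    `parent` is interpolated directly into the SQL, so it must be a trusted
    identifier (never user input).
    """
    start = date(year, month, 1)
    # The upper bound is exclusive: the first day of the following month.
    end = date(year + 1, 1, 1) if month == 12 else date(year, month + 1, 1)
    child = f"{parent}_{start:%Y_%m}"
    return (
        f"CREATE TABLE IF NOT EXISTS {child} PARTITION OF {parent} "
        f"FOR VALUES FROM ('{start.isoformat()}') TO ('{end.isoformat()}');"
    )


print(monthly_partition_ddl("metrics", 2024, 12))
```

Running something like this from a scheduled job a month or so ahead keeps new partitions in place before data arrives, and old months can later be dropped or detached as whole tables rather than deleted row by row.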

buremba 2 days ago
The native extensions are fine, but I haven't had a good experience with any third-party extensions. So far I've tried Timescale, pg_lake, citus, and pgvectorscale. They look very appealing, but it's usually a trap, as you can't get the value without using the vendor's cloud offerings.
I think if you grow enough to look for these extensions, it's usually better to bet on purpose-specific tooling. For example, I use DuckDB/Iceberg combination extensively for columnar data and connect DuckDB to PG when I need it.
- throwaway7783 2 days ago
  
  Fair enough. How do you do BM25?
osigurdson 2 days ago
From experience, I'd suggest using ClickHouse beyond a few billion rows of timeseries data in Postgres.
- throwaway7783 2 days ago
  
  Nice thing about our use case is that it's not strictly analytics, but looking at the most recent raw data. ClickHouse is definitely the powerhouse for analytics.
  

cultofmetatron 2 days ago

Conversely, startups that start out scaling for TBs of data never make it to needing TBs of data. They burn too much energy scaling when they don't yet have a product people want.

nicoburns 2 days ago

Yep. I've also seen systems that were slow with <10GB of data because of bad application of patterns that were supposedly "scalable" (pulling entire tables out of the database to implement joins in application code because "nosql is faster" is not actually fast).

pphysch 2 days ago

I don't see logs mentioned. I agree with most of those applications but would keep my OLAP stuff (metrics, logs, traces) in a separate store like VictoriaMetrics, both for capacity and read activity.

TkTech 2 days ago

pg_timescale can take you pretty far for metrics and would be Good Enough for almost all users. Totally agree on raw, high-volume logs though.
buremba 2 days ago
Yeah I have logs in Sentry, which also uses Postgresql.
- valyala 2 days ago
  
  Sentry stores logs in ClickHouse - https://blog.sentry.io/how-sentry-queries-unstructured-data-...
  