Comment by raw_anon_1111
1 month ago
It seems like you want a time series database not an OLAP. Every problem you described you would also have with Snowflake or another OLAP database
Thanks for having this discussion with me. I don't believe I want a time series database. I want to be able to invent new queries and throw them at a schema, or create materialized views to speed up queries, etc. I just don't find Snowflake or Redshift anywhere close to what they're selling.
I think these systems are optimized for something else: probably organizational scale, predictable low-value workloads, large teams that just throw their shit at it and it works on a daily basis, and of course, it costs a lot.
I rented a $1k EC2 instance and slurped all of S3 onto it in a few hours, while Redshift was unable to do the same. After that experience, I no longer consider these systems reliable for anything other than ritualistic, performative, low-value work.
I’ve told you my background. I’m telling you that you are using the wrong tool for the job. It’s not an issue with the database. Even if you did need an OLAP database like Redshift, you are still treating it like an OLTP database in your ETL job. You really need to do some additional research.
I do not need JOINs. I do not need single row lookups or updates. I need a compute engine and efficient storage.
I need fast consumers, I need good materialized views.
I am not treating anything like an OLTP database; my opinion of OLTP databases is even harsher. They can’t even handle the data from S3 without insane amounts of work.
I do not even think in terms of OLTP, OLAP, or whatever. I am thinking in terms of which queries I want to run over which data, and how to do that with the feature set available.
If necessary, I will align all PostgreSQL tables on a timeline of discrete timestamps instead of storing things as intervals, to allow faster sequential processing.
I am saying that these systems as a whole are incapable of many things I've tried to make them do. I have managed to use other systems and did many more valuable things because they are actually capable.
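To make the interval-vs-timestamp idea concrete, here is a minimal Python sketch of expanding interval rows onto a fixed grid of discrete timestamps. The function name `intervals_to_timeline` is hypothetical, and it assumes the intervals are half-open (begin inclusive, end exclusive) and do not overlap; it is an illustration of the general technique, not the commenter's actual schema or pipeline.

```python
from datetime import datetime, timedelta


def intervals_to_timeline(intervals, start, end, step):
    """Hypothetical helper: expand (begin, end, value) intervals onto a grid.

    Assumes half-open, non-overlapping intervals. Returns one
    (timestamp, value) row per grid point in [start, end); grid points
    not covered by any interval get None.
    """
    ordered = sorted(intervals)  # sort by begin so we can sweep forward once
    timeline = []
    i = 0
    t = start
    while t < end:
        # Advance past intervals that ended at or before this grid point.
        while i < len(ordered) and ordered[i][1] <= t:
            i += 1
        value = None
        if i < len(ordered) and ordered[i][0] <= t < ordered[i][1]:
            value = ordered[i][2]
        timeline.append((t, value))
        t += step
    return timeline


if __name__ == "__main__":
    base = datetime(2024, 1, 1)
    rows = [
        (base, base + timedelta(hours=2), "running"),
        (base + timedelta(hours=3), base + timedelta(hours=4), "idle"),
    ]
    for ts, state in intervals_to_timeline(rows, base, base + timedelta(hours=5), timedelta(hours=1)):
        print(ts.isoformat(), state)
```

The trade-off is more rows in exchange for a layout where every row corresponds to one timestamp, so downstream scans can process the table sequentially without interval-overlap logic.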
It is laughable that the task of loading data from S3 into whatever schema you want is better done by tech outside of the aws universe.
I can paste this whole conversation into an LLM unprompted and I don’t really see anything I am missing.
The only part I am surely missing are nontechnical considerations, which I do not care about at all outside of business context.
I know things are nuanced and there’s companies with PBs of data doing something with Redshift, but people do random stuff with Oracle as well.