Show HN: Hydra - Open-Source Columnar Postgres
2 years ago (hydra.so)
hi hn, hydra ceo here
hydra is an open-source extension that adds columnar tables to Postgres for efficient analytical reporting. With Hydra, you can analyze billions of rows instantly without changing code.
demo video (5 min): https://youtu.be/1yzxgb0Oyrw
github repo: https://github.com/hydradatabase/hydra
For the 1.0 GA release, aggregate queries are over *60% faster* than Hydra beta due to aggregate vectorization. Additional index types (gin, gist, spgist, and rum) and pg_hint_plan are now enabled for performance optimization.
postgres is great, but aggregates can take minutes to hours to return results on large data sets. long-running analytical queries hog database resources and degrade performance. use hydra to run much faster analytics on postgres without changing code.
for testing, try the hydra free tier to create a columnar postgres instance on the cloud. https://dashboard.hydra.so/signup
Congrats on the 1.0 Release, big milestone.
I'm personally really excited about all of the recent tooling for postgres aggregates. Definitely a pain point for a lot of developers, and it's easy to fall into the trap where things work fine in the beginning and then query times explode as requirements change and the dataset grows. Nice to not have to spin up another DB in order to solve the problem as well.
> I'm personally really excited about all of the recent tooling for postgres aggregates. Definitely a pain point for a lot of developers
Could you give a few examples of what you are speaking of?
What's the workflow for leveraging this extension in real-time for an existing database?
Say I wanted to use this to create a high performance "aggregation" API of my existing "write heavy" tables.
Is there a way to keep a `heap` & `columnar` table in sync?
(relative Postgres noob here)
You could use ETL tools like peerdb.io. This isn't "real time" but instead some refresh interval. ZomboDB uses ElasticSearch as an index that is transactionally consistent with Postgres. It gives hope that in the future we will see consistent columnar tables or indexes. SQL Server supports columnar indexes on non-columnar tables.
there are a couple of ways to do it, and none of them that I'm able to think of are great - maybe some others will be able to answer better than I am, but ...
if the data is append-only, an insert trigger could work. if it gets updated and deleted, then insert, update, and delete triggers could be added. of course if the table is very active, this could get bad, fast.
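alternately, you can do an insert every hour or so, like insert into table_columnar select * from table_heap where created_at >= DATE_TRUNC('hour', now()) - interval '1 hour' and created_at < DATE_TRUNC('hour', now()) (table_heap standing in for whatever the source heap table is called)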
or, even truncate the columnar table daily and re-insert all of the data.
likely none of these is the _best_ solution, but they could help you find what might be the best solution for you.
alternately, if the query patterns work well, you can simply convert the table to columnar, but that's not a panacea.
I've been using Hydra for the last ~2 months & genuinely love it. The team is really talented & it's so great to see the progress they've been making. Congrats on the 1.0 GA release! Huge step!
Thank you!
Nice tool, but the name is unfortunate; consider changing it. There is already a very well-known security tool named hydra, https://github.com/vanhauser-thc/thc-hydra, which has been around since 2001. Then Facebook went ahead and named their config tool hydra, https://github.com/facebookresearch/hydra, on top of it. Like, we get it, the hydra is popular mythology, but we could use more original naming for tools.
yea acropolis would be a better name given that its columns are famous
I was thinking X.com - is it available?
Let's hope Ory never uses it! Oh no[0].
[0] https://www.ory.sh/hydra
Yeah, everyone names their thing something generic like Atlas or Hydra. Choose to be daring and original instead! You won't regret it.
Or https://github.com/hydra-synth/hydra (Livecoding networked visuals in the browser, since 2017)
Big congrats on 1.0! Super exciting project.
My dream scenario would be installing hydra as an extension into my main rails application database. My use case is showing analytics numbers directly to users, like "how many people visited my listing", which regular row-level postgres is not suited to answer. To do this now we need to get that data from our DW, which is slow for single queries, so we need a cache, which we need to keep in sync, which is complexity I don't want. It would be amazing if I could do user-facing analytics queries directly in my main app db.
What put me off after a quick scroll:
Installing the extension changes the default table type to be columnar. I don't want an installed extension to do that, my main workload is still row oriented oltp, I only want specific tables to be columnar and I don't want to change all my normal migrations to specify `USING heap`. IMO timescale does this really well, it's an extension, not a new database. At least that's how I would want it to be.
It also seems like you're trying to claim postgres foreign data wrappers as "hydra external tables", implying it's a new feature? Postgres does this (reading other databases and external files) out of the box and it feels sneaky to try and brand that.
Also the FAQ says "Hydra is not a fork." when the engine clearly is: https://github.com/hydradatabase/citus I realize you want to monetize this as a bigger platform and that's completely fair, but it strikes me as dishonest to deny the citus origins in the FAQ.
Thanks for calling these out, as these are just misunderstandings. We will certainly tweak the language around these.
- Installing the extension itself does not change the default table type, this is only the case on Hydra Cloud and our Docker image.
- "Hydra is not a fork" refers to the fact that Hydra did not fork Postgres; it is an extension. We have put in a lot of effort since forking Citus, but it's not our intent to hide that fact.
- Yes, "Hydra External Tables" is a productization around FDWs, there's more we want to do with it but it hasn't been our focus lately.
> - Installing the extension itself does not change the default table type, this is only the case on Hydra Cloud and our Docker image.
Ah cool, thanks! How would I go about adding the extension to my own "FROM postgres:15" Dockerfile?
> Installing the extension changes the default table type to be columnar.
that is not the case, hydra as a service sets the default table type. the columnar extension does not make any changes like that, it simply ("simply") adds columnar as an option.
I'm just an engineer, so I'll leave the other comments for others :)
Congratulations!
Please also add this info:
#1. to the pgsql-announce list: https://www.postgresql.org/search/?m=1&ln=pgsql-announce&q=h... "Your search for hydra returned no hits."
#2. to the https://planet.postgresql.org/
Watch out. There used to be another Hydra project, a data repository with rich linked metadata, that changed its name after legal threat over trademark from Hydra Corporation. Now it's called Hyku, https://hyku.samvera.org/
I hope you choose to defend your name.
Should have mentioned, if you want to chat about open source, analytics, or meet some of the Hydra team, swing by our event in SF this Thursday: https://partiful.com/e/gowvDVdnNcBLKUzfGOPv
some previous discussions:
https://github.com/hydradatabase/hydra#license since the GitHub sidebar is misleading
Congratulations. @coatue, it would be great if you could share your email so I can reach out for licensing details. I did fill out the form on your site, but never received a response.
Nitin, I would be happy to. Would you mind emailing me at J at hydra dot so? Let's chat!
I have 2 questions.
1. Is this optimized for constantly adding and removing rows to the columnar table?
2. Is this supported by Microsoft Azure Flexible Server for Postgres?
> Whenever possible, design data coming into the data warehouse as append-only. Hydra's columnar store only supports inserts. If you need to update or delete data, you will need to use row (heap) tables.
that is not the case at this point. updates, deletes, and vacuuming are all available.
I can answer number 1:
updates and deletes are available, as well as the ability to compact the table.
> For 1.0 GA release
You may want to check that box in the README, assuming it is already done.
Great catch - updating, please hold
How does sharding work? Can I use this with citus to scale horizontally?