The blog author founded Altinity. Altinity's main product offering is a hosted ClickHouse service. The top 10 committers to ClickHouse all seem to be ClickHouse employees. Looking at Altinity on GitHub, they contribute much less open source. If ClickHouse the company spends 40%+ of its money building the product while others, including Altinity, spend 5% on development and 80% on marketing, the others will get more customers. That isn't sustainable. How do you solve that, other than fencing off exclusive enterprise features?
Altinity maintains the ClickHouse operator, which got a lot of people to use ClickHouse to begin with. ClickHouse has had a lot of corner cases that were reported by those kinds of people, myself included.
If you look at some of the discussions, while a lot of the fixes come from the ClickHouse team, it would be unjust to say that the corner-case discussions don't contribute to the fixes.
I think part of the reason is that ClickHouse, being a somewhat unique offering, attracts a fairly competent bunch of users who go beyond "I want this feature, please implement it".
To that point, Altinity has contributed about 900 PRs to ClickHouse, and many more if you include ecosystem projects like the operator, clickhouse-backup (which we maintain), the community Grafana plugin (over 11M downloads last I looked), the ODBC driver, etc. All of this is open source.
We've also been very active on diagnosing problems, logging issues, and contributing ideas for solutions. Alexey Milovidov has logged the most issues of anyone (2376) but the next two people (1012, 810) are from Altinity. The #6 and #9 contributors of issues are also from Altinity.
CEO of Altinity here, also editor of the blog article. It would be more interesting to address the points that it raises. If you are an open source user of ClickHouse, do you really want basic features like object storage for tables or the ability to delete data efficiently withheld?
This question is important regardless of who raises it. Projects like Kafka, Spark, PostgreSQL, and Kubernetes (among others) have solved it while allowing good returns to those who contribute.
P.S. We spent 7% of our budget on marketing last month. A sizable fraction of our budget is devoted to open source contributions to ClickHouse and ecosystem projects.
I appreciate your work here, man. Altinity also wrote a lot of useful blog posts about ClickHouse back then; I loved that stuff.
This whole thing reminds me of Elastic and HashiCorp, and it's hard to pick sides given that the core maintainers worked their asses off building it in public, and the community contributors also put their effort into it.
I think this common theme is ushering in a new era of software, where core maintainers productize the main product under a BSL license and the community (including other businesses) maintains its own fork.
It's great that discussions like this are being brought up and talked about.
Yeah, so, if maintainers started blocking external contributions for business reasons or changed the license of the upstream repo, that would be a legitimate concern.
If this blog post raises any legitimate concerns, I missed them. All I see is entitlement to continue getting work for free. If you're concerned about gaps in ClickHouse functionality, maybe pick up the slack and contribute back?
It takes a village to raise a database, as the saying goes.
Setting aside any arguments about the author or the company, the big question is "What does ClickHouse do if there's a PR from the community reimplementing a cloud-only/closed feature?"
The problem with open core is there's no great answer.
Either it's merged, in which case there are now two codebases implementing the same feature (one open, one closed), and the company's revenue stream is imperiled.
Or it's rejected (either explicitly or quietly ignored), in which case work is wasted and the project is less useful than it could be.
How did open core companies historically handle this?
As ugly as it is, a permissively licensed OSS core (e.g. MIT) plus open but anti-SaaS, non-OSS licensing for the cloud-only/closed features feels like a more sustainable model that encourages development in the open.
E.g., an MIT-like license for select features that says "free-as-in-beer license up to X users; otherwise talk to our sales team and get a commercial license".
At the end of the day, I want OSS to succeed and be great, but especially nowadays that takes a large team, which takes funding, which requires a competitive revenue model.
AFAIK, the author is indeed trying to contribute back; that is the whole point: https://github.com/ClickHouse/ClickHouse/issues/54644
> A license that doesn’t allow reselling the service is good enough. For example, I saw this with Tailwind UI
FYI, we discussed it in this thread: https://github.com/ClickHouse/ClickHouse/issues/44767. For the moment, the only way to get the new features is to use the cloud version (which is likely to be a no-go for most companies managing their own ClickHouse infrastructure).
MongoDB did something similar. It's open source for you to extend and host yourself but you can't build a cloud service for it.
Exactly. I have no problem with open source developers shifting to an open core model so they can get paid, even if it means I as a user don't get these new features for free.
I like what Directus does. They release their source code under a restrictive license (you need to pay if your revenue exceeds a certain threshold) with a timer: three years after each release, the license for that release changes to GPL.
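To make the "timer" concrete, here is a minimal sketch of how a delayed license conversion could be computed per release. Only the three-year delay and the GPL target come from the comment above; the function names, the license labels, and the Feb 29 handling are hypothetical illustrations, not Directus's actual terms.

```python
from datetime import date

# Hypothetical labels; only the 3-year delay and the GPL target come from the comment.
RESTRICTED = "restrictive (pay above a revenue threshold)"
OPEN = "GPL"
CONVERSION_YEARS = 3


def conversion_date(release: date, years: int = CONVERSION_YEARS) -> date:
    """Return the date on which a release switches to the open license."""
    try:
        return release.replace(year=release.year + years)
    except ValueError:
        # A Feb 29 release has no anniversary in a non-leap year; assume Mar 1.
        return release.replace(year=release.year + years, month=3, day=1)


def license_for(release: date, today: date) -> str:
    """Return which license applies to a given release as of `today`."""
    return OPEN if today >= conversion_date(release) else RESTRICTED


if __name__ == "__main__":
    as_of = date(2024, 9, 1)
    for name, released in {"v1": date(2021, 6, 1), "v2": date(2023, 6, 1)}.items():
        print(name, license_for(released, as_of))
```

Each release carries its own clock, so as of 2024-09-01 a release from June 2021 is already GPL while one from June 2023 is still under the restrictive terms.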
There's a kind of survivorship bias here! I'm sure that if Altinity (or others) weren't forced to write RFCs on GitHub, like the one mentioned in this article, that get little or no attention from the core team, they would be among the top committers :)
"forced to do RFCs on GitHub" The author has raised one RFC ever. Do you have evidence they attempted to contribute more in the past? Proposals to form a steering committee? Sponsorship? PRs?
A license that doesn’t allow reselling the service is good enough. For example, I saw this with Tailwind UI.
This would be a shame and also a mistake in my opinion.
ClickHouse is instantly differentiated from Snowflake, Databricks, BigQuery and Redshift by its open source offering that you can deploy yourself. There are lots of other options, but ClickHouse has the most mindshare and is the techies' choice.
I find myself rooting for them and recommending them for that before you even get into any technical comparison.
ClickHouse is also faster than any of them if you know how to use it properly. It helps if you have some distributed systems background and an intuitive feel for map/reduce.
For example, ReplacingMergeTree uses a distributed algorithm to process changes without incurring excessive INSERT-time expense. It's quite elegant.
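The merge-time idea behind ReplacingMergeTree can be illustrated with a toy, single-table model: an INSERT just appends a new sorted part, and rows sharing a sorting key are collapsed only when parts are merged, or at query time with FINAL. This is a simplified sketch, not ClickHouse's implementation; the class and function names are hypothetical, `key` stands in for the ORDER BY key and `version` for the optional version column. In real ClickHouse, deduplication only happens among parts that are actually merged together within a partition, and each shard deduplicates its own data independently.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Row:
    key: str      # stands in for the ORDER BY (sorting) key
    version: int  # stands in for the optional version column
    value: str


def dedupe(parts: list[list[Row]]) -> list[Row]:
    """Keep one row per key: the highest version, ties going to the later insert."""
    latest: dict[str, Row] = {}
    for part in parts:  # parts are visited in insertion order
        for row in part:
            current = latest.get(row.key)
            if current is None or row.version >= current.version:
                latest[row.key] = row
    return sorted(latest.values(), key=lambda r: r.key)


@dataclass
class ToyReplacingTable:
    parts: list[list[Row]] = field(default_factory=list)

    def insert(self, rows: list[Row]) -> None:
        # An INSERT writes a new immutable, sorted part; existing rows are not read.
        self.parts.append(sorted(rows, key=lambda r: r.key))

    def merge(self) -> None:
        # A background merge rewrites all parts into one, dropping replaced rows.
        self.parts = [dedupe(self.parts)] if self.parts else []

    def select(self, final: bool = False) -> list[Row]:
        if final:
            # Like SELECT ... FINAL: apply the replacement logic at query time.
            return dedupe(self.parts)
        # Without FINAL, unmerged parts may still expose superseded versions.
        return [row for part in self.parts for row in part]


if __name__ == "__main__":
    table = ToyReplacingTable()
    table.insert([Row("a", 1, "x"), Row("b", 1, "y")])
    table.insert([Row("a", 2, "x2")])  # an "update" is just another insert
    print(len(table.select()))         # 3: the old version of "a" is still visible
    print(table.select(final=True))    # only the latest "a", plus "b"
    table.merge()
    print(len(table.parts), len(table.select()))  # 1 2
```

The design point the sketch tries to show is that the write path never looks up existing rows; the cost of resolving updates is deferred to merges, which is why reads may see duplicates until a merge runs or FINAL is used.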
Insert should have never been expensive in the first place. This was probably hard for ClickHouse because they started with Postgres as the base, which is optimized for OLTP. In Apache Pinot, Druid, etc., an insert is nothing more than a simple append, and I believe that's the case today with ClickHouse as well... In other words, these things are table stakes today and are not differentiators.
All the main players in ClickHouse's space, like Apache Pinot, Apache Druid, StarRocks, and PrestoDB, have mindshare and unicorns using their products. It sounds like you haven't seen what's happening in this space.
Trino, not Presto.
Presto, created by FB, was required to let any FB engineer merge without OWNERS (because Facebook doesn't have OWNERS files unless it would create a SEV1).
Subsequently, the original creators of Presto forked it as PrestoSQL.
So Facebook trademarked the name Presto.
So creators renamed it Trino.
https://trino.io/blog/2020/12/27/announcing-trino.html
As a user of ClickHouse since 2018, I'm fully aligned with the content of this article. This technology is one of the best I've used in my career.
The choice of ClickHouse for a new project in my company has always been a no-brainer, but the recent move by ClickHouse Inc. toward a closed-source version has made this choice less straightforward.
Lots of other options in this space. https://atwong.medium.com/top-open-source-alternatives-to-ol...
Anyone familiar with Databend, StarRocks, or ByConity? They all focus on shared storage with separate compute. I'm currently checking out ByConity. I've been using ClickHouse for quite a while, and these were on my radar.
The inherent advantage of being closed source in the first place is that you will not be accused of “moving away” from open source. We never see Microsoft Office or Apple macOS moving away from open source.
MacOS (X) was once more open than it is now. It is a shame they have locked more away behind a closed license.
Open source doesn’t pay. Companies need to make money. Any open source product owned and developed by a for-profit company is in danger of moving away from open source.
If you want open source go fund non profit organisations and/or charities. The fact we don’t see developers do that tells me a lot.
Here is a similar Thin-Crust Open Core model: https://reactflow.dev/blog/asking-for-money-for-open-source/. But I hear CH and understand their move. I think OSS sponsorship has failed; it does not generate enough money to pay a team of top engineers. The best move would have been to implement a better paid support model and allow more users to pay for support. Currently CH charges an immense amount for support, so only large corporations can afford it.
But if they had a cheaper support model, the large corporations would also pay less, and they'd make even less money overall. Not to mention: support engineers are expensive, and researching tickets takes a lot of time.
Not sure: you can have more customers but a fixed number of issues. I am researching it. You could crowdsource issue solving and enable revenue sharing.
ClickHouse open source and cloud user here. My understanding is that the cloud version uses S3, which would mean that it has a very specific tenancy pattern and code to run that version. This may be why lightweight updates work a specific way in that environment, or why they need a way to test them at scale that would be hard to do through a feature flag in the open source product. Lightweight deletes were released to both, and previous roadmaps listed updates as upcoming.
You should be using ReplacingMergeTree if you are doing updates at the current moment.
> You should be using ReplacingMergeTree if you are doing updates at the current moment.
Indeed. Altinity and other community users like ContentSquare made numerous contributions to make it more usable. It's a promising approach to updates at scale and has improved markedly over the last few months.
That said, you can't currently use RMT very efficiently in S3 because of overall limitations in MergeTree S3 table storage. We need to think about whether the improvements we're proposing will also enhance RMT. Thanks for bringing that up.
Yeah, it's the "don't treat S3 like a disk" issue. I'm looking at S3 only for cold storage, but I need AWS VPC Gateway Endpoint support for S3 access since we are on-premises.
*Yes, you can use a VPC gateway, but you need to waste public IPs to set up the BGP/IP routes.
> Lightweight deletes were released to both
Because they were contributed by a community member, not the ClickHouse Inc. core team.
https://github.com/ClickHouse/ClickHouse/pull/37893
The commits show both: yes, a community member kicked things off with the initial PR, but it looks like a team effort to get that feature launched.
Here's a list of other open source OLAP systems out there. ClickHouse is on the list, along with others like StarRocks. https://atwong.medium.com/top-open-source-alternatives-to-ol...
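For readers following the lightweight-deletes subthread, here is a toy model of the general mask-then-merge idea: a delete only flips a per-row "exists" flag, reads skip flagged rows right away, and a later merge physically drops them. ClickHouse's lightweight DELETE marks rows through a hidden `_row_exists` column and cleans them up in background merges, but everything else below (class and method names, list-based storage) is a hypothetical simplification, not the actual implementation.

```python
from collections.abc import Callable
from dataclasses import dataclass, field


@dataclass
class Part:
    rows: list[dict]
    # Stands in for ClickHouse's hidden _row_exists mask: one flag per row.
    row_exists: list[bool] = field(default_factory=list)

    def __post_init__(self) -> None:
        if not self.row_exists:
            self.row_exists = [True] * len(self.rows)


class ToyMaskedTable:
    def __init__(self) -> None:
        self.parts: list[Part] = []

    def insert(self, rows: list[dict]) -> None:
        self.parts.append(Part(list(rows)))

    def delete_where(self, predicate: Callable[[dict], bool]) -> None:
        # Lightweight delete: only the mask changes; row data stays in place.
        for part in self.parts:
            for i, row in enumerate(part.rows):
                if predicate(row):
                    part.row_exists[i] = False

    def select(self) -> list[dict]:
        # Reads filter out masked rows immediately.
        return [row for part in self.parts
                for row, alive in zip(part.rows, part.row_exists) if alive]

    def merge(self) -> None:
        # A later merge physically removes the masked rows.
        survivors = self.select()
        self.parts = [Part(survivors)] if survivors else []

    def stored_rows(self) -> int:
        return sum(len(part.rows) for part in self.parts)


if __name__ == "__main__":
    table = ToyMaskedTable()
    table.insert([{"id": 1}, {"id": 2}])
    table.insert([{"id": 3}])
    table.delete_where(lambda row: row["id"] == 2)
    print([row["id"] for row in table.select()], table.stored_rows())  # [1, 3] 3
    table.merge()
    print([row["id"] for row in table.select()], table.stored_rows())  # [1, 3] 2
```

The trade-off the sketch shows: the delete itself is cheap because no part is rewritten, while storage is only reclaimed once a merge runs.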
I do not think ClickHouse Inc. will abandon open source, but following the current trend, it may focus on restricting innovations to its proprietary cloud-only product.
We see it with Oracle (MySQL), where most of the innovation is happening in the cloud-only "HeatWave", or MongoDB, where MongoDB Atlas is increasingly getting features not available in the Community (SSPL) version.
How dare they go open core to make money from their own product.
New column-store alternative: https://news.ycombinator.com/item?id=37571974
Same thing with Timescale. I think their S3 storage layer is only in their cloud version.
How do you choose between ClickHouse and DuckDB? It feels like they solve the same problem.
They are both columnar data stores, and while they solve the same problem, I wouldn't use them in the same situation. DuckDB is often referred to as the SQLite of analytics, meaning that it's lightweight and you can embed it. On the other hand, ClickHouse is definitely the way to go if you need to distribute your queries over multiple servers. If your workload fits on a single server and you only need standard SQL functions, both will serve you well. If you have more specific needs, have a look at the documentation. For example, ClickHouse has very extensive support for nested arrays, which can prove quite useful.
DuckDB has also gained mindshare as an engine for reading Parquet from data lakes. The fact that it's embeddable enables some very creative uses. It helped that for a time DuckDB was substantially quicker than ClickHouse at reading Parquet. That advantage has eroded with recent improvements in ClickHouse's Parquet support. I expect the gap will close quickly.
Also, clickhouse-local exists: https://clickhouse.com/docs/en/operations/utilities/clickhou...
FWIW, you can check out clickbench.com, a benchmark that includes partitioned-Parquet runs of ClickHouse and DuckDB.
Scale. DuckDB chokes at a certain point (just like sqlite isn't the same as mysql or postgresql in terms of scalability). That's why they're building a better/bigger version.
Different beasts, but if by any chance you love ClickHouse already and just want to run OLAP queries in-process, there's chdb: https://github.com/chdb-io/chdb
They solve the same problem in that they are OLAP data stores, but that's where the similarity ends. ClickHouse is a centralized OLAP store (like dozens of others), whilst DuckDB is an embedded database that is usually run in-process.
What is it about DuckDB and its strange cult-like following? It's nice that it's in-process, but then it's an incremental improvement over Pandas. Nice tool and well implemented, but I don't see what is transformative about it.
ClickHouse's strength is having one binary that runs anywhere:
- local
- server
- cloud (*)
- serverless
- in-process (https://github.com/chdb-io/chdb), similar to DuckDB
(*) except for the forked cloud versions: ClickHouse Inc., Huawei, etc.