Analyzing database trends through 1.8M Hacker News headlines

6 days ago (camelai.com)

MS SQL Server is not even mentioned. This tells us there is a whole world almost totally omitted from discussion on HN: "Enterprise".

  • Oracle isn't in there either, which goes to show how much of a bubble HN actually is considering MSSQL and Oracle are #1 and #2 in market share.

    • Well, if you analyze programming language trends through 1.8M Hacker News headlines you’d find Rust is the most popular language and C/C++ are barely even used.

    • I used MS SQL and Oracle at my last job, but what's there to say about them? They've been around forever, are stable and get all the same table-stakes feature updates as everyone else. Start-ups avoid them like the plague because they're so damn expensive, and you won't be running either of them on your phone or an embedded device the way you would SQLite.

    • I would not call HN a bubble. Enterprises often have unqualified people making "expensive" decisions.

    • They are perhaps #1 and #2 in the "enterprise" market share, but in no way are they overall #1 and #2. Not even close. Which web app or startup uses them?

  • > This tells us there is a whole world almost totally omitted from discussion on HN

    It doesn't, though. All it tells you is that it's missing from the headlines of the submissions.

    "Enterprise" is discussed on HN too, but inside submissions that aren't exclusively about MS Sql Server. Try searching for some terms on the Algolia HN search, order by date and filter by comments and you'll find the subthreads/submissions where it's discussed :)

There's an online playground with the data here: https://play.clickhouse.com/

Wrote up this query:

  SELECT
    db_name,
    sum(if(type = 'comment', 1, 0)) AS comment_mentions,
    sum(if(type = 'story', 1, 0)) AS post_mentions,
    count(*) AS total_mentions,
    sum(score) AS total_score
  FROM hackernews
  -- Spaces are stripped before matching so that "sql server" matches 'sqlserver'.
  -- ARRAY JOIN turns each regex match into its own row, so an item that
  -- mentions a database twice is counted twice.
  ARRAY JOIN
    extractAll(replaceAll(LOWER(text), ' ', ''), '(sqlite|postgres|mysql|mongodb|redis|clickhouse|mariadb|oracle|sqlserver|duckdb)') AS db_name
  WHERE toYear(time) >= 2022
  GROUP BY
    db_name
  ORDER BY
    post_mentions DESC;
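
For anyone who wants to poke at the matching logic without the playground, here is a rough Python sketch of what the extraction step in the query above appears to do: lowercase the text, drop spaces, then collect every regex match. The function name `extract_mentions` is made up for illustration, and this only mirrors the string handling, not the ClickHouse aggregation.

```python
import re

# Same alternation as in the query above.
DB_PATTERN = re.compile(
    r"(sqlite|postgres|mysql|mongodb|redis|clickhouse|mariadb|oracle|sqlserver|duckdb)"
)

def extract_mentions(text: str) -> list[str]:
    """Lowercase, remove spaces, then return every database name matched.

    Hypothetical helper: it imitates the replaceAll/extractAll step only.
    """
    squashed = text.lower().replace(" ", "")
    return DB_PATTERN.findall(squashed)

# Stripping spaces lets the two-word product name match...
print(extract_mentions("MS SQL Server vs Postgres"))  # ['sqlserver', 'postgres']
# ...but it also glues unrelated words together into false positives.
print(extract_mentions("or a clever trick"))  # ['oracle']
```

The second call shows the trade-off: removing spaces catches "SQL Server", but "or a clever" collapses to "oraclever" and is counted as an Oracle mention, so totals from this approach are probably slightly inflated.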

More unsolicited feedback: month-by-month is kind of noisy. You might use a 3-month moving average to smooth it a little and make the trend clearer.
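
A minimal sketch of that kind of smoothing, assuming a trailing window (the current month plus the two before it). The function name and the sample counts are invented for illustration and are not taken from the article.

```python
def trailing_mean(values: list[float], window: int = 3) -> list[float]:
    """Trailing moving average.

    The first window-1 points are averaged over however many
    values are available so far, so the output has the same length.
    """
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed

# Hypothetical monthly mention counts, for illustration only.
monthly_mentions = [12, 30, 9, 27, 15, 33]
print(trailing_mean(monthly_mentions))
# [12.0, 21.0, 17.0, 22.0, 17.0, 25.0]
```

A centered window would lag less, but a trailing one never uses months that have not happened yet, which seems like the safer default for a trend chart.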

Is MariaDB included in MySQL? I see no mention of it in the post, but MySQL trending downwards would make sense as people upgrade and switch over. Besides, of course, the novelty wearing off, as posited for all engines further down the post.

  • > Is MariaDB included in MySQL?

    I was wondering the same, but I'm not sure if it would make a major change in the graphs. MySQL and MariaDB have both been unpopular on Hacker News for many years. Submissions on either topic rarely get much traction, which then leads to fewer submissions.

    > MySQL trending downwards would make sense as people upgrade and switch over.

    No, most large MySQL users are still using MySQL; there hasn't been a widespread migration to MariaDB. They're both actively developed and have grown in slightly different directions. Among corporations, MySQL's usage still far outstrips MariaDB's. Lately MariaDB has better product velocity though, and their commercial enterprise finally seems to have stable footing.

    • > there hasn't been a widespread migration to MariaDB

      I don't think I even knew I was running MariaDB at first; at most I registered it as a side note when I saw apt pull in mariadb while installing mysql. If you upgraded Debian some time ago, I'm pretty certain you were automatically migrated, so anyone running that (or, presumably, one of the derivatives like Ubuntu) would have migrated knowingly or unknowingly, hence my assumption.

  • is anyone seriously using it? even their own brand facepile is pretty weak

SQLite seems to be growing recently, which matches my perception, but it's not listed among the growing databases. Weird.

UPDATE: Added a weighted average analysis based on story points and comments. SQLite ranks highest in points per story and Redis ranks highest in comments per post. Also added SQLite to the growth table. I had accidentally deleted this row in the original post.
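
The post does not show how these figures were computed, so the following is only a guess at the general shape of "points per story" and "comments per post": total points and total comments for each database divided by the number of stories that mention it. The function name and all sample rows are hypothetical.

```python
from collections import defaultdict

def per_story_averages(stories):
    """Return {db_name: (points_per_story, comments_per_story)}.

    `stories` is an iterable of (db_name, score, comment_count) tuples.
    """
    totals = defaultdict(lambda: [0, 0, 0])  # [story count, points, comments]
    for db_name, score, comment_count in stories:
        entry = totals[db_name]
        entry[0] += 1
        entry[1] += score
        entry[2] += comment_count
    return {
        db_name: (points / count, comments / count)
        for db_name, (count, points, comments) in totals.items()
    }

# Hypothetical rows for illustration only; not the article's data.
sample = [
    ("sqlite", 120, 40),
    ("sqlite", 80, 20),
    ("redis", 50, 90),
    ("redis", 30, 70),
]
print(per_story_averages(sample))
```

Normalizing by story count matters here because a database with many low-scoring submissions could otherwise outrank one with a few very popular ones.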

Snowflake seems to have peaked; 2023 was hellish dealing with roomfuls of inexperienced devs and even architects convinced it was the fastest, cheapest thing ever.

  • Well, as pointed out above, Oracle and SQL Server don't even show up, so this simply does not reflect enterprise, and Snowflake and Databricks both lean enterprise.

The data query tool linked at the bottom of the post doesn't work for me. Cloudflare shows error 600010, whatever that means. Nice that there is "no login required", but if it did require one, or allowed that option, maybe it wouldn't need an algorithm to decide whether my traffic is abusive, because you could block abusive accounts instead.

That's really interesting; I knew postgres was the most popular database on here, but also looking at that chart, SQLite had a burst of popularity on HN last year.

Interesting to see SQL Server not listed here, am curious whether it didn't have enough signal, or suffered from being a two-word product, with "SQL" being far too generic on its own.

  • I also don’t remember SAP HANA, Oracle, or DB2 being mentioned even once here, but believe me, along with MSSQL these occupy most of the top ten database deployments worldwide.

    Something that I’ve been thinking about a lot recently is that all of the proprietary vendors are quietly strangling their flagship products.

    Free and open source database engines were always “nipping at their heels” but weren’t a serious threat for decades. Only other proprietary engines were.

    Now that PostgreSQL has more features than SQL Server and better performance, it’s a serious competitor.

    But Microsoft is holding MSSQL’s face under water with core-based licensing. It means that per dollar you get dozens of times less compute available for your data than with open-source systems. That ratio is growing exponentially, because they haven’t redone their pricing in… ever.

    Oracle and DB2 are being similarly choked off at the same rate, so looking left and right at their direct competition their respective product managers haven’t noticed the problem, which is akin to Fuji and Kodak raising film prices in lockstep just as digital photography is taking off.

    We’re entering the era of “kilocores”: single servers becoming available that have over a thousand cores. You can’t imagine what per-core licensing costs for something like that!

    PS: I saw a similar dynamic play out in the network space with load balancers and “web accelerators” like NetScaler sold “by bandwidth” with a starter SKU as small as 2 Mbps. I kept trying to politely explain to the reps that the smallest cloud VMs can cheerfully put out 10 Gbps, and hence their product is a 5,000x decelerator. They eventually listened to someone and made it bandwidth-unlimited. Too late. Everyone uses NGINX now.

  • It is also less mentioned on the site in general, owing to it being a proprietary Microsoft product in an audience of people who primarily go for Free / Open Source non-Microsoft products.

    There are some people here who are interested in corporate Europe or <insert Microsoft foothold place/industry here>, but most are aligned with Silicon Valley hackers.

    • Someone else mentioned it already, but what is there to talk about with SQL Server (and Oracle)? Like I'm sure there's plenty someone could write about but generally it's pay Microsoft so it's their problem.

      Whereas something like Postgres has a plethora of forks and tools built around it; because it's open source, devs can actually do interesting things to solve their problems.

Is it just me, or is it weird that BigQuery is mentioned but Bigtable and Spanner are not? The article presents a grab-bag of database concepts that do not seem related. BigQuery and PostgreSQL are just fundamentally different things.

It all makes me wonder what is the biggest "dark" database, the one nobody on HN wants to talk about, but it's out there serving the most transactions.

I almost knew that postgresql would be the winner just because of how much people recommend it here or literally anywhere. Postgres is cool.

My personal favourites, depending on the situation, are Postgres (technically Supabase is Postgres too), SQLite, DuckDB, and (Valkey?).

I'm just curious: what are your favourite options, and why?

the funniest thing about this graph is that it proves there was a raw drop off in all popularities in the last 2 years, which of course directly coincides with the great layoffening that has been happening for almost 3 years now.

this shows that people are definitely rotating out of "web technologies" in general, not because they aren't useful, but because the money isn't there anymore.

perhaps a large chunk have switched to AI hype trains, and it would be interesting to compare raw results of different AI headlines, but i suspect maybe 30% of people have left tech altogether.

  • I think it's attention and mindshare going to AI

    • we would have to look at raw numbers, like, perhaps web tech is just "flat", not declining.

      but my suspicion without evidence is that the gross number of people in the industry is actually dropping, though it should be increasing.

>a ClickHouse database of every HN story

I remember downloading it a few years ago, but the bookmark I have is dead. Where is it now? Is it still public?

Any commentary on DuckDB from users? I keep hearing about it but am not a user myself. Is it a fad or here to stay?

Would be great to share the queries. Are these results weighted for story points and/or number of comments?

Confusingly, I just came across the unrelated https://www.camel-ai.org/ today.

Some of the insights match my personal experience and preferences. At $dayjob we're migrating from Mongo to TimescaleDB (now TigerData ¯\_(ツ)_/¯), which is basically a PostgreSQL extension for time series data, and we couldn't be happier. We are getting better performance and massive storage savings.

On the analytics side of things we are starting to use DuckDB for some development efforts, but we are keen on potentially replacing some or all of our Snowflake usage with DuckDB.

  • Can you tell me the scenarios you used MongoDB for? Because I'm still curious about why would anyone use MongoDB after all these years.

    • It is the main database for a huge Rails app. They adopted Mongo right when its popularity started to decline. I always thought it was a very poor choice since the day I joined.

      It is an especially bad choice considering that a lot of the data stored in it is IoT-like and the system creates a single document per event :facepalm:

    • > I'm still curious about why would anyone use MongoDB after all these years.

      Because MongoDB is webscale.

Absolute drivel. Comparing operational/transactional databases like MongoDB and Postgres to analytics / columnar datastores like Redshift and Snowflake is meaningless. You might as well say "...the popularity of hammers is way up, with screwdrivers appearing to be in decline...". If this is the type of data analysis that AI is supporting, we're all in trouble.