
Comment by wood_spirit

8 hours ago

Genuinely interested: what problems does mongo fit better than mainstream competitors these days? Why would you use it on a new project?

My application's primary task is to move JSON objects between storage and the front-end. It does a lot more, but that's its primary task, so document storage is a logical choice. There are no real reasons to join records, although it is sometimes more efficient to do so. MongoDB's join operation has one advantage for 1:N relations: it groups the joined records as an array inside the document instead of multiplying the result rows, so whatever function operates on the original data also works on the joined data. The data itself is hierarchical in nature, so back-end operations preferably work on structured data rather than rows.
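
Roughly what that looks like, sketched with pymongo (the collection and field names here are invented for illustration, not my actual schema):

```python
# Hypothetical 1:N join: each order document comes back with its items embedded
# as an array, instead of one result row per (order, item) pair as in SQL.
from pymongo import MongoClient

db = MongoClient("mongodb://localhost:27017")["shop"]

pipeline = [
    {
        "$lookup": {
            "from": "order_items",       # N-side collection
            "localField": "_id",         # 1-side key
            "foreignField": "order_id",  # N-side foreign key
            "as": "items",               # joined records land here as an array
        }
    }
]

for order in db.orders.aggregate(pipeline):
    # whatever works on a plain order still works here; it just has an extra array
    print(order["_id"], len(order["items"]))
```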

You can argue that you can imitate that in Postgres or even SQLite by storing data in JSON fields, but there are things they can't do quite as efficiently (e.g. indexing array contents), and the storage itself isn't very efficient either. Ignoring that, there's no functional difference: it's document in, document out. So the choice boils down to speed, memory usage, etc. One day I'm going to check whether PostgreSQL offers a real performance advantage, but given the backlog, that may take a while. Until then, MongoDB just works.
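
For what it's worth, indexing array contents in Mongo is just a regular index that becomes a multikey index when the field holds arrays; the rough Postgres analogue would be a GIN index on a jsonb column. A minimal sketch with an invented collection:

```python
# Indexing an array field ("tags" is invented for illustration). MongoDB builds
# a multikey index, i.e. one index entry per array element.
from pymongo import MongoClient

db = MongoClient("mongodb://localhost:27017")["app"]

db.documents.create_index("tags")

# This lookup can use the index to match any single element of the array.
matching = db.documents.find({"tags": "urgent"})
```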

I consult for a small company which feeds some of the largest market research companies. This company finds data providers for each country, collects the data monthly, and needs to massage it into a uniform structure before handing it over. I help them script this. I found that importing the monthly spreadsheets into MongoDB and querying the set can replace an awful lot of manual scripting work. That aggregation queries are a good fit for an aggregator company shouldn't be that big of a surprise, I guess.

The MongoDB instance is ephemeral and so is the database itself: both only exist while the script is running, which can be measured in seconds. The structure changes from month to month. All of this plays to the strengths of MongoDB while avoiding the usual problems. For example, a stage of the aggregation pipeline can only use 100 MB of memory? A source CSV is a few megabytes at most.
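
A rough sketch of the kind of throwaway script I mean, with invented file and field names (the real pipelines are larger, but the shape is the same):

```python
# Load one month's CSV into a throwaway database, reshape it with an aggregation,
# then drop everything. The database only lives as long as the script run.
import csv

from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
db = client["monthly_import"]

with open("provider_2024_05.csv", newline="") as f:
    db.raw.insert_many(list(csv.DictReader(f)))

pipeline = [
    {"$group": {"_id": "$country", "rows": {"$push": "$$ROOT"}, "total": {"$sum": 1}}},
]

# allowDiskUse lifts the 100 MB per-stage memory limit; not needed for CSVs of a
# few megabytes, but it is the escape hatch if a month ever gets big.
results = list(db.raw.aggregate(pipeline, allowDiskUse=True))

client.drop_database("monthly_import")
```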

PS: no, Excel can't do it. I got involved when the complexity of doing it in Excel became unbearable.

To be honest, I don't think it was a stand-out 'it's better for X than Y because of Z' kind of choice for us. We are a bank, so database options are quite limited (for certain applications it's essentially Oracle or Mongo).

I have one application at the moment which needs to handle about 175k writes/second across AZs. We are not sharding at the moment, but probably will once scale requires it (we are getting close), so it's just one big replica set, and it's behaving really nicely. I tried to emulate this workload on Postgres (my favourite database over my entire career so far, many scars) and we couldn't get it to where Mongo was for this workload: multi-AZ is painful, automatic failover is still an unanswered question really, and I've tried all the 'right around the corner' multi-master Postgres options and none of them did anything other than make us sad.
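
The client side is nothing exotic, by the way. A hedged sketch of the sort of setup I mean, with placeholder host names and an invented collection (not our actual config):

```python
# Connect to a three-node replica set spread across AZs, write with majority
# acknowledgement, and batch inserts to keep round trips down at high write rates.
from pymongo import InsertOne, MongoClient
from pymongo.write_concern import WriteConcern

client = MongoClient(
    "mongodb://node-a:27017,node-b:27017,node-c:27017/?replicaSet=rs0",
    retryWrites=True,  # the driver retries a write once across a failover
)

coll = client["events"].get_collection(
    "writes", write_concern=WriteConcern(w="majority")
)

batch = [InsertOne({"seq": i, "payload": "..."}) for i in range(1000)]
coll.bulk_write(batch, ordered=False)
```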

From the developer standpoint, it's very nice to use: I just throw documents at it and it saves them. If I want an extra field, I just add it. If I want an index on something, I also just add it. No big complicated schema migrations.
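
Concretely, 'just add it' really is this small (names invented):

```python
# Newer documents simply carry the extra field; existing documents are untouched,
# and indexing the new field is a one-liner rather than a migration.
from pymongo import MongoClient

coll = MongoClient("mongodb://localhost:27017")["app"]["payments"]

coll.insert_one({"amount": 42, "currency": "EUR", "idempotency_key": "abc-123"})
coll.create_index("idempotency_key")
```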

What especially helps is that we have absolutely incredible support from MongoDB. We have a _weekly_ call with a bunch of their senior engineers who answer all our stupid questions and proactively look for things to improve.

The ops story is also good. We aren't using Atlas, but the on-prem kube setup, while a bit clunky, has enough CRDs and whatever to keep devops happy for weeks at a time.

tl;dr -- it's boring and predictable, and I rarely have to think about it, which is all I ever want from a database. I'm sure we could achieve the same results with other database technologies, but the ROI on even investigating them would not be worth it, as at best I think we would end up in the same place we are now. People seem to have deeply religious feelings about databases, but I've never really been one of them.

I would not hesitate to use it on a new project.

  • > From the developer standpoint, it's very nice to use: I just throw documents at it and it saves them. If I want an extra field, I just add it. If I want an index on something, I also just add it. No big complicated schema migrations.

    This sentence summarizes all the issues developers working with Mongo will have: multiple versions of documents living in the same DB and an unpredictable structure.

    The best things MongoDB has are definitely its marketing (making everyone think it's amazing to invest hundreds of millions to deliver an "OK"-tier database) and its customer support.

    • Eh, not really. I've done both at considerable scale, and I don't hit these problems. Perhaps you need better developers? For sure, having your database enforce guardrails on what $thing should look like means your code can be lower quality, but you should pick the right tool for the job. For scenarios where I have one 'thing' that's not very relational, it works well. If your application dies because your $thing expects some field which isn't there, that's a you problem, not a storage problem.
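
      To be fair to the guardrails point: if you do want the database to enforce shape, Mongo can do that too, via collection validators. A minimal sketch with invented field names:

      ```python
      # Opt-in, server-side guardrails via a $jsonSchema validator; inserts that
      # are missing the required fields get rejected by the database.
      from pymongo import MongoClient

      db = MongoClient("mongodb://localhost:27017")["app"]

      db.create_collection(
          "things",
          validator={
              "$jsonSchema": {
                  "bsonType": "object",
                  "required": ["name", "created_at"],
                  "properties": {
                      "name": {"bsonType": "string"},
                      "created_at": {"bsonType": "date"},
                  },
              }
          },
      )
      ```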