And can you explain why? I think not. What's the superior alternative, for every use case?
https://www.manning.com/books/just-use-postgres
PG JSON write operations are document-level, whereas with MongoDB they're field-level.
Would you use a DB that only lets you write an entire row instead of setting a single field? Race conditions galore. Be very careful choosing PG for JSON in production systems...
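To make the contrast concrete (just a sketch with invented names, not anyone's production code): in MongoDB an update can target a single field with $set, while the usual Postgres-JSON pattern rewrites the whole jsonb value; the race described above shows up when application code reads the document, mutates it, and writes the whole thing back.

    from pymongo import MongoClient
    import psycopg2

    # MongoDB: touch only the one field; concurrent $set updates to other
    # fields of the same document don't clobber each other.
    orders = MongoClient("mongodb://localhost:27017").shop.orders
    orders.update_one({"_id": 42}, {"$set": {"status": "shipped"}})

    # Postgres: jsonb_set produces a new value for the whole jsonb column.
    # Inside a single UPDATE it is still atomic; the read-modify-write
    # version done in application code is where the races come from.
    conn = psycopg2.connect("dbname=shop")
    with conn, conn.cursor() as cur:
        cur.execute(
            """UPDATE orders
               SET doc = jsonb_set(doc, '{status}', '"shipped"')
               WHERE id = %s""",
            (42,),
        )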
Why? I felt the same for a while, but it's really massively improved over the years. Yes, this is a bad vuln, but anyone with even a tiny bit of brain is not running Mongo on the internet. I'm using Mongo very successfully at the moment, in ways I could not use Postgres.
Genuinely interested: what problems does mongo fit better than mainstream competitors these days? Why would you use it on a new project?
My application's primary task is to move JSON objects between storage and front-end. It does a lot more, but that's its primary task. So document storage is a logical choice. There are no real reasons to join records, although it is sometimes more efficient to do so. MongoDB's join operation has one advantage (for 1:N relations): it groups the joined records as an array in the document, instead of multiplying the rows, so whatever function operates on the original data also works on the joined data. The data itself is hierarchical in nature, so back-end operations also preferably work on structured data instead of rows.
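Roughly what that looks like (collection and field names are made up for illustration): with a $lookup stage the 1:N matches come back as an array embedded in each parent document, rather than as one result row per match.

    from pymongo import MongoClient

    db = MongoClient("mongodb://localhost:27017").library
    pipeline = [
        {"$lookup": {
            "from": "books",
            "localField": "_id",
            "foreignField": "author_id",
            "as": "books",   # all matching books land here as an array
        }}
    ]
    for author in db.authors.aggregate(pipeline):
        print(author["name"], len(author["books"]))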
You can argue that you can imitate that in Postgres or even SQLite by storing documents in JSON fields, but there are things they can't do quite as efficiently (e.g. indexing array contents); storage itself isn't very efficient either. But ignoring that, there's no functional difference: it's document in, document out. So then the choice boils down to speed, memory usage, etc. One day I'm going to check if PostgreSQL offers a real performance advantage, but given the backlog, that may take a while. Until then, MongoDB just works.
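For what it's worth, here's the indexing point in code form (names are invented; the Postgres side is only sketched in a comment): an index on an array field in MongoDB is automatically multikey, while the usual Postgres route is a GIN index over the whole jsonb column queried by containment.

    from pymongo import MongoClient

    docs = MongoClient("mongodb://localhost:27017").app.docs
    docs.create_index("tags")            # multikey: one entry per array element
    docs.find_one({"tags": "urgent"})    # matches docs whose array contains "urgent"

    # Rough Postgres counterpart:
    #   CREATE INDEX ON docs USING gin (doc jsonb_path_ops);
    #   SELECT * FROM docs WHERE doc @> '{"tags": ["urgent"]}';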
I consult for a small company which feeds some of the largest market research companies. This company finds data providers for each country, collects the data monthly, and needs to massage it into a uniform structure before handing it over. I help them script this. I found that importing the monthly spreadsheets into MongoDB and querying the set can replace an awful lot of manual scripting work. That aggregation queries are a good fit for an aggregator company shouldn't be that big of a surprise, I guess.
The MongoDB instance is ephemeral and the database itself is ephemeral: both only exist while the script is running, which can be measured in seconds. The structure changes from month to month. All this plays to the strengths of MongoDB while avoiding the usual problems. For example, one stage of the aggregation pipeline can only use 100MB of memory? A source CSV is a few megabytes at most.
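The whole monthly run is roughly this shape (file, collection, and field names here are invented, not the real ones): load the CSV into a throwaway collection, aggregate, print, drop.

    import csv
    from pymongo import MongoClient

    client = MongoClient("mongodb://localhost:27017")
    monthly = client.scratch.monthly

    with open("provider_2024_05.csv", newline="") as f:
        monthly.insert_many(list(csv.DictReader(f)))

    report = monthly.aggregate(
        [
            {"$group": {"_id": "$country", "rows": {"$sum": 1}}},
            {"$sort": {"rows": -1}},
        ],
        allowDiskUse=True,   # lifts the 100MB per-stage limit mentioned above
    )
    for row in report:
        print(row)

    client.drop_database("scratch")   # the database really is ephemeral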
PS: no, Excel can't do it; I got involved with this when the complexity of doing it in Excel had become unbearable.
To be honest, I don't think it was a stand-out 'it's better for X than Y because of Z' kind of choice for us. We are a bank, and so database options are quite limited (for certain applications it's essentially Oracle or Mongo).
I have one application at the moment which needs to handle about 175k writes/second across AZs. We are not sharding at the moment, but probably will once scale requires it (we are getting close) -- so for now it's just one big replica set, and it's behaving really nicely. I tried to emulate this workload on Postgres (which is my favourite database over my entire career so far (many scars)) and we couldn't get it to where Mongo was for this workload: multi-AZ is painful, automatic failover is still an unanswered question really, and I've tried all the 'right around the corner' multi-master Postgres options and none of them did anything other than make us sad.
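For context on the multi-AZ/failover point, the usual single-replica-set pattern looks something like this (hosts and names are placeholders, not our setup): acknowledge writes only once a majority of members, spread across AZs, have them, and let the driver retry across an automatic failover.

    from pymongo import MongoClient
    from pymongo.write_concern import WriteConcern

    client = MongoClient(
        "mongodb://node-a,node-b,node-c/?replicaSet=rs0",
        retryWrites=True,          # driver retries once across a failover
    )
    events = client.app.get_collection(
        "events", write_concern=WriteConcern(w="majority")
    )
    events.insert_one({"k": "v"})  # acknowledged only after a majority has it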
From the developer standpoint, it's very nice to use: I just throw documents at it and it saves them. If I want an extra field, I just add it. If I want an index on something, I also just add it. No big complicated schema migrations.
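In code that's about as boring as it sounds (toy names here, not our schema):

    from pymongo import MongoClient

    events = MongoClient("mongodb://localhost:27017").app.events

    events.insert_one({"type": "login", "user": "alice"})
    # A new field shows up later -- no migration, just start writing it:
    events.insert_one({"type": "login", "user": "bob", "mfa": True})
    # Want to query on it? Just add an index:
    events.create_index("mfa")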
What especially helps is that we have absolutely incredible support from MongoDB. We have a _weekly_ call with a bunch of their senior engineers, who answer all our stupid questions and proactively look for things to improve.
The ops story is also good: we aren't using Atlas, but the on-prem Kubernetes setup, while a bit clunky, has enough CRDs and whatever to keep devops happy for weeks at a time.
tl;dr -- it's boring and predictable, and I rarely have to think about it which is all I ever want from a database. I'm sure we could achieve the same results with other database technologies, but the ROI on even investigating them would not be worth it, as at best I think we would end up at the same place we are at now. People seem to have deeply religious feelings on databases, but I've never really been one of them.
I would not hesitate to use it on a new project.