Comment by jjice

9 months ago

I believe it’s either released now or at least available behind a feature flag (maybe only on some systems). It’s absolutely absurd that it took so long. I can’t believe it wasn’t the initial implementation.

Funny relevant story: we got an OOM from a query that we used Prisma for. I looked into it, and it was a simple select distinct. Turns out (I believe it was changed about a year ago, but I’m not positive) even distincts were done in memory! I can’t fathom the decision making there…
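
To make the difference concrete, here is a minimal sketch using Python's built-in sqlite3 as a stand-in database. It is not Prisma's actual implementation, and the `events` table and its contents are made up for illustration. It only shows why deduplicating in the application means every row has to be fetched first, while `SELECT DISTINCT` returns just the unique values.

```python
import sqlite3

# Hypothetical table: 5000 rows but only 3 distinct values.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, kind TEXT)")
conn.executemany(
    "INSERT INTO events (kind) VALUES (?)",
    [(kind,) for kind in ["click", "view", "click", "view", "purchase"] * 1000],
)

# In-memory distinct: every row is pulled into the application,
# and only then deduplicated. Memory use grows with the table size.
all_rows = conn.execute("SELECT kind FROM events").fetchall()
distinct_in_app = sorted({row[0] for row in all_rows})

# Database-side distinct: only the unique values come back.
db_rows = conn.execute("SELECT DISTINCT kind FROM events").fetchall()
distinct_in_db = sorted(row[0] for row in db_rows)

rows_fetched_app = len(all_rows)  # one entry per table row
rows_fetched_db = len(db_rows)    # one entry per distinct value
```

Both approaches give the same answer, but the first one holds the whole column in application memory, which is the kind of pattern that can end in an OOM on a large table.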

9 comments


etblg 9 months ago

> even distincts were done in memory! I can’t fathom the decision making there…

This is one of those situations where I can't tell if they're operating on some kind of deep insight that is way above my experience and I just don't understand it, or if they just made really bad decisions. I just don't get it, it feels so wrong.

Tadpole9181 9 months ago
> I can't tell if they're operating on some kind of deep insight that is way above my experience and I just don't understand it
This is answered at the very top of the link on the post you replied to. In no uncertain terms, no less. Direct link here: https://github.com/prisma/prisma/discussions/19748#discussio...
> I want to elaborate a bit on the tradeoffs of this decision. The reason Prisma uses this strategy is because in a lot of real-world applications with large datasets, DB-level JOINs can become quite expensive...
> The total cost of executing a complex join is often higher than executing multiple simpler queries. This is why the Prisma query engine currently defaults to multiple simple queries in order to optimise overall throughput of the system.
> But Prisma is designed towards generalized best practices, and in the "real world" with huge tables and hundreds of fields, single queries are not the best approach...
> All that being said, there are of course scenarios where JOINs are a lot more performant than sending individual queries. We know this and that's why we are currently working on enabling JOINs in Prisma Client queries as well. You can follow the development on the roadmap.
This still isn't a complete answer, though. Part of it is that Prisma was, at its start, a GraphQL-centric ORM. This comes with its own performance pitfalls, and decomposing joins into separate subqueries with aggregation helped avoid them.
- evanelias 9 months ago
  
  It's a completely ridiculous answer though. They're linking to High Performance MySQL's 2nd edition, which came out in June 2008, and was written for users of MySQL 5.0 running on 2008-era hardware.
  My take, as a MySQL expert: that advice is totally irrelevant now, and has been for quite some time. It's just plain wrong in a modern context.
  
wredcoll 9 months ago

It really gives me flashbacks to the early days of mongodb.
Which, frankly, is a good lesson that marketing and docs and hype can make up for any amount of technical failure, and if you live long enough, you can fix the tech issues.
MonkeyClub 9 months ago

> if they're operating on some kind of deep insight
If one's getting OOM errors from a SELECT DISTINCT, then there's no deep insight behind the choice, it's just a mistake.

seer 9 months ago

Tbh, I once dabbled in building an ORM myself (in PHP) and I did find that in some situations it was faster to do individual queries and then join in code, to solve the N+1 problem.

Granted, my SQL knowledge was much worse back then, and Postgres/MySQL had severe limitations in their query planners, so I can see how something like this could have happened. If they support multiple DBs, and even one has this problem, it might be better (for them) to do it application-side.

The specific issue was joining against a table in a one-to-many relationship: if you do the join the naive way, the parent's columns are repeated on every child row, so you get a lot more data from the DB than you actually need. If the join is nested, you get exponentially more data.

It was faster to do a query for each table separately and then stitch the results.
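
A rough sketch of that effect, using Python's built-in sqlite3 and a made-up `authors`/`posts` schema (all names here are hypothetical, and the byte counts only measure the one wide column). The naive join repeats the author's large `bio` on every post row, while two separate queries fetch it once and stitch in code.

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT, bio TEXT);
CREATE TABLE posts (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
""")
# One author with a 1000-character bio and 50 posts.
conn.execute("INSERT INTO authors VALUES (1, 'ada', ?)", ("x" * 1000,))
conn.executemany(
    "INSERT INTO posts (author_id, title) VALUES (1, ?)",
    [(f"post {i}",) for i in range(50)],
)

# Naive join: the author's columns are repeated on every post row.
joined = conn.execute("""
    SELECT a.id, a.name, a.bio, p.id, p.title
    FROM authors a JOIN posts p ON p.author_id = a.id
""").fetchall()
bio_bytes_join = sum(len(row[2]) for row in joined)

# Separate queries: fetch each table once, then stitch in code.
authors = conn.execute("SELECT id, name, bio FROM authors").fetchall()
author_ids = [a[0] for a in authors]
placeholders = ",".join("?" * len(author_ids))
posts = conn.execute(
    f"SELECT id, author_id, title FROM posts WHERE author_id IN ({placeholders})",
    author_ids,
).fetchall()

posts_by_author = defaultdict(list)
for post_id, author_id, title in posts:
    posts_by_author[author_id].append({"id": post_id, "title": title})

stitched = [
    {"id": a_id, "name": name, "bio": bio, "posts": posts_by_author[a_id]}
    for a_id, name, bio in authors
]
bio_bytes_split = sum(len(a[2]) for a in authors)
```

Here the join transfers the bio 50 times and the split version transfers it once. Whether that actually makes the split faster depends on the database, the network, and the data, so treat this as an illustration of the duplication only.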

Now it is easy to solve in pg with nested selects and JSON aggregation, which the pg query planner rewrites to efficient joins, and you still get only the bytes you requested, without duplication.
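
A small sketch of the shape of such a query. To keep it runnable without a server, it uses Python's built-in sqlite3 and SQLite's `json_group_array` / `json_object`, which are only rough analogues of the `json_agg` / `json_build_object` you would use in pg. It assumes a SQLite build with the JSON functions enabled, the schema is made up, and it says nothing about how either planner executes the query.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE posts (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
INSERT INTO authors VALUES (1, 'ada'), (2, 'bob');
INSERT INTO posts VALUES (1, 1, 'a1'), (2, 1, 'a2'), (3, 2, 'b1');
""")

# One row per author; the children arrive as a JSON array in a single
# column, so the parent's columns are never repeated per child.
rows = conn.execute("""
    SELECT a.id,
           a.name,
           (SELECT json_group_array(json_object('id', p.id, 'title', p.title))
              FROM posts p
             WHERE p.author_id = a.id) AS posts
    FROM authors a
    ORDER BY a.id
""").fetchall()

# Decode the JSON column into nested Python structures.
result = [
    {"id": a_id, "name": name, "posts": json.loads(posts_json)}
    for a_id, name, posts_json in rows
]
titles_for_ada = sorted(p["title"] for p in result[0]["posts"])
```

The result has exactly one row per parent, which is the "only the bytes you requested" property described above.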