Comment by etblg

9 months ago

> It's true that Prisma currently doesn't do JOINs for relational queries. Instead, it sends individual queries and joins the data on the application level.

..........I'm sorry, what? That seems........absurd.

edit: Might as well throw in: I can't stand ORMs, I don't get why people use it, please just write the SQL.

I believe it’s either released now or at least behind a feature flag (maybe only on some systems). It’s absolutely absurd it took so long. I can’t believe it wasn’t the initial implementation.

Funny relevant story: we got an OOM from a query that we used Prisma for. I looked into it - it was a simple select distinct. Turns out (I believe it was changed about a year ago, but I’m not positive), even distincts were done in memory! I can’t fathom the decision making there…
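For context, "distinct in memory" means something like the sketch below (hypothetical, not Prisma's actual code): every row crosses the wire before deduplication, which is exactly how a big table turns into an OOM.

```typescript
// Hypothetical sketch of the two strategies. The Row type and the data
// are invented; the point is where the deduplication work happens.
type Row = { city: string };

// Strategy 1: in-memory distinct -- the ORM fetches EVERY row first,
// then dedupes client-side. Memory use scales with the full table.
function distinctInMemory(rows: Row[]): string[] {
  const seen = new Set<string>();
  const out: string[] = [];
  for (const r of rows) {
    if (!seen.has(r.city)) {
      seen.add(r.city);
      out.push(r.city);
    }
  }
  return out;
}

// Strategy 2: push the work to the database instead:
//   SELECT DISTINCT city FROM users;
// Only the unique values ever cross the wire.

const rows: Row[] = [{ city: "Oslo" }, { city: "Lima" }, { city: "Oslo" }];
console.log(distinctInMemory(rows)); // → ["Oslo", "Lima"]
```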

  • > even distincts were done in memory! I can’t fathom the decision making there…

    This is one of those situations where I can't tell if they're operating on some kind of deep insight that is way above my experience and I just don't understand it, or if they just made really bad decisions. I just don't get it, it feels so wrong.

    • > I can't tell if they're operating on some kind of deep insight that is way above my experience and I just don't understand it

      This is answered at the very top of the link in the post you replied to. In no uncertain terms, no less. Direct link here: https://github.com/prisma/prisma/discussions/19748#discussio...

      > I want to elaborate a bit on the tradeoffs of this decision. The reason Prisma uses this strategy is because in a lot of real-world applications with large datasets, DB-level JOINs can become quite expensive...

      > The total cost of executing a complex join is often higher than executing multiple simpler queries. This is why the Prisma query engine currently defaults to multiple simple queries in order to optimise overall throughput of the system.

      > But Prisma is designed towards generalized best practices, and in the "real world" with huge tables and hundreds of fields, single queries are not the best approach...

      > All that being said, there are of course scenarios where JOINs are a lot more performant than sending individual queries. We know this and that's why we are currently working on enabling JOINs in Prisma Client queries as well. You can follow the development on the roadmap.

      Though this still isn't a complete answer. Part of it is that Prisma was, at its start, a GraphQL-centric ORM. This comes with its own performance pitfalls, and decomposing joins into separate subqueries with aggregation helped avoid them.

      4 replies →
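Mechanically, the "multiple simple queries" strategy discussed above amounts to something like this sketch (plain arrays stand in for the result sets of `SELECT * FROM users` and `SELECT * FROM posts WHERE author_id IN (...)`; names are invented):

```typescript
// An application-level join: two simple queries, then stitch the rows
// together by foreign key in application code.
type User = { id: number; name: string };
type Post = { id: number; authorId: number; title: string };

function joinUsersWithPosts(users: User[], posts: Post[]) {
  // Group child rows by foreign key once, O(n), instead of a nested scan.
  const byAuthor = new Map<number, Post[]>();
  for (const p of posts) {
    const bucket = byAuthor.get(p.authorId);
    if (bucket) bucket.push(p);
    else byAuthor.set(p.authorId, [p]);
  }
  return users.map((u) => ({ ...u, posts: byAuthor.get(u.id) ?? [] }));
}

const joined = joinUsersWithPosts(
  [{ id: 1, name: "Ada" }, { id: 2, name: "Bob" }],
  [{ id: 10, authorId: 1, title: "Hello" }]
);
// Ada gets her one post; Bob gets an empty array.
```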

    • It really gives me flashbacks to the early days of mongodb.

      Which, frankly, is a good lesson that marketing and docs and hype can make up for any amount of technical failure, and if you live long enough, you can fix the tech issues.

    • > if they're operating on some kind of deep insight

      If one's getting OOM errors from a SELECT DISTINCT, then there's no deep insight behind the choice, it's just a mistake.

  • Tbh, I once dabbled in building an ORM myself (in PHP) and I did find that in some situations it was faster to do individual queries and then join in code, to solve the N+1 problem.

    Granted, my SQL knowledge was much worse then, and Postgres/MySQL had severe limitations in their query planners, so I can see how something like this could have happened. If they support multiple DBs, and even one has this problem, it might be better (for them) to do it application side.

    The specific issue was doing a join with a table for a one-to-many: if you do the join the naive way, you get a lot more data back from the DB than you actually need, and if the join is nested you get exponentially more data.

    It was faster to do a separate query for each table and then stitch the results.

    Now it is easy to solve in Postgres with nested selects and JSON aggregation, which the query planner rewrites into efficient joins, but you still get only the bytes you requested, without duplication.
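The duplication point is easy to see by counting what a flat join result actually carries (a toy cell-count model, not any particular ORM):

```typescript
// Toy model of result-set size: a naive one-to-many JOIN repeats every
// parent column once per child row, so a wide parent (imagine a large
// bio/text column) gets shipped N times. Nesting another one-to-many
// multiplies the duplication again, hence "exponentially more data".
type Child = { id: number; parentId: number };

// Flat join: each joined row = 2 parent cells + 2 child cells.
function flatJoinCells(children: Child[]): number {
  return children.length * 4;
}

// Stitched (or JSON-aggregated) result: parent sent once (2 cells),
// each child sent once (2 cells each).
function stitchedCells(children: Child[]): number {
  return 2 + children.length * 2;
}

const kids: Child[] = Array.from({ length: 100 }, (_, i) => ({ id: i, parentId: 1 }));
console.log(flatJoinCells(kids)); // → 400
console.log(stitchedCells(kids)); // → 202
```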

> I can't stand ORMs, I don't get why people use it, please just write the SQL.

I used to agree until I started using a good ORM. Entity Framework on .NET is amazing.

  • > Entity Framework on .NET is amazing.

    I disagree. It is probably one of the less terrible ORMs, but it is far from amazing. The object-relational impedance mismatch will always dominate for anything that isn't trivial business logic. EF works great until you need different views of the model. It does support some kind of view mapping technique, but it's so much boilerplate I fail to see the point.

    Dapper + SqlConnection is goldilocks once you get into the nasty edges. Being able to query a result set that always exactly matches your view models is pretty amazing. The idea of the program automagically upgrading and migrating the schemas is something that was interesting to me until I saw what you could accomplish with Visual Studio's SQL Compare tool & RedGate's equivalent. I feel a lot more comfortable running manual schema migrations when working with hosted SQL providers.

    • > It does support some kind of view mapping technique

      Can you call `.Select(entity => new SomeSmallerModel { Name = entity.Name })` or something like that to select what you need? If I am understanding your issue correctly.

      I also agree that it's one of the least bad, but there are still things that annoy me.

    • > I feel a lot more comfortable running manual schema migrations

      Me too. I use a DB-first approach. Then EF simply rebuilds the application models automatically with a single command.

  • Doesn’t entity framework have a huge memory footprint too?

    • If you don't do stupid things like requesting everything from the database and then filtering data client side, then no.

      We have one application built on .NET 8 (and contemporary EF) with about 2000 tables, and its memory usage is okay. The one problem it has is startup time: EF takes about a minute of 100% CPU load to initialize on every application restart, before it passes execution to the rest of your program. Maybe it is solvable, maybe not, I haven't yet had the time to look into it.

      2 replies →

Not 100% parallel, but I was debugging a slow endpoint earlier today in our app which uses Mongo/mongoose.

I removed a $lookup (the mongodb JOIN equivalent) and replaced it with, as Prisma does, two table lookups and an in-memory join

p90 response times dropped from 35 seconds to 1.2 seconds

  • > I removed a $lookup (the mongodb JOIN equivalent)

    There is no "MongoDB JOIN equivalent" because MongoDB is not a relational database.

    It's like calling "retrieve table results sequentially using previous table's result-set" a JOIN; it's not.
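For reference, the `$lookup` stage being debated looks like this in an aggregation pipeline (collection and field names invented). It produces a joined array field, but as the comment says, without relational guarantees:

```typescript
// A $lookup stage as it appears in a MongoDB aggregation pipeline.
// It behaves like a left outer join that embeds matches as an array,
// but there are no foreign keys or cross-collection planning behind it.
const pipeline = [
  {
    $lookup: {
      from: "items",           // the other collection
      localField: "_id",       // field on the current collection
      foreignField: "orderId", // field on "items"
      as: "items",             // name of the output array field
    },
  },
];

// In a real app (mongoose): await Order.aggregate(pipeline)
console.log(pipeline[0].$lookup.from); // → "items"
```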

Can't speak about Prisma (or Postgres much).

But I've found that you can get better performance with application-level joins than with SQL joins in a _few_ situations: when the SQL join causes a table lock, application-level joins can run in parallel rather than being serialized behind MySQL. (The lock also blocks other parallel DB queries, which is generally a bigger deal than whether this one endpoint is faster.)

Although I do reach for the SQL join first; if something is slow, then metrics and optimization are necessary.
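The parallelism point can be sketched like this (the two query functions are stand-ins for real DB calls, not a real driver API):

```typescript
// The two halves of an application-level join can be in flight at once,
// whereas a single SQL JOIN is one sequential unit of work on the server
// (and, per the comment above, may hold a table lock while it runs).
async function queryUsers(): Promise<number[]> {
  return [1, 2]; // stand-in for `SELECT id FROM users ...`
}
async function queryPosts(): Promise<string[]> {
  return ["a", "b"]; // stand-in for `SELECT title FROM posts ...`
}

async function parallelFetch() {
  // Both queries run concurrently; total latency ≈ max(q1, q2), not q1 + q2.
  const [users, posts] = await Promise.all([queryUsers(), queryPosts()]);
  return { users, posts };
}
```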

It is. But wait... it doesn't join the data on the application level of your application. You have to deploy their proxy service which joins the data on the application level.

  • It's pretty obvious when somebody has only heard of Prisma, but never used it.

    - Using `JOIN`s (with correlated subqueries and JSON) has been around for a while now via a `relationLoadStrategy` setting.

    - Prisma has a Rust service that does query execution & result aggregation, but this is automatically managed behind the scenes. All you do is run `npx prisma generate` and then run your application.

    - They are in the process of removing the Rust layer.

    The JOIN setting and the removal of the middleware service are going to be defaults soon; they're just in preview.
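For the curious, opting into the JOIN strategy looks roughly like this (a sketch of the preview `relationLoadStrategy` setting mentioned above; the model and relation names are invented, and the `relationJoins` preview feature has to be enabled in the Prisma schema):

```typescript
// Arguments for a Prisma findMany call opting into DB-level JOINs.
// "join" builds one SQL query (correlated subqueries / JSON aggregation);
// "query" keeps the classic multiple-simple-queries plan.
const findManyArgs = {
  relationLoadStrategy: "join" as const, // or "query" for the old behavior
  include: { posts: true },              // hypothetical relation name
};

// In a real app: await prisma.user.findMany(findManyArgs)
console.log(findManyArgs.relationLoadStrategy); // → "join"
```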

      They've been saying that for 3 years. We actually had a discount for being an early adopter. But hey, it's obvious I've never used it and only heard of it.

      5 replies →