
Comment by lmm · 3 years ago

> It's resource intensive - but so is being in a giant tarpit/morass.

Agreed, but that means it's not really a viable option for digging yourself out of that hole once you're already in it. Most of the time, if you're desperately trying to split up your database, it's because you're already hitting performance issues.

> Adding client query logging is cheaper and can be distributed.

Right, but that only works if you've got a good handle on what all your clients are. If there's a critical script out there that you don't know about, client logging isn't going to catch its queries.
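
For concreteness, client-side query logging typically looks like this: each known client wraps its own database connection and writes every statement it issues to a log it owns. Here's a minimal sketch using psycopg2's LoggingConnection; the DSN, log file, and client name are made up for illustration:

```python
import logging
import psycopg2
import psycopg2.extras

logging.basicConfig(filename="queries.log", level=logging.DEBUG,
                    format="%(asctime)s %(name)s %(message)s")
logger = logging.getLogger("billing-service")  # hypothetical client identity

# LoggingConnection writes every statement this client executes to the
# logger, so each service produces its own distributed query audit trail.
conn = psycopg2.connect("dbname=app",  # hypothetical DSN
                        connection_factory=psycopg2.extras.LoggingConnection)
conn.initialize(logger)

cur = conn.cursor()
cur.execute("SELECT id FROM invoices WHERE status = %s", ("overdue",))
```

The catch is exactly the one above: a client only shows up in these logs if someone remembered to instrument it.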

> But then, that's just me having actually done that a few times. You're welcome to complain about how it's actually unsolvable and will never get better, of course.

I've done it a few times too, and it's always been a shitshow. Query logging is a useful tool to have in some cases, but it's often not an option, and even when it is, it's not a quick or easy fix. You're far better off not getting into that situation in the first place, by enforcing proper datastore ownership and scalable data models from the start, or at least from well before you start hitting the performance limits of your datastores.

If you are in the hole where you really cannot add load to your database server but still want to log queries, there is a technique called zero-impact monitoring: you mirror the network traffic going to your database server (e.g. via a switch's SPAN/port-mirroring feature) and use a separate machine to reconstruct it into query logs. These logs identify the queries being run and, critically, who or what is running them.

A past workplace of mine has used this approach.
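
To make that concrete, here's a minimal sketch of the reconstruction side, assuming MySQL on port 3306 and a mirror/SPAN port feeding eth0 on the capture box (the interface name and port are assumptions; production tools also do full TCP stream reassembly, which this deliberately skips):

```python
from scapy.all import IP, Raw, TCP, sniff

COM_QUERY = 0x03  # MySQL wire-protocol command byte for a plain text query

def handle(pkt):
    if not (pkt.haslayer(IP) and pkt.haslayer(TCP) and pkt.haslayer(Raw)):
        return
    data = bytes(pkt[Raw].load)
    # MySQL packet header: 3-byte little-endian payload length, 1-byte
    # sequence id, then the command byte; for COM_QUERY the SQL follows.
    if len(data) > 5 and data[4] == COM_QUERY:
        sql = data[5:].decode("utf-8", errors="replace")
        print(f"{pkt[IP].src} -> {sql}")

# Listen on the interface receiving the mirrored traffic. Note: no TCP
# stream reassembly here, so statements split across segments are missed.
sniff(iface="eth0", filter="tcp dst port 3306", prn=handle, store=False)
```

The same idea extends to any wire protocol you can parse; the key property is that the capture box, not the database server, pays the logging cost.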

  • Excellent pointer! Do you happen to know of anything which can do this out of the box?