Comment by lazide

3 years ago

Every database I know of can generate query logs. Why not just log every query and do some statistical analysis on it?

> Every database I know of can generate query logs.

Every one I know of warns that it comes with significant performance implications and isn't intended to be used in production.

> Why not just log every query and do some statistical analysis on it?

And then what? If you know this table only gets queried a few times a month, what does that actually tell you that you can use?

  • It's resource intensive - but so is being in a giant tarpit/morass. Adding client query logging is cheaper and can be distributed. I just double checked, and neither Oracle nor Postgres warn 'never use it in production'

    And if you have logs, you can see what actually gets queried, and by whom, and what doesn't get queried, and by whom.

    That will also potentially let you start constructing views and moving actual underlying tables out of the way to where you can control them.

    Which can let you untangle the giant spaghetti mess you're in.

    But then, that's just me having actually done that a few times. You're welcome to complain about how it's actually unsolvable and will never get better, of course.

    • > It's resource intensive - but so is being in a giant tarpit/morass.

      Agreed, but it means it's not really a viable option for digging yourself out of that hole if you're already in it. Most of the time if you're desperately trying to split up your database it's because you're already hitting performance issues.

      > Adding client query logging is cheaper and can be distributed.

      Right, but that only works if you've got a good handle on what all your clients are. If you've got a random critical script that you don't know about, client logging isn't going to catch that one's queries.

      > But then, that's just me having actually done that a few times. You're welcome to complain about how it's actually unsolvable and will never get better, of course.

      I've done it a few times too, it's always been a shitshow. Query logging is a useful tool to have in some cases but it's often not an option, and even when it is not a quick or easy fix. You're far better off not getting into that situation in the first place, by enforcing proper datastore ownership and scalable data models from the start, or at least from well before you start hitting the performance limits of your datastores.

      2 replies →