Comment by cogman10

3 years ago

> fwiw hacking hundreds of apps literally making them worse by fragmenting their source of record doesn't sound like a good plan. it's no surprise you have saboteurs, your company probably wants to survive and your plan is to shatter its brain.

The brain is already shattered. This wouldn't "literally make them worse", instead it would say that "now instead of everyone in the world hitting the users table directly and adding or removing data from that table, we have one service in charge of managing users".

Far too often we have queries like

SELECT b.*, u.username FROM Bar b JOIN users u ON b.userId = u.id

And why is this query doing that? To get a human readable username that isn't needed but at one point years ago made it nicer to debug the application.

> you should be trying to debottleneck your sql server if that's the plan the whole org can get behind.

Did you read my post? We absolutely HAVE been working, for years now, at "debottlenecking our sql server". We have a fairly large team of DBAs (about 30) who's whole job is "debottlenecking our sql server". What I'm saying is that we are, and have been, at the edge (and more often than not over the edge) of tipping over. We CAN'T buy our way out of this with new hardware because we already have the best available hardware. We already have read only replicas. We already have tried (and failed at) sharding the data.

The problem is data doesn't have stewards. As a result, we've spent years developing application code where nobody got in the way of saying "Maybe you shouldn't join these two domains together? Maybe there's another way to do this?"

3 comments

cogman10

gfody 3 years ago

assuming some beastly server with terabytes of ram, hundreds of fast cores, and an exotic io subsystem capable of ridiculous amounts of low latency iops, I'd guess the perf issue with that example is not sql server struggling with load but rather lock contention from the users table being heavily updated. unless that beast of a server is sitting pegged with a hardware bottleneck it can probably be debottlenecked by vertically partitioning the users table. ie: split the table into two (or more) to isolate the columns that change frequently from the ones that don't, replace the table with a view that joins it back together w/instead-of triggers conditionally updating the appropriate tables, etc. etc. then when this happens:

    SELECT b.*, u.username FROM Bar b JOIN users u ON b.userId = u.id

sql server sees that you're only selecting username from the users view and eliminates the joins for the more contentious tables and breathes easy peasy

> And why is this query doing that? To get a human readable username that isn't needed but at one point years ago made it nicer to debug the application.

imo users should be able to do this and whatever else they want and it's not even unreasonable to want usernames for debugging purposes forever. I'd expect the db team to support the requirements of the apps teams and wouldn't want to have to get data from different sources

cogman10 3 years ago
> assuming some beastly server with terabytes of ram, hundreds of fast cores, and an exotic io subsystem capable of ridiculous amounts of low latency iops, I'd guess the perf issue with that example is not sql server struggling with load but rather lock contention from the users table being heavily updated.
You'd guess wrong. The example above is not the only query our server runs. It's an example of some of the queries that can be run. We have a VERY complex relationship graph, far more than what you'll typically find. This is finance, after all.
I used the user example for something relatable without getting into the weeds of the domain.
We are particularly read heavy and write light. The issue is quiet literally that we have too many applications doing too many reads. We are literally running into problems where our tempDb can't keep up with the requests because there are too many of them doing too complex of work.
You are assuming we can just partition a table here or there and everything will just work swimmingly, that's simply not the case. Our tables do not so easily partition. (perhaps our users table would, but again, that was for illustrative purposes and by no means the most complex example).
Do you think that such a simple solution hasn't been explored by a team of 50 DBAs? Or that this sort of obvious problem wouldn't have been immediately fixed?
- gfody 3 years ago
  
  > Do you think that such a simple solution hasn't been explored by a team of 50 DBAs? Or that this sort of obvious problem wouldn't have been immediately fixed?
  based on what you've shared, yeah. I also wouldn't expect a million DBAs to replace a single DBE