Comment by ramon156
12 hours ago
> The 20 most-changed files in the last year. The file at the top is almost always the one people warn me about. “Oh yeah, that file. Everyone’s afraid to touch it.”
The most changed file is the one people are afraid of touching?
Just like that place that's so crowded nobody goes there anymore.
I've just tried this, and the most touched files are also the most irrelevant or boring files (auto generated, entry-point of the service etc.) in my tests.
Yeah same thing happens with lockfiles and CI configs. You end up filtering out half the list before it tells you anything useful.
I just tried it too and it basically just flagged a handful of 1500+ line files which probably ought to be broken up eventually but arent causing any serious problems.
If it's (like in my case) dependency management, localization or config files, breaking them up will likely only cause more issues. Make sure that it's an actual improvement before breaking things up.
This command needs a warning. Using this command and drawing too many conclusions from it, especially if you’re new, will make you look stupid in front of your team mates.
I ran this on the repo I have open and after I filtered out the non code files it really can only tell me which features we worked on in the last year. It says more about how we decided to split up the features into increments than anything to do with bugs and “churn”.
Good thing that the article contains that warning, then.
Not really strong enough in a post about what to do in a codebase you’re not familiar with. In that situation you’re probably new to the team and organisation and likely to get off on the wrong foot with people if you assume their code “hurts”.
3 replies →
I found it interesting, that Git itself has built in similarity notion... when it packs objects, it groups files by path+size, runs delta cmpression to find which are close.
Very different from just counting commits - https://vectree.io/c/delta-compression-heuristics-and-packfi...
These commands are just about what files to start looking at to understand new codebase.
Better for people to know they're just blindly copying tools and parroting their output as if it's automatically meaningfully. Any warning against that should be built into the individual, for their own sake
Right? Some of these comments feel “you gave me commands to run and I should be able to turn my brain off to interpret the outputs”. These aren’t newbie commands so the assumption would be that you kinda know what you’re doing at least a little bit. If not, then don’t run them… similar to how you should approach all commands/things from the internet
Plotting Churn against Complexity is far more useful than merely churn.
It shows places that are problematic much better. High churn, low complexity: fine. Its recognized and optimizef that this is worked on a lot (e.g. some mapping file, a dsl, business rules etc). Low churn high complexity: fine too. Its a mess, but no-one has to be there. But both? Thats probably where most bugs originate, where PRs block, where test coverage is poor and where everyone knows time is needed to refactor.
In fact, quite often I found that a teams' call "to rewrite the app from scratch" was really about those few high-churn-high-complexity modules, files or classes.
Complexity is a deep topic, but even simple checks like how nested smt is, or how many statements can do.
Maybe it's a start to find conflict-prone regions ?
otherwise you're right, it could be a long linear list of appends where people are happy to contribute.
Yes. Because the fear is butressed with necessity. You have to edit the file, and so does everyone else and that is a recipe for a lot of mess. I can think back over years of files like this. Usually kilolines of impossible to reason about doeverything.
Definitely not in my experience. The most changed are the change logs, files with version numbers and readmes. I don't think anyone is afraid of keeping those up to date.
pom.xml and package.json came up on couple of separate projects I ran the commands on. Which makes sense because the versions get bumped rather frequently. I guess context matters, as usual.
Yeah, the truth is going to be a lot more subtle than this.
The LLM that wrote the copy is an idiot.
This is such obvious LLM slop.
Could be also that a frequently edited file had most opportunity to be broken. And it was edited by the most random crowd.
In my case, it's .github/CODEOWNERS.
Nobody is afraid of changing it.
Why does github owners need frequent change? Do members in you team change so often?