Comment by abathur
2 years ago
I generally like the new code search, but I've got one big gripe: there's no way to sort code results by any kind of proxy for recency.
The old code search had the ability to sort by indexed date. This wasn't perfect, but it was something.
I like keeping up with who's using my code and whether they're leaving comments or commit chains that outline trouble they're having with it. Sometimes old code pops up in the recently-indexed sort, but if I regularly search and look at the top page, I can see most new uses.
Without it, code search is basically useless for this purpose :/
(I work on code search.) Yeah, sorry about that. We've heard this feedback a lot. There are two reasons we haven't implemented this. First, content is now shared between repositories, which it wasn't before, and that makes this harder. Second, we rebuild the index weekly or even more frequently, so the old proxy of "when was this added to the index" doesn't work any more. What we would like to use is "when was this blob added to this branch", but that's extremely expensive to retrieve from Git because Git trees don't record it.
git blame is expensive, especially at scale on big repos, but the thing to understand is that the exact hash something was committed at isn't important for time-based indexing. What's wanted is roughly when (± a few days) something was committed, which makes for a much cheaper query. (How merges are dealt with might also be material.)
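To make the "± a few days" idea concrete, here's a rough sketch (not how GitHub does it) that only looks at one snapshot of the branch per week instead of every commit. The snapshot format, file names and blob ids are all made up for illustration; a real version would read trees from Git.

```python
from datetime import date
from typing import Dict, List, Optional, Tuple

# Hypothetical input: one snapshot of a branch's tree per week, mapping
# path -> blob id. A real system would read these trees from Git.
Snapshot = Tuple[date, Dict[str, str]]


def approx_first_seen(snapshots: List[Snapshot], blob_id: str) -> Optional[date]:
    """Return the date of the earliest snapshot whose tree contains blob_id.

    Snapshots are a week apart, so the answer is only accurate to within
    that window. In exchange, the work scales with the number of snapshots
    rather than the number of commits.
    """
    for snap_date, tree in sorted(snapshots, key=lambda s: s[0]):
        if blob_id in tree.values():
            return snap_date
    return None


weekly = [
    (date(2023, 1, 1), {"lib.py": "b1"}),
    (date(2023, 1, 8), {"lib.py": "b2"}),
    (date(2023, 1, 15), {"lib.py": "b2", "util.py": "b3"}),
]
print(approx_first_seen(weekly, "b3"))  # 2023-01-15
```

All this tells you is that b3 showed up sometime between Jan 8 and Jan 15, which is plenty for a "recent first" sort.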
Barring that, though, you could use the equivalent of a post-commit git hook that updates the DB with 'when this blob was added to this branch', and then run a backfill job that goes back far enough.
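A minimal sketch of that hook-plus-backfill idea, assuming a toy model where each commit comes with the full tree of the branch after it. `FirstSeenStore`, `on_commit` and `backfill` are hypothetical names, not a real GitHub or Git API, and a dict stands in for the DB.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical model: a commit is (timestamp, {path: blob_id}), i.e. the
# branch's full tree after that commit. A real hook would read this from Git.
Commit = Tuple[int, Dict[str, str]]


class FirstSeenStore:
    """Records the earliest time each (branch, blob) pair was observed."""

    def __init__(self) -> None:
        self.first_seen: Dict[Tuple[str, str], int] = {}

    def on_commit(self, branch: str, commit: Commit) -> None:
        # What a post-commit style hook would call for each new commit.
        # Only the earliest timestamp per blob is kept, so replaying old
        # history later (or out of order) still converges on the right answer.
        ts, tree = commit
        for blob_id in tree.values():
            key = (branch, blob_id)
            if key not in self.first_seen or ts < self.first_seen[key]:
                self.first_seen[key] = ts

    def backfill(self, branch: str, history: Iterable[Commit]) -> None:
        # One-off job for commits that predate the hook.
        for commit in history:
            self.on_commit(branch, commit)


store = FirstSeenStore()
store.on_commit("main", (300, {"lib.py": "b2"}))  # hook fires on a new commit
store.backfill("main", [(100, {"lib.py": "b1"}), (200, {"lib.py": "b2"})])
print(store.first_seen[("main", "b2")])  # 200
```

Keeping the minimum timestamp means the hook and the backfill can run in either order. One caveat: a blob that gets deleted and later re-added keeps its original time.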
The easy answer, though, would seem to be to keep a copy of last week's index, run the query against both, and find an efficient way to compare the results so you can tell whether something is in this week's index but not last week's.
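The comparison step could be as simple as a set difference on result keys. This sketch assumes each hit is identified by (repo, path, blob_id); that key, and the repo and file names, are made up for illustration and aren't how any real index identifies hits.

```python
from typing import Iterable, List, Set, Tuple

# Assumed hit identity for illustration: (repo, path, blob_id).
Hit = Tuple[str, str, str]


def new_this_week(this_week: Iterable[Hit], last_week: Iterable[Hit]) -> List[Hit]:
    """Hits present in this week's results but absent from last week's."""
    previous: Set[Hit] = set(last_week)
    # Keep this week's ranking order while dropping hits seen last week.
    return [hit for hit in this_week if hit not in previous]


this_week = [("r1", "a.py", "b2"), ("r2", "x.py", "b9"), ("r1", "c.py", "b3")]
last_week = [("r2", "x.py", "b9"), ("r1", "a.py", "b1")]
print(new_this_week(this_week, last_week))
# [('r1', 'a.py', 'b2'), ('r1', 'c.py', 'b3')]
```

Because blob_id is part of the key, any edit to a matching file makes it look new. Keying on (repo, path) would only surface files that newly match.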
Also of note, "when was this blob added to this branch" isn't actually the same as git blame. If a file that matched the search was touched, but the latest change didn't affect the matching line of code, it'd still show up as recent, which is not what the user wants.
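Here's a small illustration of that gap. The query `mylib` and the file contents are made up. The blob id is computed the way Git does it (SHA-1 over a `blob <size>\0` header plus the contents), so any edit gives the file a new blob. The "is the match itself new?" check only compares matching lines between two versions. It's a cheap stand-in for blame and ignores moved or duplicated lines.

```python
import hashlib
from typing import Set


def git_blob_id(data: bytes) -> str:
    # Git's blob id: SHA-1 over a "blob <size>\0" header plus the contents.
    return hashlib.sha1(b"blob %d\0" % len(data) + data).hexdigest()


def matching_lines(text: str, query: str) -> Set[str]:
    # Lines containing the query, whitespace-stripped so re-indentation is ignored.
    return {line.strip() for line in text.splitlines() if query in line}


def match_is_new(old_text: str, new_text: str, query: str) -> bool:
    # True only if some line matching the query was absent from the old version.
    return bool(matching_lines(new_text, query) - matching_lines(old_text, query))


old = "import mylib\nx = 1\n"
new = "import mylib\nx = 2\n"
print(git_blob_id(old.encode()) != git_blob_id(new.encode()))  # True: the blob changed
print(match_is_new(old, new, "mylib"))  # False: the matching line did not
```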
Does this mean it will not be implemented?
We want to do it right if we do implement it, but I can't promise anything concretely. It's not trivial, unfortunately.