Comment by zamnos
2 years ago
git blame is expensive, especially at scale on big repos, but the thing to understand is that the exact hash something was committed at isn't important for time-based indexing. What's wanted is roughly when (± a few days) something was committed, which makes for a much cheaper query. (How merges are dealt with might also be material.)
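One way to picture the "cheaper query": instead of storing exact blame hashes, keep a coarse day bucket per indexed path and filter on a plain integer comparison. This is a minimal sketch; `day_bucket`, `recently_touched`, and the path-to-bucket index layout are hypothetical names chosen for illustration, not anything git or an existing search tool provides.

```python
SECONDS_PER_DAY = 86_400


def day_bucket(commit_ts: int) -> int:
    """Coarsen a Unix commit timestamp to a whole-day bucket (hypothetical helper)."""
    return commit_ts // SECONDS_PER_DAY


def recently_touched(index: dict[str, int], now_ts: int, days: int = 7) -> set[str]:
    """Return paths whose coarse commit day falls within the last `days` days.

    `index` maps path -> day bucket; the precision is deliberately +/- a day,
    which is all a "changed this week" filter needs.
    """
    cutoff = day_bucket(now_ts) - days
    return {path for path, bucket in index.items() if bucket >= cutoff}
```

Because each entry is a single small integer, the filter is a range check rather than a per-line history walk.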
Barring that, though, there's the equivalent of a post-commit git hook that updates the DB with "when this blob was added to this branch", plus a backfill job that runs over enough history to cover the window you care about.
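A rough sketch of the hook side, assuming the hook runs `git ls-tree -r HEAD` and passes its output along with the commit time. The `blob_seen` table, its schema, and both function names are hypothetical. The upsert keeps the *earliest* timestamp, so the same function can serve the live hook and an out-of-order backfill.

```python
import sqlite3


def parse_ls_tree(output: str) -> list[str]:
    """Extract blob SHAs from `git ls-tree -r <commit>` output.

    Each line looks like "<mode> <type> <object>\t<path>"; non-blob entries
    (e.g. submodule commits) are skipped.
    """
    shas = []
    for line in output.splitlines():
        meta, _path = line.split("\t", 1)
        _mode, obj_type, sha = meta.split()
        if obj_type == "blob":
            shas.append(sha)
    return shas


def record_first_seen(conn: sqlite3.Connection, branch: str,
                      blobs: list[str], commit_ts: int) -> None:
    """Record when each blob first appeared on `branch` (hypothetical schema)."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS blob_seen (
               branch TEXT NOT NULL,
               blob TEXT NOT NULL,
               first_seen INTEGER NOT NULL,
               PRIMARY KEY (branch, blob))"""
    )
    # On conflict keep the smaller timestamp, so backfilling older commits
    # after the fact still lowers first_seen correctly.
    conn.executemany(
        """INSERT INTO blob_seen (branch, blob, first_seen) VALUES (?, ?, ?)
           ON CONFLICT(branch, blob)
           DO UPDATE SET first_seen = MIN(first_seen, excluded.first_seen)""",
        [(branch, sha, commit_ts) for sha in blobs],
    )
    conn.commit()
```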
The easy answer, though, would seem to be to keep a copy of last week's index, run the query against both, and find an efficient way to compare the results to see what's in this week's index but not last week's.
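The comparison step could be as simple as a multiset difference over hits. This sketch assumes each hit is a hypothetical `(path, matching line text)` pair; keying on line text rather than line number avoids flagging old matches that merely shifted because lines were inserted above them.

```python
from collections import Counter

Hit = tuple[str, str]  # (path, matching line text); hypothetical hit shape


def new_hits(this_week: list[Hit], last_week: list[Hit]) -> list[Hit]:
    """Return hits present in this week's results but not last week's.

    Counter subtraction keeps duplicates: if a file gains a second identical
    matching line, exactly one copy is reported as new.
    """
    diff = Counter(this_week) - Counter(last_week)
    return sorted(diff.elements())
```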
Also of note, "when was this blob added to this branch" isn't actually the same as git blame: if a file matching the search was touched, but the latest change to that file doesn't affect the matching line of code, it'd still show up as recent, which is not what the user wants.
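To see the difference, a line-level check only counts a match as recent if the matching line itself was added or changed. The sketch below diffs two versions of a file with `difflib`; it looks at a single change, not the full history walk git blame performs, and `newly_added_matches` is a hypothetical name.

```python
import difflib
import re


def newly_added_matches(old_text: str, new_text: str, pattern: str) -> list[tuple[int, str]]:
    """Return (1-based line number, line) for matching lines added or changed
    between old_text and new_text. Untouched matching lines are ignored."""
    rx = re.compile(pattern)
    old_lines = old_text.splitlines()
    new_lines = new_text.splitlines()
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    added = []
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        # Only lines that exist in the new version as inserts/replacements count.
        if tag in ("insert", "replace"):
            for idx in range(j1, j2):
                if rx.search(new_lines[idx]):
                    added.append((idx + 1, new_lines[idx]))
    return added
```

A file-level "blob added this week" signal would flag both edits in the usage below; the line-level check flags only the second.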