Comment by hahahahhaah

1 month ago

An LSP MCP?

Yeah, or something even smarter than that.

If you are willing to go language-specific, the tooling can be incredibly rich if you go through the effort. I’ve written some Rust compiler drivers for domain-specific use cases, and you can hook into phases of the compiler where you have amazingly detailed context about every symbol in the code: all manner of type metadata, locations where values are dropped, and everything is annotated with spans of source locations too. It seems like a worthy effort to index all of it and make it available behind a standard query interface the LLM can use. You can even write code this way; I think rustfmt hooks into the same pipeline to produce formatted code.

I’ve always wished there were richer tools available to do what my IDE already does, but without needing to use the UI. Make it a standard API or even just a CLI, and free it from the dependency on my IDE. I think it’d be well worth looking into.

  • If the compiler just dumped all that data out as structured text, you could use current LLMs to swallow it in a single gulp.

    • Well, the point is to avoid them needing to swallow it in a single gulp… after all, the source code is already all the information you need to get all this metadata.

      The use cases I have in mind are for codebases with many millions of lines of code, where just dumping it all into the context is unreasonably expensive. In these scenarios, it’d be beneficial to give the LLM a sort of SQL-like language it can use to prod at the codebase in small chunks.

      In fact I keep thinking of SQL as an example in my head, but maybe it’s best to take it literally: why don’t we have a SQL for source code? Why can’t I do “select function.name from functions where parameters contains …” or similar (with clever subselects, joins, etc) to get back whatever exists in the code?

      It’s something I always wanted in general, not just for LLMs. But LLMs could make excellent use of it if there’s simply not enough context size to reasonably slurp up all the code.
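
To make the "SQL for source code" idea a bit more concrete, here is a rough sketch of what it could look like. It is only an illustration under several assumptions: it uses Python's standard `ast` and `sqlite3` modules rather than a Rust compiler driver, and the `functions` / `parameters` schema, the helper names `index_functions` and `functions_with_parameter`, and the sample source are all made up for the example. A real index would carry far more (types, spans, call edges), but the shape of the query is the point.

```python
import ast
import sqlite3

# Hypothetical sample file standing in for a (much larger) codebase.
SAMPLE_SOURCE = '''
def load(path, encoding="utf-8"):
    return open(path, encoding=encoding).read()

def save(path, data):
    open(path, "w").write(data)

def checksum(data):
    return sum(data) % 256
'''


def index_functions(conn, filename, source):
    """Parse one Python file and record its functions and their parameters."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS functions "
        "(id INTEGER PRIMARY KEY, file TEXT, name TEXT, line INTEGER)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS parameters "
        "(function_id INTEGER, position INTEGER, name TEXT)"
    )
    tree = ast.parse(source, filename=filename)
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            cur = conn.execute(
                "INSERT INTO functions (file, name, line) VALUES (?, ?, ?)",
                (filename, node.name, node.lineno),
            )
            # Only regular positional parameters are indexed in this sketch;
            # keyword-only, positional-only, *args and **kwargs are skipped.
            for position, arg in enumerate(node.args.args):
                conn.execute(
                    "INSERT INTO parameters VALUES (?, ?, ?)",
                    (cur.lastrowid, position, arg.arg),
                )


def functions_with_parameter(conn, param_name):
    """Roughly: select function.name from functions where parameters contains X."""
    rows = conn.execute(
        "SELECT f.name FROM functions f "
        "JOIN parameters p ON p.function_id = f.id "
        "WHERE p.name = ? ORDER BY f.line",
        (param_name,),
    )
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
index_functions(conn, "sample.py", SAMPLE_SOURCE)
print(functions_with_parameter(conn, "path"))  # ['load', 'save']
```

The LLM (or a human at a CLI) would then only ever see the handful of rows a query returns, not the whole codebase. Whether a relational schema is actually the right model for compiler metadata is an open question; this just shows that the query side is cheap once something has done the indexing.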