Comment by vintagedave

11 hours ago

This is interesting - I have been working on the same thing, building contextual data, LSP-style.

I saw the tools page where if I understand right, `get-symbol-context` is actually the main useful tool for what you provide? The others seem more metadata it's easy to get already (?) but that tool provides the extra info.

I had been working on exposing mine as more high-level, ie multiple APIs to query different kinds of metadata about symbols, types, etc. But I am still not sure of the best approach, where my thinking was about not overloading the AI with too many different tools. They accumulate quickly.

3 comments

vintagedave

lukeundtrug 10 hours ago

I definitely share the same sentiment. I don’t want to overload the llm with many tools. Better to have a few opinionated and flexible ones, but yeah, keeping the balance is hard.

I would say the main two tools are get-symbol-context and get-repository-overview. The latter is actually the more complex and sophisticated one. I’m running some graph algorithms to rank the symbols in terms of relative importance based on centrality metrics, I.e. how well connected they are in the symbol graph.

The idea behind that is to allow the llm to infer the general structure and architecture of the project with just one tool call.

I guess you could reach a similar thing if you had some good Agents.md or docs detailing that for your project, but this was more meant to reach that on the fly.

The symbol-context tool is basically a graph query tool (without a dsl or cipher support yet), but yeah here the question is also whether it makes more sense to give the ai the possibility to run cipher queries itself or abstract it away in a thinner api.

The main underlying factor of my tool is however the graph that I’m building and the metadata which can be extracted from that (connections, type of connection, etc. ) :)

Whats the metadata you have in mind?

vintagedave 4 hours ago
Metadata: I feel like LSP focuses on human-style things (like locating a symbol) which are useful, but not necessarily exactly what a LLM needs. Instead I want to do things like show the inheritance chain. Is a virtual method overriding something, being overridden later? What is the class / polymorphic situation? My feeling is that this will help understand the shape, plus, help some bugs.
So a query on a symbol would:
* Return its type declaration, not (just) location (and I'm considering some kind of summary version where it pulls in the ancestors too, so you directly see everything it has available not just the actual declaration, because leaf nodes in inheritance often don't add much and the key behaviour is elsewhere)
* Return info about inheritance, the shape of how this modifies other code and other code modifies it.
With variations when the symbol is a variable, a type, etc etc. I'm currently using treesitter for this, to bypass LSP and (for the language I'm working on) build a full symbol table and more, to get something closer to the LSP info you mention in your blog but not limited to what LSP makes available. I don't want to rely on a LSP server; I think first-class support per language is better. It's probably possible to generate this with a set of LSP calls, perhaps, but it might take some heuristics and guesswork and... :/
I do have a graph of file-level dependencies, but not yet a graph of what calls what at the symbol or type or method level. And while I build an index of all symbols I haven't yet sorted that by count.
I get the sense we're thinking along similar lines, with slightly different approaches?
Edit: if you would like to chat on this, I'm up for it! You can find me at my username at gmail (easy to lose emails there due to volume and spam!) or my profile has my website which has my LinkedIn (horribly, more reliable :D)
- lukeundtrug 8 minutes ago
  
  That sounds great, thanks for sharing your thoughts!
  It sure sounds like we have similar things in mind. I basically try to build the proper graph representation of the code during runtime, so all caller/callee relationships plus type inheritance chains etc. This is basically what I call a semantic code graph in the blog post.
  From the things I tried with tree-sitter I think I would have a hard time achieving the same because by nature tree-sitter can only make educated guesses on real connections and will run into problems, if things are named ambiguously.
  But yeah, will definitely reach out and am looking forward to chatting :) Hope I find the time during this week!