← Back to context

Comment by anotherpaulg

2 years ago

By feeding the LLM the AST representation of the code, the tree of workspace files, public items, module hierarchy alongside with the code, it could be a significant improvement.

Aider does this, using tree-sitter to build a “repository map”. This helps the LLM understand the overall code base and how it relates to the specific coding task at hand.

https://aider.chat/docs/repomap.html

More broadly, I agree with your sentiment that there is a lot of value in considering the best ways to structure the data we share with LLMs. Especially in the context of coding.

>Aider does this, using tree-sitter to build a “repository map”. This helps the LLM understand the overall code base and how it relates to the specific coding task at hand.

Great stuff.

>More broadly, I agree with your sentiment that there is a lot of value in considering the best ways to structure the data we share with LLMs. Especially in the context of coding.

As the experiments on PHI-1 and PHI-2 from microsoft show, training data make a difference. The "textbooks is all you need" moto means better structured data, more clear data make a difference.

https://arxiv.org/abs/2306.11644