Comment by ninkendo

1 month ago

Related:

I’ve always found it crazy that my LLM has access to such terrible tools compared to mine.

It’s left with grepping for function signatures, sending diffs for patching, and running `cat` to read all the code at once.

I, however, run an IDE: I can use a simple refactoring tool to add a parameter to a function, “follow symbol” to see where something is defined, click and get all usages of a function shown at a glance, and so on.

Is anyone working on making it so LLMs get better tools for actually writing/refactoring code? Or is there some “bitter lesson”-like thing that says effort is always better spent just increasing the context size and slurping up all the code at once?

> Claude Code officially added native support for the Language Server Protocol (LSP) in version 2.0.74, released in December 2025.

I think from training it's still biased towards simple tooling.

But also, there is real power in simple tools: a small set of general-purpose tools beats a bunch of narrow, specific-use-case tools. It's easier for humans to use high-level tools, but LLMs can instantly compose low-level tools for their use case and learn to generalize; writing insane Perl one-liners is second nature for them in a way it isn't for us.

If you watch the tool calls you'll see they write a ton of one-off small Python programs to test, validate, explore, etc.

If you think about it, any time you use a tool there is probably a 20-line Python program that is a better fit for your use case; it's just that it would take you too long to write it, but for an LLM that's 0.5 seconds.
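
For a concrete flavor of that (everything here, including the `config/` directory, is hypothetical), the kind of throwaway script an agent will happily write looks like:

```python
#!/usr/bin/env python3
# Hypothetical one-off script of the sort an agent might emit on the fly:
# check that every JSON file under config/ parses and report its top-level
# keys. The config/ directory is an assumption for illustration.
import json
import pathlib

for path in sorted(pathlib.Path("config").rglob("*.json")):
    try:
        data = json.loads(path.read_text())
    except json.JSONDecodeError as err:
        print(f"{path}: INVALID ({err})")
        continue
    if isinstance(data, dict):
        print(f"{path}: keys = {', '.join(sorted(data))}")
    else:
        print(f"{path}: top-level {type(data).__name__}")
```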

  • > but LLMs can instantly compose low-level tools for their use case and learn to generalize

    Hard disagree; this wastes enormous amounts of tokens, and massively pollutes the context window. In addition to being a waste of resources (compute, money, time), this also significantly decreases their output quality. Manually combining painfully rudimentary tools to achieve simple, obvious things -- over and over and over -- is *not* an effective use of a human mind or an expensive LLM.

    Just like humans, LLMs benefit from automating the things they need to do repeatedly so that they can reserve their computational capacity for much more interesting problems.

    I've written[1] custom MCP servers to provide narrowly focused API search and code indexing, build system wrappers that filter all spurious noise and present only the material warnings and errors, "edit file" hooks that speculatively trigger builds before the LLM even has to ask for it, and a litany of other similar tools.

    Due to LLMs' annoying tendency to fall back on inefficient shell scripting, I also had to write a full bash syntax parser and shell-script rewriting ruleset engine that lets me silently and trivially rewrite their shell invocations into more optimal forms that use the other tools I've written. That way they don't have to do expensive, wasteful things like pipe build output through `head`/`tail`/`grep`/etc., which invariably results in them missing important information and either wandering off into the weeds or -- if they notice -- consuming a huge number of turns (and time) re-running the commands to get what they need.

    Instead, they call build systems directly with arbitrary options, pipe filters, etc., and magically the command gets rewritten to something that will produce the ideal output they actually need, without eating more context and unnecessary turns.
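
    A minimal sketch of the rewriting idea (the real engine parses bash properly rather than pattern-matching like this, and `build-summary` is a made-up wrapper name):

    ```python
    import re

    # Illustrative rule only: catch a cargo invocation whose output is piped
    # through head/tail/grep and route it to a hypothetical filtering wrapper.
    RULES = [
        (
            re.compile(r"^(cargo\s+(?:build|check|test)\b[^|]*)\|\s*(?:head|tail|grep)\b.*$"),
            lambda m: f"build-summary {m.group(1).strip()}",
        ),
    ]

    def rewrite(command: str) -> str:
        """Return a rewritten command if a rule matches, otherwise the original."""
        for pattern, make_replacement in RULES:
            match = pattern.match(command.strip())
            if match:
                return make_replacement(match)
        return command

    print(rewrite("cargo build 2>&1 | tail -n 20"))  # -> build-summary cargo build 2>&1
    ```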

    LLMs benefit from an IDE just like humans do -- even if an "IDE" for them looks very different. The difference is night and day. They produce vastly better code, faster.

    [1] And by "I've written", I mean I had an LLM do it.

  • Note that the Claude Code LSP integration was actually broken for a while after it was released, so make sure you have a very recent version if you want to try it out.

    However, as the parent comment said, it seems to always grep instead unless explicitly told to use the LSP tool.

  • Correct. If you try to create a coding agent using the raw Codex or Claude Code API, build your own “write tool”, and don't give the model its “native patch tool”, 70%+ of the time its write/patch fails because it tries to do the operation using the write/patch tool it was trained on.

> I, however, run an IDE: I can use a simple refactoring tool to add a parameter to a function, “follow symbol” to see where something is defined, click and get all usages of a function shown at a glance, and so on

I am so surprised that all of the AI tooling mostly revolves around VS Code or its forks and that JetBrains seems to not really have done anything revolutionary in the space.

With how good their refactoring and code inspection tools are, you’d really think they’d pass off that context information to AI models and be leaps and bounds ahead.

  • Recently, all these agents have gained the ability to talk LSP (Language Server Protocol), so it should get better soon. That said, yeah, they don't seem to default to using `ripgrep` even when it is clearly better than `grep`.

  • Are you? I'm not surprised at all, considering that the biggest investment juggernaut in AI is also the author of VS Code. I wonder what the connection is? ;)

  • Agreed - this seems like a no-brainer; surely this is something that is being worked on.

  • JetBrains is trying, but I feel like they're very, very behind in the space.

  • Claude and other LLMs can be used through JetBrains, and the IDE provides a significantly better experience than VS Code in my opinion.

  • I haven't seen JetBrains as 'great'. I think they have a strong marketing team that gets into universities and potentially astroturfs on the internet, but I have always found better tools for every language. Although, I can't remember what I ended up choosing for PHP.

LLMs aren't like you or me. They can comprehend large quantities of code quickly and piece things together easily from scattered fragments, so "go to reference" and the like become much less important. Of course, things change as the number of usages of a symbol becomes large, but in most cases the LLM can just make perfect sense of things via grep.

Providing it with refactoring as a tool also risks confusing it with too many tools.

It's the same reason that waffling for a few minutes via speech to text with tangents and corrections and chaos is just about as good as a carefully written prompt for coding agents.

If you can read fast enough, grepping is probably faster than waiting for a compiler to tell you anything.

  • Faster for worse results, though. Determining the source of a symbol is not as trivial as finding the same piece of text somewhere else; the tool should also reliably be able to differentiate among identically named symbols. What better source for that than the compiler itself?

    • Yeah, especially for languages that make heavy use of type inference. There’s nothing you can really grep for most of the time… to really know “who’s using this code” you need to know what the compiler knows.

      An LLM can likely approach compiler-level knowledge just from being smart and understanding what it’s reading, but it costs a lot of context to do this. Giving the LLM access to what the compiler knows as an API seems like it’s a huge area for improvement.
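
      A tiny Python stand-in for the problem (all names invented): the call site never mentions the type, so a text search for the class name misses it, while a type checker or language server resolves it immediately.

      ```python
      class InvoiceRepository:
          def save(self, record: dict) -> None:
              print("saved", record)

      def make_repository(kind: str) -> InvoiceRepository:
          # Hypothetical factory; real code might dispatch on `kind`.
          return InvoiceRepository()

      # Grepping for "InvoiceRepository" finds the class and the factory, but
      # not this call site -- only type-aware tooling can tie repo.save() back
      # to InvoiceRepository.save.
      repo = make_repository("invoices")
      repo.save({"id": 1})
      ```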

    • It depends on the language and codebase. For something very dynamic like Python it may be the case that grepping finds real references to a symbol that won’t be found by a language server. Also language servers may not work with cross-language interfaces or codegen situations as well as grep.

      OTOH for a giant monorepo, grep probably won’t work very well.

Zed Editor gives the LLM tools that use the LSP as you'd expect as a normal IDE user, like "go to symbol definition", so it greps a lot less.
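
Under the hood, a "go to symbol definition" tool call boils down to an LSP request like the one below; this is only a sketch of the message shape (the file URI and position are made up), not a claim about how Zed actually wires it up:

```python
import json

def frame(message: dict) -> bytes:
    """Add the Content-Length framing the LSP uses over stdio."""
    body = json.dumps(message).encode("utf-8")
    return f"Content-Length: {len(body)}\r\n\r\n".encode("ascii") + body

# textDocument/definition request for the symbol at line 41, column 17
# (zero-based, per the LSP spec). After the usual `initialize` handshake,
# this would be written to the language server's stdin.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "textDocument/definition",
    "params": {
        "textDocument": {"uri": "file:///project/src/main.rs"},
        "position": {"line": 41, "character": 17},
    },
}
print(frame(request).decode("utf-8"))
```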

JetBrains IDEs come with an MCP server that supports some refactoring tools [1]:

> Starting with version 2025.2, IntelliJ IDEA comes with an integrated MCP server, allowing external clients such as Claude Desktop, Cursor, Codex, VS Code, and others to access tools provided by the IDE. This provides users with the ability to control and interact with JetBrains IDEs without leaving their application of choice.

[1] https://www.jetbrains.com/help/idea/mcp-server.html#supporte...

Tidewave.ai does exactly that. It’s made Claude Code so much more functional. It provides MCP servers to:

- search all your code efficiently
- search all documentation for libraries
- access your database and get real data samples (not just abstract data types)
- select design components from your Figma project and implement them for you
- let Claude see what is rendered in the browser

It’s basically the IDE for your LLM client. It really closes the loop and has made Claude and me so much more productive. Highly recommended, and cheap at $10/month.

PS: my personal opinion. I have zero affiliation with them.

LLMs operate on text. They can take in text, and they can produce text. Yes, some LLMs can also read and even produce images, but at least as of today, they are clearly much better at using text[1].

So `cat`, `ripgrep`, etc. are the right tools for them. They need a command line, not a GUI.

1: Maybe you'd argue that Nano Banana is pretty good. But would you say its prompt adherence is good enough to produce, say, a working Scratch program?

  • Inputs to these functions are text too: variable names, file names, directory names, symbol names for symbol searching. The outputs you get from functions like symbol search are text as well, or at least easily reformatted to text. API calls are all just text in and text out.

    • Yes, and I frequently see Claude Code start with tools that retrieve these things when it's doing work. What are you surprised it isn't using?

You can give agents the ability to check VSCode Diagnostics, LSP servers and the like.

But they constantly ignore them and use their base CLI tools instead; it drives me batty. No matter what I put in AGENTS.md or similar, they always just ignore the more advanced tooling IME.

  • Doesn't have to be a bad thing; not all languages have good LSP support. If the AI can optimize for simple cross-language tools, it won't be as dependent on the LSP implementation.

    I used grep and simple ctags to program in vanilla vim for years. It can be more useful than you'd think. I do like the LSP in Neovim and use it a lot, but I don't need it.

    • I also lived in ctags land, but gosh I don’t miss it. LSPs are a step change, and most languages do have either an actual implementation or something similar enough that’s still more powerful than bare strings.

      It’s faster, too, as the model doesn’t need to scan for info, but again it really likes to try not to use it.

      Of course I still use rg and fd to traverse things; CLI tools are powerful. I just wish LLMs could be made to use more powerful tools reliably!

An LSP MCP?

  • Yeah, or something even smarter than that.

    If you are willing to go language-specific, the tooling can be incredibly rich if you go through the effort. I’ve written some Rust compiler drivers for domain-specific use cases, and you can hook into phases of the compiler where you have amazingly detailed context about every symbol in the code: all manner of type metadata, locations where values are dropped, and everything annotated with spans of source locations too. It seems like a worthy effort to index all of it and make it available behind a standard query interface the LLM can use. You can even write code this way; I think rustfmt hooks into the same pipeline to produce formatted code.

    I’ve always wished there were richer tools available to do what my IDE already does, but without needing to use the UI. Make it a standard API or even just CLI, and free it from the dependency on my IDE. It’d be very worth looking into I think.
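
    As a rough sketch of what such a query interface could look like, here is a Python `ast`-based stand-in (not the rustc-driver tooling described above; the CLI shape and names are invented):

    ```python
    #!/usr/bin/env python3
    """where_defined.py: print file:line:col for every definition of a symbol,
    e.g. `python where_defined.py src build_index`."""
    import ast
    import pathlib
    import sys

    def definitions(root: str, name: str):
        """Yield source spans of functions/classes named `name` under `root`."""
        for path in pathlib.Path(root).rglob("*.py"):
            tree = ast.parse(path.read_text(), filename=str(path))
            for node in ast.walk(tree):
                if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
                    if node.name == name:
                        yield f"{path}:{node.lineno}:{node.col_offset}"

    if __name__ == "__main__":
        root, name = sys.argv[1], sys.argv[2]
        for location in definitions(root, name):
            print(location)
    ```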

Not coding agents, but we do a lot of work trying to find the best tools, and the result is always that the simplest possible general tool that can get the job done beats a suite of complicated tools and rules on how to use them.

  • Well, jump to definition isn't exactly complicated?

    And you can use whatever interface the language servers already use to expose that functionality to e.g. VS Code?

    • It can be: which definition do you jump to if there are multiple (e.g. across multiple translation units)? What if the function is overloaded and none of the types match?

      With grep it's easy: it always shows everything that matches.


This isn't completely the answer to what you want, but skills do open a lot of doors here. Anything you can do on a command line can turn into a skill, after all.

I’ve been saying this for a while. CPU demand is about to go through the roof.

If you think about it, to get these tools to be most effective you have to be able to page things in and out of their context windows.

What was once a couple of queries is now gonna be dozens or hundreds or even more from the LLM.

For code, that means querying the AST, and querying it in a way that lets you limit the size of the output.
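
A minimal sketch of that kind of AST query with an explicit cap on the results (the signature-only output format and the cap are arbitrary choices for illustration):

```python
import ast

SOURCE = '''
def load(path):
    ...

def save(path, data):
    ...

async def sync(remote):
    ...
'''

def outline(source: str, max_results: int = 20) -> list[str]:
    """Return at most max_results function signatures instead of the whole file."""
    results = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            args = ", ".join(a.arg for a in node.args.args)
            results.append(f"line {node.lineno}: def {node.name}({args})")
            if len(results) >= max_results:
                break
    return results

print("\n".join(outline(SOURCE, max_results=2)))
```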

I wonder which SAST vendor Anthropic will buy.