
Comment by tomaytotomato

9 hours ago

I have been trying to use Claude Code to help improve my open-source Java NLP location library.

However, it struggles to do anything other than optimise code or fix small issues. It struggles with high-level, abstract problems.

For example, I currently have an issue with ambiguity collisions, e.g.

Input: "California"

Output: "California, Missouri"

California is a state, but also a city in Missouri - https://github.com/tomaytotomato/location4j/issues/44
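
To make the collision concrete, here is a minimal, illustrative Java sketch - the types and the naive single-result lookup are hypothetical, not location4j's actual API:

```java
import java.util.List;
import java.util.Map;

// Illustrative only: hypothetical types, not location4j's real classes.
public class AmbiguityCollision {

    // One normalised key can legitimately map to several real places.
    record Candidate(String name, String type, String parent) {}

    public static void main(String[] args) {
        Map<String, List<Candidate>> index = Map.of(
                "california", List.of(
                        new Candidate("California", "STATE", "United States"),
                        new Candidate("California", "CITY", "Missouri, United States")));

        // A single-value lookup has to pick one candidate; without any
        // ranking it can just as easily pick the Missouri city as the state.
        Candidate picked = index.get("california").get(0);
        System.out.println(picked);
    }
}
```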

I asked Claude several times to resolve this ambiguity, and it suggested various prioritisation strategies, etc.; however, the resulting changes broke other functionality in my library.

In the end I am redesigning my library from scratch with minimal AI input. Why? Because I started the project without the help of AI a few years back: I designed it to solve a problem, and that problem and the nuanced programming decisions behind it don't seem to be respected by LLMs (LLMs don't care about the story, they just care about the current state of the code).

> I started the project in my brain and it has many flaws and nuances which I think LLMs are struggling to respect.

The project, or your brain? I think this is what a lot of LLM coders run into - they have a lot of intrinsic knowledge that is difficult or takes a lot of time and effort to put into words and describe. Vibes, if you will, like "I can't explain it but this code looks wrong"

  • I updated my original comment to explain my reasoning a bit more clearly.

    Essentially, when I ask an LLM to look at a project, it just sees the current state of the codebase; it doesn't see the iterations, hacks, refactors and reverts.

    It also doesn't see the first functionality I wrote for it at v1.

    This could indeed be solved by giving the LLM a git log and telling it a story, but that might not solve my issue?

    • I'm now letting Claude Code write commits + PRs (for my solo dev stuff), and the benefits have been pretty immense: it's basically Claude keeping a history of its work that can be referenced at any time, and that history also lives outside the code context window.

      FWIW, it works a lot better to have it interact via the CLI than via MCP.

    • I personally don't have any trouble with that. Using Sonnet 3.7 in Claude Code, I just ask it to spelunk the git history for a certain segment of the code if I think it will be meaningful for its task.


  • Yes, a lot of coders are terrible at documentation (both doc files and code docs) as well as at good test coverage. Software should not need to live in one's head after it is written; it should be well-architected and self-documenting - and when it is, both humans and LLMs navigate it pretty well (when augmented with good context management, helper MCPs, etc.).

  • I've been a skeptic, but now that I'm getting into using LLMs, I'm finding that being very descriptive and laying down my thoughts, preferences, assumptions, etc. helps greatly.

    I suppose a year ago we were talking about prompt engineers, so it's partly about being good at describing problems.

    • One trick to get out of this scenario where you're writing a ton is to ask the model to interview you until you're in alignment on what is being built. Claude Code and opencode both have an AskUserQuestionTool, which is really nice for this and cuts down on explanation a lot. It becomes an iterative interview and clarifies my thinking significantly.

One major part of successful LLM-assisted coding is to focus not on code vomiting but on scaffolding.

Document, document, document: your architecture, best practices, and preferences (both about the code and about how you want to work with the LLM and how you expect it to behave).

It is time-consuming, but it's the only way to get it to assist you semi-successfully.

Also, try to understand that an LLM's biggest power for a developer is not in authoring code so much as in assistance with understanding it, connecting dots across features, etc.

If your expectation is to launch it in a project and tell it "do X, do Y" without the much-needed scaffolding, you'll very quickly start losing the plot and increasing the mess. Sure, it may complete tasks here and there, but at the price of increasing complexity from which it is difficult for both you and it to dig out.

Most AI naysayers can't be bothered with the huge amount of work required to set up a project to be LLM-friendly; they fail, and then blame the tool.

Even after the scaffolding, the best thing to do, at least for the projects you care about (essentially anything that's not a prototype for quickly validating an idea), is to keep reading and following its output line by line, and to keep updating your scaffolding and documentation as you see it make the same mistakes over and over. Part of the scaffolding is also including the source code of your main dependencies: I have a _vendor directory with git subtrees for the major ones, so the LLMs can check the dependencies' code and tests and figure out what they are doing wrong much more quickly.

Last but not least, LLMs work better with certain patterns, such as TDD. So instead of "implement X", it's better to say "I need to implement X, but before we do so, let's set up a way to test and track our progress against it". You can build an inspector for a virtual machine, you can set up e2e or other tests, or you can just dump line-by-line logs to a file. There are many approaches, depending on the use case.
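
As a concrete example of that pattern, here is a sketch of a test written before the fix, assuming JUnit 5 and a hypothetical resolver class (not location4j's actual API):

```java
import static org.junit.jupiter.api.Assertions.assertEquals;

import org.junit.jupiter.api.Test;

class AmbiguityResolutionTest {

    // Hypothetical class under test; in real TDD this would live in main/java
    // and start as a stub, so the test compiles but fails (the "red" phase).
    record Resolved(String displayName) {}

    static class LocationResolver {
        Resolved resolve(String text) {
            return new Resolved(text); // naive stub: no prioritisation yet
        }
    }

    @Test
    void bareStateNameShouldResolveToTheState() {
        // Written before the implementation: it fails until the prioritisation
        // logic exists, and afterwards it guards against regressions.
        LocationResolver resolver = new LocationResolver();
        assertEquals("California, United States",
                resolver.resolve("California").displayName());
    }
}
```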

In any case, getting real help from LLMs for authoring code (editing, patching, writing new features) is highly dependent on having good context and a good setup (tests, making it write a plan for the business requirements and one for the implementation), and on following and improving all of these aspects as you progress.

  • I agree to an extent

    My project is quite well documented, and I created a prompt a while back along with some Mermaid diagrams:

    https://github.com/tomaytotomato/location4j/tree/master/docs

    I can't remember the exact prompt I gave to the LLM but I gave it a Github issue ticket and description.

    After several iterations it fixed the issue, but my unit tests failed in other areas. I decided to abort it because I think my opinionated code was clashing with the LLM's solution.

    The LLM's solution would probably be more technically correct, but because I don't do l33tcode or memorise how to implement a Trie or a BST, my code does it my way. Maybe I just need to force the LLM to do it my way and ignore the other solutions?

Trying not to turn this into "falsehoods developers believe about geographic names", but having done natural-language geocoding at scale (MetaCarta 2002-2010, acquired by Nokia), the most valuable thing was a growing set of tagged training data - partly because we were actually building the models out of it, but also because it would detect regressions. I suspect you need something similar to "keep the LLMs in line", but you also need it for any more artisanal development approach. (I'm a little surprised you even have a single-value-return search() function; issue #44 is just the tip of the iceberg - https://londonist.com/london/features/places-named-london-th... is a pretty good hint that a range of answers with probabilities attached is a minimum starting point...)
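
For what it's worth, here is a minimal Java sketch of that "range of answers with probabilities" shape - illustrative names only, not location4j's actual API:

```java
import java.util.List;

// Illustrative only: a search that returns ranked candidates with a
// confidence score instead of a single value, leaving disambiguation
// (top hit, threshold, ask the user, bias towards big cities) to the caller.
record ScoredLocation(String displayName, String type, double confidence) {}

interface LocationSearch {
    List<ScoredLocation> search(String text); // highest confidence first
}
```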

  • Thanks for this - it's interesting that I have come to the same conclusion as well.

    My reworked approach is to return a list of results with a probability or certainty score.

    In the situation of someone searching for London, I need to add some sort of priority for London, UK.

    My dataset is sourced from an open-source JSON file, which I am now pre-processing to identify all the collisions in it (rough sketch at the end of this comment).

    There are so many collisions!

    Could I pick your brains and have you critique my approach? Thanks
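
    Rough sketch of what I mean by the pre-processing step - the field names are made up, not the real dataset schema:

    ```java
    import java.util.List;
    import java.util.Map;
    import java.util.stream.Collectors;

    // Group every place in the dataset by its normalised name and keep only
    // the names that map to more than one entity. Types are hypothetical.
    public class CollisionReport {

        record Place(String name, String type, String countryCode) {}

        static Map<String, List<Place>> findCollisions(List<Place> places) {
            return places.stream()
                    .collect(Collectors.groupingBy(p -> p.name().toLowerCase()))
                    .entrySet().stream()
                    .filter(e -> e.getValue().size() > 1) // e.g. "California" -> state + city
                    .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue));
        }
    }
    ```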

I find that asking it to write a design doc first and reviewing that (both you and the bot can do reviews) gets better results.

> LLMs don't care about the story, they just care about the current state of the code

You have to tell it about the backstory. It does not know unless you write about it somewhere and give it as input to the model.

If Claude read the entire commit history, wouldn't that allow it to make choices less incongruent with the direction of the project and its general way of doing things?

> it struggles

It does not struggle; you struggle. It is a tool you are using, and it is doing exactly what you're telling it to do. Tools take time to learn, and that's fine. Blaming the tools is counterproductive.

If the code is well documented, at a high level and with inline comments, and if your instructions are clear, it'll figure it out. If it makes a mistake, it's up to you to figure out where the communication broke down and figure out how to communicate more clearly and consistently.

  • "My Toyota Corolla struggles to drive up icy hills." "It doesn't struggle, you struggle." ???

    It's fine to critique your own tools and their strengths and weaknesses. Claiming that any and all failures of AI are an operator skill issue is counterproductive.

  • Not all tools are right for all jobs. My spoon struggles to perform open heart surgery.

    • But as a heart surgeon, why would you ever consider using a spoon for the job? AI/LLMs are just a tool. Your professional experience should tell you if it is the right tool. This is where industry experience comes in.
