Comment by tokyolights2

16 days ago

Tangentially related: for those of you using AI tools more than I am, how do LLMs handle things like API updates? I assume the Python 2/3 transition is far enough in the past that there aren't too many issues. How about other libraries that have received major updates in the last year?

Maybe a secret positive outcome of using automation to write code is that library maintainers will face new pressure to stop releasing totally incompatible versions every few years (looking at you, Angular, React...).

Horribly. In my experience, when dealing with unstable or rapidly evolving APIs/designs, like IaC with OpenTofu, you need MCP connected to the Terraform provider documentation (or just example/markdown files, whichever you prefer) for LLMs to actually work correctly.

> how do LLMs handle things like API updates?

Quite badly. I can't tell you how many times an LLM has suggested WORKSPACE solutions to my Bazel problems, even when I explicitly tell it that I'm using Bzlmod.
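
For reference, the two styles look roughly like this (a minimal sketch; the rules_go repo name, version, and URL are placeholders I've made up, not anything from the comment):

```starlark
# WORKSPACE (legacy) -- the style LLMs keep suggesting:
load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_archive")

http_archive(
    name = "io_bazel_rules_go",
    urls = ["https://example.com/rules_go.zip"],  # placeholder URL
)

# MODULE.bazel (Bzlmod) -- the style actually in use:
bazel_dep(name = "rules_go", version = "0.50.1")  # placeholder version
```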

> for those of you using AI tools more than I am, how do LLMs handle things like API updates?

From recent experience, 95% of changes are good and are done in 15 minutes.

The other 5% of changes get made but break things, because the API might have documentation, but your code probably doesn't document "why I use this here"; at best it has bits of "what I do here."

In hindsight it was an overall positive experience, but if you'd asked me at the end of the first day, I'd have been very annoyed.

If I'd been asked to estimate, I'd have said this would take me from Monday to Friday, but it took me until Wednesday afternoon.

That said, half a day in I thought I was 95% done; it then took 2+ more days to close out that hidden 5% of issues.

And that's only because the test suite was catching enough classes of issues for me to go find them everywhere.

With Dart/Flutter, it often recommends deprecated code and practices.

Deprecated code is quickly flagged by VS Code (like Text.textScaleFactor), but it won't point you to the new way of separating items in a Column/Row with the spacing parameter (instead of manually adding a SizedBox between every item).
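
Concretely, the two approaches look like this (a minimal sketch, assuming Flutter 3.27+ where Row/Column gained spacing; the widget contents are made up):

```dart
import 'package:flutter/material.dart';

// What LLMs keep suggesting: a manual SizedBox between every item.
Widget oldWay() {
  return const Column(
    children: [
      Text('one'),
      SizedBox(height: 8),
      Text('two'),
      SizedBox(height: 8),
      Text('three'),
    ],
  );
}

// The newer way: the spacing parameter inserts the gaps for you.
Widget newWay() {
  return const Column(
    spacing: 8.0,
    children: [
      Text('one'),
      Text('two'),
      Text('three'),
    ],
  );
}
```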

Coding with an LLM is like coding with a senior dev who doesn't follow the latest trends. It works, and it has insights and experience that you don't always have, but sometimes it might write a full quicksort instead of just calling list.sort().
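
The quicksort point in miniature (a trivial sketch, in Dart to match the Flutter comment above; the values are made up):

```dart
void main() {
  final numbers = [42, 7, 19, 3];
  // The built-in in-place sort; no hand-rolled quicksort needed.
  numbers.sort();
  print(numbers); // [3, 7, 19, 42]
}
```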

A good fraction of my CLAUDE.md is lines along the lines of "use X, not deprecated Y." The training data has more instances of the old API usage, and those keep popping up repeatedly, even with those instructions.
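
For illustration, what such lines might look like (a hypothetical CLAUDE.md excerpt; the specific rules just reuse APIs mentioned elsewhere in this thread):

```markdown
## API conventions
<!-- Hypothetical rules, for illustration only -->
- Use `Text(textScaler: ...)`, not the deprecated `textScaleFactor`.
- Use the `spacing` parameter on `Column`/`Row`, not manual `SizedBox` separators.
- This repo uses Bzlmod; never suggest `WORKSPACE`-style rules.
```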

If you think the correct API is not going to be in its weights (or if there are different versions in current use), you ask it nicely to "please look at the latest API documentation before answering".

Sometimes it ignores you, but it works more often than not.

LLMs fall short on most edge cases, which could be explained by edge cases contributing very little to the weights: as with extrapolating beyond the last data point, errors accumulate quickly.