Comment by skydhash
21 hours ago
Please tell me which one of the headings is not about increased usage of LLMs and derived tools, and is instead about some improvement along the axes of reliability or any other kind of usefulness.
Here is the changelog for OpenBSD 7.8:
https://www.openbsd.org/78.html
There's nothing here that says: "we made it easier to use more of it." It's all about using it better and fixing underlying problems.
The coding agent heading. Claude Code and tools like it represent a huge improvement in what you can usefully get done with LLMs.
Mistakes and hallucinations matter a whole lot less if a reasoning LLM can try the code, see that it doesn't work and fix the problem.
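To make that loop concrete, here's a minimal sketch in Python. The `generate_code()` call is a hypothetical stand-in for whatever model API the agent uses; real agents like Claude Code are far more elaborate, but the verify-and-retry shape is the same:

```python
import subprocess
import sys
import tempfile


def generate_code(prompt: str) -> str:
    """Hypothetical stand-in for a call to an LLM API."""
    raise NotImplementedError("wire up a real model here")


def try_fix_loop(task: str, max_attempts: int = 5) -> str:
    """Generate code, run it, and feed any failure back until it passes."""
    prompt = task
    for _ in range(max_attempts):
        code = generate_code(prompt)
        with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
            f.write(code)
            path = f.name
        result = subprocess.run(
            [sys.executable, path], capture_output=True, text=True, timeout=30
        )
        if result.returncode == 0:
            return code  # the model's code actually ran; stop here
        # The key step: the model sees the real error and gets to retry.
        prompt = (
            f"{task}\n\nYour previous attempt failed with:\n"
            f"{result.stderr}\n\nFix the code."
        )
    raise RuntimeError("no working solution within the attempt budget")
```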
If it actually does that without an argument, anyway. I can't believe I have to say that about a computer program.
> The coding agent heading. Claude Code and tools like it represent a huge improvement in what you can usefully get done with LLMs.
Does it? It's all prompt manipulation. Shell scripts are powerful, yes, but not really a huge improvement over having a shell (a REPL interface) to the system. And even then, a lot of programs just use syscalls or wrapper libraries.
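For what it's worth, that skeptical reading is easy to sketch: the "tool" an agent exposes is often little more than a subprocess call whose output gets pasted back into the next prompt. A minimal illustration (the transcript format here is made up, not any vendor's API):

```python
import subprocess


def run_shell(command: str) -> str:
    """The entire 'tool': run a command and hand its output back as text."""
    result = subprocess.run(
        command, shell=True, capture_output=True, text=True, timeout=30
    )
    return result.stdout + result.stderr


# One agent step is just prompt manipulation around that call:
# the command's output is appended to the transcript the model sees next.
transcript = "User: why does the build fail?\n"
transcript += "Tool (shell): " + run_shell("make 2>&1 | tail -n 20") + "\n"
# `transcript` now becomes the next prompt sent to the model.
```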
> can try the code, see that it doesn't work and fix the problem.
Can you really say that happens reliably?
Depends on what you mean by "reliably".
If you mean 100% correct all of the time, then no.
If you mean correct often enough that it works as a productive assistant, helping you solve all sorts of problems faster than you could without it, and making mistakes infrequently enough that you spend less time fixing them than you would doing everything yourself, then yes, it's reliable enough now.
You're welcome to try the LLMs yourself and come to your own conclusions. From what you've posted, it doesn't look like you've tried anything in the last two years. Yes, LLMs can be annoying, but there has been progress.
I know it seems like forever ago, but Claude Code only came out in 2025.
It's very difficult to argue against the point that Claude Code:
1) was a paradigm shift in terms of functionality, despite (to be fair) at best incremental improvements in the underlying models;
2) produces results that are, I estimate, an order of magnitude better in terms of output.
I think it's very fair to distill “AI progress 2025” to: you can get better results without better models, through clever tools and loops (up to a point; better than raw output, anyway; and scaling to multiple agents has not worked). (…and video/image slop infests everything :p).
Did more software ship in 2025 than in 2024? I'm still looking for some actual indication of output here. I get that people feel more productive, but the actual metrics don't seem to agree.
I'm still waiting for the Linux drivers to be written because of all the 20x improvements that AI hypers are touting. I would even settle for Apple M3 and M4 computers to be supported by Asahi.
I am not making any argument about the productivity of using AI vs. not using AI.
My point is purely that, compared to 2024, the quality of the code produced by LLM inference agent systems is better.
To say that 2025 was a nothing burger is objectively incorrect.
Will it scale? Is it good enough to use professionally? Is this like self-driving cars, where even the best version still gets stuck on an odd-shaped traffic cone? Is it actually more productive?
Who knows?
I'm just saying… LLM coding in 2024 sucked. 2025 was a big year.