Comment by parliament32
9 hours ago
When I look at LLMs as an interface, I'm reminded of back when speech-to-text first became mainstream. So many promises about how this is the interface for how we'll talk to computers forevermore.
Here we are a few decades later, and we don't see business units using Word's built-in dictation feature to write documents, right? Funny how that tech seems to have barely improved in all that time. And despite dictation being far faster than typing, it's not used all that often because the error rate is still too high for it to be useful: errors in speech-to-text are a fundamentally unsolvable problem (you can only get so far with background noise filtering, accounting for accents, etc.).
I see the parallel in how LLM hallucinations are a fundamentally unsolvable component of transformer-based models, and I suspect LLM usage in 20 years will be around the level of speech-to-text today: ubiquitous in the background, used here and there to set a timer or talk to a device, but ultimately not useful for any serious work.
This is a funny point that you're making (for me, anyway), because prior to early December, probably 5% of the lines of code I wrote in a week were AI-generated by Cursor. Then I started using Claude Code. Fast forward to today: I would say 98% of the code I've shipped in the last three weeks has been written entirely by Claude Code.
Prior to three weeks ago, I had used speech-to-text to accomplish approximately 0% of the work I've done in my 20 years of coding. In the last three weeks, well over half of the direction I've given to Claude Code has been done with speech-to-text.
I think there is a second reason people still type, and it's relevant to LLMs. Typing forces you to slow down and choose your words. When you want to edit, you are already typing, so it doesn't break the flow. In short, typing fits the work in a way that speech-to-text doesn't.
LLMs create a new workflow wherever they are employed. Even when they're capable, that new workflow is not always a more desirable or efficient experience.
Yeah, this is exactly my view. We've had several years of work on the tech, and LLMs are just as prone to randomly spitting out garbage as they were on day one. They are not a tool fit for any serious work, because you need to be able to rely on your tools. A tool which is sometimes good and sometimes bad is worse than no tool at all.
Did google not rely on Gemini to do their ISA changeover?
https://arxiv.org/abs/2510.14928
Was Gemini worse than no tool at all there?
Probably. According to the paper, 83.82% of automated commits were already made by algorithmic (non-LLM) tools. For the remainder, a three-phase LLM approach was tried and achieved a success rate of 30%, which works out to roughly 5% of all automated commits. Based on those numbers, it probably would have been faster, cheaper, and more efficient to just enhance their existing strategy rather than screwing around with text generators.
Do you really think that Opus 4.6 hallucinates to exactly the same degree as GPT-3.5? I am mystified how you can hold this perspective.
If you're not seeing the hallucinations, I'd assert you're either not using it enough, or (more likely) you don't have enough knowledge in the subject matter to notice when it's hallucinating.
I type faster than I think, and being able to edit gives typing the edge over speech-to-text. I don't believe the analogy fundamentally holds.
I'd say speech-to-text is unsolvable for a more fundamental reason: it's hard to actually speak out an entire document flawlessly in one take.
Spoken language is very different to written language, which is why, for example, you can easily tell when an article is transcribing a spoken interview.
Yes, it's a UX thing. You'd still have to edit it by typing afterwards as well.
Similarly, raw LLM/chat interfaces are usually not the best option.
The completely different ways people are experiencing AI are fascinating.
In my world, AI is already far more influential than speech-to-text.
People on here act like we don’t know if AI will be useful. And I’m sitting over here puzzled because of how fucking useful it is.
Very strange.
> People on here act like we don’t know if AI will be useful. And I’m sitting over here puzzled because of how fucking useful it is.
Yes, it's very strange to read AI threads here, because the general tone is so different from, say, the tone at the company I work at, where hundreds of engineers are given enormous monthly token budgets and are being pushed to have the LLMs write as much code as possible. They're not forced to, and no one is reprimanded for not adopting Claude Code or Codex or Cursor. But there's been a strong tonal shift in technology leadership in the last month that basically implies this is how it is going to be done in the future, whether one likes it or not.
As for me, I've been writing all of my code via Claude for a while now, and I don't think I will ever go back to working in an editor writing code the way I did for most of my career. Nor do I want to.
I'm curious about the statement that hallucinations are "fundamentally unsolvable". I don't think an AI agent has left a hallucination in my code, by which I mean a reference to something which doesn't exist at all, in many months. I have had great luck driving hallucinations to effectively 0% by using a language with static typechecking, telling LLMs to iterate on type errors until there are none left, and of course having a robust unit and e2e test suite. Sure, I run into other problems: it does make logic errors at some rate, but I would hardly categorize those the same as hallucinations.
So type errors are not hallucinations in your book, but "a reference to something which doesn't exist at all" is?
In the context of AI, most people I know tend to mean wrong output in general, not just hallucinations in the literal sense of the word or things you cannot catch in an automated way.
My statement is that if your only hallucinations are type errors, they can be solved by simply wrapping the LLM in a harness that says "Please continue working until `yarn run tsc` is clean". Yes, the LLM still hallucinates, but it doesn't affect me, because by the time I see the code, the hallucinations are gone.
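To make that concrete, here's a minimal sketch of the harness loop I mean. `askAgent` is a hypothetical stand-in for whatever coding agent you drive (Claude Code, Codex, etc.), not a real SDK call; only the `tsc` invocation comes from my actual setup:

```typescript
import { execSync } from "node:child_process";

// Hypothetical placeholder: in practice this would call your coding agent
// and wait for it to finish editing files.
async function askAgent(prompt: string): Promise<void> {
  console.log("agent prompt:", prompt.slice(0, 120));
}

async function untilTypecheckClean(task: string, maxRounds = 5): Promise<void> {
  await askAgent(task);
  for (let round = 0; round < maxRounds; round++) {
    try {
      // Typecheck the whole project (--noEmit = check only); throws on a
      // non-zero exit code.
      execSync("yarn run tsc --noEmit", { stdio: "pipe" });
      return; // clean typecheck: any hallucinated identifiers are gone
    } catch (err) {
      const output = (err as { stdout?: Buffer }).stdout?.toString() ?? "";
      // Feed the compiler errors straight back to the agent.
      await askAgent(`Please continue working until tsc is clean:\n${output}`);
    }
  }
  throw new Error(`tsc still failing after ${maxRounds} rounds`);
}
```

The point is that the compiler, not me, is the backstop: a hallucinated identifier can't survive a clean typecheck.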
This is something I do every day; to be quite honest, it's a fairly mundane use of AI, and I don't understand why it's controversial. For context, I've probably shipped somewhere on the order of 100k lines of AI-generated code, and I can't remember the last time I saw a hallucination.
Maybe you're lucky. I had Opus 4.6 hallucinate a non-existent configuration key in a well-known framework literally a few hours ago.
Granted, it fixed the problem in the very next prompt.
Couldn’t that problem be solved with static typechecking?
ChatGPT 5.2 kept gaslighting me yesterday, telling me that LLMs were explainable with Shapley values. It kept referencing papers that mention both LLMs and SHAP, but those papers are about LLMs being used to explain the SHAP values of other ML models, not about explaining LLMs.
I encounter stuff like this every week; I don't know how you don't. I suppose a well-structured codebase in a statically typed language might not provide as much of a surface for hallucinations to present themselves? But, like you say, logical problems of course still occur.
I meant to say that code generation never leaves hallucinations in my code. I suppose that was unclear.
>> I don't think an AI agent has left a hallucination in my code
I literally just went on Gemini, the latest and best model, and asked it "hey, can you give me the best prices for 12TB hard drives available at the British retailer CeX?" and it went "sure, I just checked their live stock and here they are:". Every single one was made up. I pointed it out; it said sorry, I just checked again, here they are, definitely 100% correct now. Again, all of them were made up. This repeated a few times until I accused it of lying, and then it went "you're right, I don't actually have the ability to check, so I just used products and values closest to what they should have in stock".
So yeah, hallucinations are still very much there and still very much feeding people garbage.
Not to mention I'm part of multiple FB groups for car enthusiasts, and the amount of AI misinformation we have to correct daily is just staggering. I'm not talking political stuff, just people copy-pasting responses from AI which confidently state that feature X exists or works in a certain way, when in reality it has never existed at all.
My comment was about code, not fact-checking; that's why I said hallucinations were a solved problem provided you use static typechecking and tests.