Comment by deegles
2 days ago
> We're clearly seeing what AI will eventually be able to do
Are we though? Aside from a narrow set of tasks like translation, grammar, and tone-shifting, LLMs are a dead end. Code generation sucks. Agents suck. They still hallucinate. If you wouldn't trust its medical advice without review from an actual doctor, why would you trust its advice on anything else?
Also, the companies trying to "fix" issues with LLMs by adding more training data will just rediscover the "long-tail" problem: there is an infinite number of new things that need to be put into the dataset, and that's just going to reduce the quality of responses.
For example: the "there are three 'b's in blueberry" problem was caused by so much training data generated in response to "there are two r's in strawberry". It's a systemic issue. No amount of data will solve it, because LLMs will -never- be sentient.
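(For the record, the counts themselves are trivial to check outside the model; a quick Python session gives the right answers, and the numbers below are just the actual letter counts, nothing assumed:

    >>> "blueberry".count("b")
    2
    >>> "strawberry".count("r")
    3

The failure isn't a missing fact; models operate on tokens and never see individual letters.)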
Finally, I'm convinced that any AI company promising they are on the path to General AI should be sued for fraud. LLMs are not it.
I have a feeling that you believe "translation, grammar, and tone-shifting" works but "code generation sucks" for LLMs because you're good at coding and hence you see its flaws, and you're not in the business of doing translation etc.
Pretty sure if you're going to use LLMs for translating anything non-trivial, you'd have to carefully review the outputs, just like if you're using LLMs to write code.
You know, you're right. It -also- sucks at those tasks, because on top of the issue you mention, unedited LLM text is identifiable once you get used to its patterns.
Exactly. Books are still being translated by human translators.
I have a text on my computer, the first couple of paragraphs from the Dutch novel "De aanslag", and every few years I feed it to the leading machine translation sites, and invariably, the results are atrocious. Don't get me wrong, the translation is quite understandable, but the text is wooden, and it contains 3 or 4 outright translation blunders.
GPT-5 output, for example:
Far, far away in the Second World War, a certain Anton Steenwijk lived with his parents and his brother on the edge of Haarlem. Along a quay, which ran for a hundred meters beside the water and then, with a gentle curve, turned back into an ordinary street, stood four houses not far apart. Each surrounded by a garden, with their small balconies, bay windows, and steep roofs, they had the appearance of villas, although they were more small than large; in the upstairs rooms, all the walls slanted. They stood there with peeling paint and somewhat dilapidated, for even in the thirties little had been done to them. Each bore a respectable, bourgeois name from more carefree days:

Welgelegen
Buitenrust
Nooitgedacht
Rustenburg

Anton lived in the second house from the left: the one with the thatched roof. It already had that name when his parents rented it shortly before the war; his father had first called it Eleutheria or something like that, but then written in Greek letters. Even before the catastrophe occurred, Anton had not understood the name Buitenrust as the calm of being outside, but rather as something that was outside rest—just as extraordinary does not refer to the ordinary nature of the outside (and still less to living outside in general), but to something that is precisely not ordinary.
Can you provide a reference translation, or at least call out the issues you see with this passage? I see "far, far away in the [time period]", which I imagine should be "a long time ago". What are the other issues?
By definition, transformers can never exceed average.
That is the thing, and it's what the companies pushing LLMs don't seem to realize yet.
Can you expand on this? For tasks with verifiable rewards you can improve with rejection sampling and search (i.e. test-time compute). For things like creative writing it's harder.
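For concreteness, here is a minimal sketch of best-of-n rejection sampling; generate and verify are hypothetical stand-ins for a model call and an external checker (say, a unit-test runner), not any particular API:

    def best_of_n(prompt, generate, verify, n=16):
        # Rejection sampling: draw up to n candidates and return
        # the first one that passes an external verifier.
        for _ in range(n):
            candidate = generate(prompt)   # one sampled completion
            if verify(candidate):          # e.g. run the unit tests
                return candidate
        return None                        # every sample was rejected

With a reliable verifier, spending more samples pushes the success rate above the model's average single-shot quality, which is why "never exceed average" doesn't hold for verifiable tasks.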
> Aside from a narrow set of tasks like translation, grammar, and tone-shifting, LLMs are a dead end.
I consider myself an LLM skeptic, but gee, saying they are a "dead end" seems harsh.
Before LLMs came along, computers understanding human language was a graveyard that academics went to end their careers in. Now computers are better at it, and far faster, than most humans.
LLMs also have an extraordinary ability to distill and compress knowledge, so much so that you can download a model whose size is measured in GB, and it seems to have a pretty good general knowledge of everything on the internet. Again, far better than any human could do. Yes, the compression is lossy, and yes, they consequently spout authoritative-sounding bullshit on occasion. But I use them regardless as a sounding board, and I can ask them questions in plain English rather than go on a magical keyword hunt.
Merely being able to understand language or having a good memory is not sufficient, on its own, to code or do much else. But they are necessary ingredients for many tasks, and consequently it's hard to imagine an AI that can competently code that doesn't have an LLM as a component.
> it's hard to imagine an AI that can competently code that doesn't have an LLM as a component.
That's just it. LLMs are a component: they generate text or images from a higher-level description, but they are not themselves "intelligent". If you imagine the language center of your brain being replaced with a tiny LLM-powered chip, you would not say the chip is sentient. It translates your thoughts into words, which you then choose to speak or not. That's all modulated by consciousness.
> If you wouldn't trust its medical advice without review from an actual doctor, why would you trust its advice on anything else?
When an LLM gives you medical advice, it's right x% of the time. When a doctor gives you medical advice, it's right y% of the time. During the last few years, x has gone from 0 to wherever it is now, while y has mostly stayed constant. It is not unimaginable to me that x might (and notice I said might, not will) cross y at some point in the future.
The real problem with LLM advice is that it is harder to find a "scapegoat" (particularly for legal purposes) when something goes wrong.
Microsoft claims that they have an AI setup that outperforms human doctors on diagnosis tasks: https://microsoft.ai/new/the-path-to-medical-superintelligen...
"MAI-DxO boosted the diagnostic performance of every model we tested. The best performing setup was MAI-DxO paired with OpenAI’s o3, which correctly solved 85.5% of the NEJM benchmark cases. For comparison, we also evaluated 21 practicing physicians from the US and UK, each with 5-20 years of clinical experience. On the same tasks, these experts achieved a mean accuracy of 20% across completed cases."
Of course, AI "doctors" can't do physical examinations and the best performing models cost thousands to run per case. This is also a test of diagnosis, not of treatment.
If you consider how little time doctors have to look at you (at least in Germany's half-broken public health sector) and how little they actually care...
I think x is already higher than y for me.
That's fair. Reliable access to a 70% expert is better than no access to a 99% expert.