Comment by resiros
6 days ago
I wonder why AI labs have not worked on improving the quality of the text outputs. Is this, as the author claims, a property of the LLMs themselves? Or is there simply not much incentive to create the best writing LLM?
The argument is that the best writing is the unexpected, while an LLM's function is to deliver the expected next token.
Even more precisely, human writing contains unpredictability that is more or less intentional (what might be called authorial intent), as well as a great deal that is added subconsciously (what we might call quirks or imprinted behavior).
The first requires intention, something that, as far as we know, LLMs simply cannot truly have or express. The second is something that can be approximated, perhaps very well, but a mass of people using the same models with the same approximations still leads to a loss of distinction.
Perhaps LLMs that were fully individually trained could sufficiently replicate a person's quirks (I dunno), but that's hardly a scalable process.
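To make the "expected next token" point concrete, here is a minimal Python sketch. It assumes a toy four-token vocabulary with made-up scores; the tokens, scores, and function names are all hypothetical, not any model's real API. Greedy decoding always returns the highest-probability token, temperature sampling occasionally lets a less likely one through, and surprisal (-log2 p) measures how unexpected a given token is.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities; a lower temperature sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def greedy(tokens, logits):
    """Always return the single most probable ("expected") token."""
    best = max(range(len(tokens)), key=lambda i: logits[i])
    return tokens[best]


def sample(tokens, logits, temperature, rng):
    """Draw one token at random, weighted by its temperature-scaled probability."""
    return rng.choices(tokens, weights=softmax(logits, temperature), k=1)[0]


def surprisal_bits(tokens, logits, token):
    """How unexpected `token` is under the model: -log2 p(token)."""
    p = softmax(logits)[tokens.index(token)]
    return -math.log2(p)


if __name__ == "__main__":
    # Hypothetical next-token scores after "Yeah, that makes ..."
    tokens = ["sense", "a", "no", "banana"]
    logits = [5.0, 2.0, 1.0, -2.0]
    rng = random.Random(0)
    print("greedy:", greedy(tokens, logits))
    print("sampled:", [sample(tokens, logits, 1.5, rng) for _ in range(10)])
    for t in tokens:
        print(t, round(surprisal_bits(tokens, logits, t), 2))
```

Lowering the temperature pushes sampling toward the greedy choice; raising it lets lower-probability tokens through more often.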
Yeah, that makes banana.
What was the name of the last book you read?
I remember an article a few weeks back[1] which mentioned that the current focus is on improving the technical abilities of LLMs. I can imagine many (if not most) of their current subscribers are paying for the technical ability as opposed to creative writing.
This also reminded me that on OpenRouter, you can sort models by category. The ones tagged "Roleplay" and "Marketing" probably write better than models like Opus 4 or ChatGPT 5.2.
[1]: https://www.techradar.com/ai-platforms-assistants/sam-altman...
That's like asking why McDonald's doesn't improve the quality of their hamburger. They can, but only within the bounds of mass produced cheap crap that maximizes profit. Otherwise they'd be a fundamentally different kind of company.
I mean, there are tons of better-writing tools that use AI, like Grammarly etc. For actual general-purpose LLMs, I don't think there's much incentive in making them write "better" in the artistic sense of the word... if the idea is to make the model good at tasks in general and communicate via language, that language should sound generic and boring. If it's too artistic or poetic or novel-like, the communication would appear a bit unhinged.
"Update the dependencies in this repo"
"Of course, I will. It will be an honor, and may I say, a beautiful privilege for me to do so. Oh how I wonder if..." vs. "Okay, I'll be updating dependencies..."
I wish it would just say "k, updated xyz to 1.2.3 in Cargo.toml" instead of the entire pages it likes to output. I don't want to read all of that!
I used to feel the same, but you can just prompt it to reply with only one word when it's done. Most people prefer it to summarize because it's easier to track, so I guess that's the natural default.
I mean, no one is asking for artistic writing, just not some obvious AI slop. The fact that we all can now easily determine that some text has been written / edited by AI is already an issue. No amount of prompting can help.
The article frames this as "semantic ablation", but the underlying mechanism is more specific: it is distributional averaging. Policies trained with RLHF or DPO optimize for the modal response given a prompt distribution. That is not a bug in the training process; it is the objective function working as designed. The model learns to produce the response that the median annotator would rate highest, and that response is, almost by definition, the least distinctive one.
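A toy model of the "median annotator" effect, as a minimal Python sketch. It assumes each hypothetical annotator has an ideal point on a one-dimensional style axis and rates a response by its absolute distance from that point. This is not how RLHF or DPO are actually implemented; it only shows that the response maximizing the average rating lands at the median of the annotators' ideals rather than at any single annotator's distinctive taste.

```python
from statistics import median


def rating(response_style, annotator_ideal):
    """Hypothetical annotator: the closer a response is to their ideal style, the higher the score."""
    return -abs(response_style - annotator_ideal)


def expected_reward(response_style, annotator_ideals):
    """Average rating over the annotator pool (a crude stand-in for what a reward model learns)."""
    return sum(rating(response_style, a) for a in annotator_ideals) / len(annotator_ideals)


def best_response(candidates, annotator_ideals):
    """The candidate a policy maximizing expected reward would drift toward."""
    return max(candidates, key=lambda c: expected_reward(c, annotator_ideals))


if __name__ == "__main__":
    # Style axis: 0.0 = plain, 1.0 = highly idiosyncratic (an invented scale).
    annotators = [0.1, 0.2, 0.45, 0.5, 0.55, 0.8, 0.95]
    candidates = [i / 20 for i in range(21)]  # 0.00, 0.05, ..., 1.00
    print("reward-maximizing style:", best_response(candidates, annotators))
    print("median annotator ideal:", median(annotators))
```

With squared-distance ratings the winner would move to the mean instead; either way it sits in the middle of the distribution.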
What is underappreciated is how much stylistic signal lives in what information retrieval people call "burstiness" -- the tendency for distinctive words to cluster rather than distribute evenly. Hemingway's short declarative stacking, DFW's recursive parentheticals, legal writing's formulaic precision -- these are all bursty patterns that a model trained to maximize expected reward will sand down. You can partially recover it with few-shot prompting, but the model is fighting its own reward gradient the entire time.
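One simple way to put a number on burstiness, sketched in Python: the Goh-Barabási coefficient B = (σ - μ)/(σ + μ), computed over the gaps between successive occurrences of a word. This is a swapped-in measure, not necessarily the statistic the comment has in mind (IR work often uses document-level statistics instead), and the regex tokenizer is a simplifying assumption.

```python
import re
from statistics import mean, pstdev


def positions(text, word):
    """Token indices where `word` occurs, using a simple lowercase regex tokenizer."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [i for i, t in enumerate(tokens) if t == word.lower()]


def burstiness(pos):
    """Goh-Barabasi coefficient B = (sigma - mu) / (sigma + mu) over inter-occurrence gaps.

    B is -1 for perfectly even spacing, near 0 for random (Poisson-like) spacing,
    and approaches 1 as occurrences cluster. Returns None with fewer than two gaps.
    """
    gaps = [b - a for a, b in zip(pos, pos[1:])]
    if len(gaps) < 2:
        return None
    mu, sigma = mean(gaps), pstdev(gaps)
    return (sigma - mu) / (sigma + mu)


if __name__ == "__main__":
    even = " ".join(["rain", "x", "x"] * 4)  # "rain" every third token
    clustered = "rain rain rain " + "x " * 12 + "rain rain rain"
    print("even:", burstiness(positions(even, "rain")))
    print("clustered:", burstiness(positions(clustered, "rain")))
```

Evenly spaced repetition scores -1, while the clustered text scores above 0.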
The practical question is whether you can encode a style prior that survives the decoding process. The research on authorship attribution (stylometry) suggests the feature set is well-understood -- function word frequencies, sentence length distributions, type-token ratios, syntactic complexity metrics. But nobody has built a production system that uses those features as a constraint during generation rather than just detection.
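A rough illustration of the stylometric features listed above, as a minimal Python sketch. It extracts function-word rates, sentence-length mean and standard deviation, and type-token ratio, then picks whichever of several already-generated candidates sits closest to a reference text. That is post-hoc best-of-n reranking, not the in-decoding constraint the comment is asking about. The function-word list, tokenizer, and unweighted Euclidean distance are simplifying assumptions; with raw features, sentence length dominates, so a real system would standardize them.

```python
import math
import re
from statistics import mean, pstdev

# A tiny illustrative function-word list; real stylometry uses a few hundred.
FUNCTION_WORDS = ["the", "a", "and", "of", "to", "in", "that", "it", "but", "i"]


def style_features(text):
    """Function-word rates, then sentence-length mean/std, then type-token ratio."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    tokens = re.findall(r"[a-z']+", text.lower())
    n = max(len(tokens), 1)  # avoid dividing by zero on empty text
    fw_rates = [tokens.count(w) / n for w in FUNCTION_WORDS]
    lengths = [len(re.findall(r"[a-z']+", s.lower())) for s in sentences] or [0]
    ttr = len(set(tokens)) / n
    return fw_rates + [mean(lengths), pstdev(lengths), ttr]


def distance(u, v):
    """Unweighted Euclidean distance; features are not standardized here."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def rerank(candidates, reference_text):
    """Pick the candidate whose style vector is closest to the reference author's."""
    target = style_features(reference_text)
    return min(candidates, key=lambda c: distance(style_features(c), target))


if __name__ == "__main__":
    reference = "He drank. The sun rose. It was cold. He left."
    candidates = [
        "She ran through the rain that fell on the city while it was growing late.",
        "She ran. The rain fell. It was late.",
    ]
    print(rerank(candidates, reference))
```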
Yeah, but that's not what I am saying. I am saying its default writing style is for communicating with the user, not for producing content/text, hence the distinctive style we all recognise. If you want AI writing that's not slop, there are tools that are trying to do that, but the default LLM writing style is unlikely to change imo.