Comment by kouteiheika

1 day ago

> While LLMs are pretty good, and likely to improve, my experience is OpenAI's offerings absolutely make stuff up after a few thousand words or so, and they're one of the better ones.

That's not how you get good translations from off-the-shelf LLMs! If you give a model the whole book and expect it to translate it in one shot then it will eventually hallucinate and give you bad results.

What you want is to give it a small chunk of text to translate, plus previously translated context so that it can keep the continuity.

And for the best quality translations what you want is to use a dedicated model that's specifically trained for your language pairs.
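The chunked approach above can be sketched in a few lines. This is a minimal illustration, not a real pipeline: `call_llm` is a hypothetical stand-in for whatever model API you use (here it just echoes its input so the chunking logic itself can be run), and the prompt wording and context size are made-up placeholders.

```python
# Sketch of chunked translation with a rolling context window.
# call_llm is a HYPOTHETICAL stub standing in for a real model call;
# it echoes the chunk back so the loop below can be demonstrated.
def call_llm(prompt: str) -> str:
    chunk = prompt.rsplit("Translate:\n", 1)[-1]
    return f"<translated>{chunk}</translated>"

def translate_document(chunks: list[str], context_size: int = 3) -> list[str]:
    """Translate small chunks one at a time, feeding the last few
    already-translated chunks back in so the model keeps continuity."""
    translated: list[str] = []
    for chunk in chunks:
        # Only the tail of the translation so far goes into the prompt,
        # keeping each request small enough that the model stays grounded.
        context = "\n".join(translated[-context_size:])
        prompt = (
            "You are translating a book into English.\n"
            f"Previously translated context:\n{context}\n"
            f"Translate:\n{chunk}"
        )
        translated.append(call_llm(prompt))
    return translated
```

The point is that each call sees one small chunk plus a bounded window of prior output, instead of the whole book at once.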

> And as for politics, as Grok has just been demonstrating, they're quite capable of whatever bias they've been trained to have or told to express.

For open-ended questions - sure. But that doesn't apply to translation, where you're not asking the model to come up with something entirely by itself, but only getting it to accurately translate what you wrote into another language.

I can give you an example. Let's say we want to translate the following sentence:

"いつも言われるから、露出度抑えたんだ。"

Let's ask some general-purpose LLMs to translate it without any context (you could get a better translation if you gave them context and more instructions):

ChatGPT (1): "Since people always comment on it, I toned down how revealing it is."

ChatGPT (2): "People always say something, so I made it less revealing."

Qwen3-235B-A22B: "I always get told, so I toned down how revealing my outfit is."

gemma-3-27b-it (1): "Because I always get told, I toned down how much skin I show."

gemma-3-27b-it (2): "Since I'm always getting comments about it, I decided to dress more conservatively."

gemma-3-27b-it (3): "I've been told so often, I decided to be more modest."

Grok: "I was always told, so I toned down the exposure."

And how humans would translate it:

Competent human translator (I can confirm this is an accurate translation, but perhaps a little too literal): "Everyone was always saying something to me, so I tried toning down the exposure."

Activist human translator: "Oh those pesky patriarchal societal demands were getting on my nerves, so I changed clothes."

(Source: https://www.youtube.com/watch?v=dqaAgAyBFQY)

It should be fairly obvious which one is the biased one, and I don't think it's the Grok one (which is a little funny, because it's actually the most literal translation of them all).

>> While LLMs are pretty good, and likely to improve, my experience is OpenAI's offerings absolutely make stuff up after a few thousand words or so, and they're one of the better ones.

> That's not how you get good translations from off-the-shelf LLMs! If you give a model the whole book and expect it to translate it in one shot then it will eventually hallucinate and give you bad results.

You're assuming something about how I used ChatGPT, but I don't know what exactly you're assuming.

> What you want is to give it a small chunk of text to translate, plus previously translated context so that it can keep the continuity

I tried translating a Wikipedia page to support a new language, using ChatGPT rather than Google Translate because I wanted to retain the wiki formatting as part of the task.

The LLM went OK for a bit, then made stuff up. I fed in a new chunk starting from its first mistake, until I reached a list, at which point the LLM invented random entries in that list. I tried just that list in a bunch of different ways, including completely new chat sessions and the existing session; it couldn't help but invent things.

> For open-ended questions - sure. But that doesn't apply to translation, where you're not asking the model to come up with something entirely by itself, but only getting it to accurately translate what you wrote into another language.

"Only" rather understates how hard translation is.

Also, "explain this in Fortnite terms" is a kind of translation: https://x.com/MattBinder/status/1922713839566561313/photo/3