Comment by ben_w
3 days ago
> Particularly machine translations are no worse than what an untrained native speaker would come up with, and much better than traditional translators
Sometimes. I use Google Translate (literally the same architecture, last I heard), and when it works, great. Every single time I've tried demonstrating that it can't do Chinese by quoting the output it gives me from English-to-Chinese, someone replies to tell me that the translated text is gibberish*.
Even with an easier pair, English <-> German, I sometimes get duplicated paragraphs. And there are definitely still cases where even the context-comprehension fails, as you should be able to see by opening a random German website, e.g. https://www.bahn.de/, in a browser such as Chrome, translating it into English, and noticing the out-of-place words: the destination is "goal", and the tickets are "1st grade" and "2nd grade" instead of "class".
* I'm curious if this is still true, so let's see:
这是一个简单的英文句子,需要翻译成中文。上次我翻译的时候,有人告诉我译文几乎无法理解。
我不懂中文,所以需要懂中文的人告诉我现在是否仍然如此。
(not the downvoter)
I'm not sure we're on the same page. I mean LLMs, right? Not whatever Google Translate and DeepL use. The latter was better than gtrans when it launched; nowadays it's probably similar, idk. Both are clearly machine learning, but those products (and their quality) predate LLMs. They're not LLMs, and they haven't noticeably improved since LLMs arrived. Asking an LLM produces better output (so long as the LLM doesn't get sidetracked by the text's contents). Presumably at orders of magnitude higher energy consumption per word, too, even if you ignore training.
I agree that Google Translate, now on par with DeepL's free product afaik (but I'm not a gtrans user so I don't know), is decent but not a full replacement for humans, and that LLMs aren't as good as human translators either (not just for attention reasons), but it's another big step forward, right?
I'm not sure what DeepL uses, but Google invented the Transformer architecture, the T in GPT, for Google Translate.
IIRC, the original difference between them was about the attention mask, which is akin to how the Mandelbrot and Julia fractals are the same formula but the variables mean different things; so I'd argue they're basically still the same thing, and you can model what an LLM does as translating a prompt into a response.
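To make the fractal half of that analogy concrete, here is a minimal Python sketch. The function names (escape_time, mandelbrot, julia) and the default Julia constant are my own picks for illustration, not from any library. Both sets iterate the same formula, z -> z*z + c; the only difference is which variable is tied to the point being tested and which is held fixed.

```python
def escape_time(z0: complex, c: complex, max_iter: int = 50) -> int:
    """Iterate z -> z*z + c starting from z0.

    Returns the number of steps taken before |z| exceeds 2,
    or max_iter if it never does within the budget.
    """
    z = z0
    for i in range(max_iter):
        if abs(z) > 2.0:
            return i
        z = z * z + c
    return max_iter


def mandelbrot(point: complex, max_iter: int = 50) -> int:
    # Mandelbrot: z always starts at 0, and the tested point plays the role of c.
    return escape_time(0j, point, max_iter)


def julia(point: complex, c: complex = -0.8 + 0.156j, max_iter: int = 50) -> int:
    # Julia: c is one fixed constant (an arbitrary illustrative default here),
    # and the tested point plays the role of the starting z.
    return escape_time(point, c, max_iter)


if __name__ == "__main__":
    p = 0.3 + 0.5j
    print("mandelbrot:", mandelbrot(p))
    print("julia:     ", julia(p))
```

Same loop, same arithmetic; swapping which argument the point goes into is the whole difference, which is roughly the sense in which the comment above says the two model families are "basically still the same thing".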
I didn't know that! I had heard they made transformers and (then-Open)AI used them in GPT, but that explains why Google wasn't first to market with an LLM product, given that the intended application was translation.