Comment by ben_w
1 day ago
I'm not sure what DeepL uses, but Google invented the Transformer architecture, the T in GPT, for Google Translate.
IIRC, the original difference between them was the attention mask, which is akin to how the Mandelbrot and Julia fractals use the same formula with the variables meaning different things. So I'd argue they're basically still the same thing, and you can model what an LLM does as translating a prompt into a response.
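To make the mask point concrete, here's a minimal NumPy sketch (illustrative only, with made-up toy values): an encoder-style transformer lets every token attend to every other token, while a decoder-style (GPT) transformer applies a causal mask so each token only sees itself and earlier tokens. The rest of the attention math is the same.

```python
import numpy as np

def attention_scores(q, k, causal=False):
    # Scaled dot-product attention scores (softmax omitted for brevity).
    scores = q @ k.T / np.sqrt(k.shape[-1])
    if causal:
        # Decoder-style (GPT) causal mask: block attention to future
        # positions by setting their scores to -inf before softmax.
        future = np.triu(np.ones(scores.shape, dtype=bool), k=1)
        scores = np.where(future, -np.inf, scores)
    return scores

# Toy self-attention over 3 tokens with 2-dim embeddings.
x = np.arange(6, dtype=float).reshape(3, 2)
full = attention_scores(x, x)                 # encoder-style: all-to-all
masked = attention_scores(x, x, causal=True)  # decoder-style: causal
```

Same formula both times; only the mask changes which entries survive, which is the sense in which the two architectures differ.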
I didn't know that! I had heard they made transformers and (then-Open)AI used it in GPT, but that explains why Google wasn't first to market with an LLM product when the intended application was translation.