Comment by thrance

6 months ago

Please, please, please, do not use auto translators to localize your pages. There's nothing worse than a half-assed translation that was obviously made by a machine.

Auto-translated sentences are awkward and I feel extremely insulted every time someone chooses to impose this garbage watered-down version of their products on me.

Hire a translator or don't localize your site.

> half-assed translation that was obviously made by a machine

That's exactly what we want to solve.

Here's the thing:

It turns out that AI translates better than humans when provided with enough correct context. Both macro context, like what the product does, and micro context, like what the component represents on screen and how it relates to other components.

As a result, algorithms extract the needed contextual hints, and a correctly configured LLM does the rest.
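
To make the macro/micro split concrete, here is a minimal sketch of how extracted hints might be combined into a single prompt. Every name in it (`MacroContext`, `MicroContext`, `buildTranslationPrompt`, the field names, the sample values) is hypothetical and chosen for illustration; it is not the Lingo.dev API, and the call to an actual model is deliberately left out.

```typescript
// Hypothetical types for illustration; not the API of Lingo.dev or any real library.
interface MacroContext {
  product: string;  // what the product does
  audience: string; // who reads the UI
}

interface MicroContext {
  component: string;   // component the string belongs to
  role: string;        // what the element represents on screen
  neighbors: string[]; // other strings rendered alongside it
}

// Builds the prompt text only; calling a model is deliberately left out.
function buildTranslationPrompt(
  text: string,
  targetLocale: string,
  macro: MacroContext,
  micro: MicroContext
): string {
  const neighbors =
    micro.neighbors.length > 0 ? micro.neighbors.join(" | ") : "(none)";
  return [
    `Translate the UI string into ${targetLocale}.`,
    `Product: ${macro.product}`,
    `Audience: ${macro.audience}`,
    `Component: ${micro.component}`,
    `Role on screen: ${micro.role}`,
    `Nearby strings: ${neighbors}`,
    `String: "${text}"`,
  ].join("\n");
}

// Sample values are made up for the example.
const prompt = buildTranslationPrompt(
  "Save",
  "de-DE",
  { product: "Invoicing app for freelancers", audience: "non-technical users" },
  {
    component: "InvoiceForm",
    role: "submit button label",
    neighbors: ["Cancel", "Invoice number"],
  }
);
console.log(prompt);
```

The point of the sketch is only that a bare string like "Save" arrives at the model together with what it is and where it sits, rather than on its own.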

  • > AI translates better than humans when provided with enough correct context

    This is definitionally untrue. Humans define human language; a "correct" translation is one that an experienced translator would write.

    • I assume that they mean that an LLM is better at translating than a high-rotation, contracted (i.e. not employed, no benefits, no stability) team of MTurk-like translators who are paid cents per translated token, are given little to no context of what they're translating beyond the individual sentence, and are dealing with 10 projects at once as quickly as possible because otherwise they wouldn't be able to make a decent wage.

      But that doesn't mean that LLMs have become as good as human translators, but rather that corporations have set up a system that treats translators as if they were machines and then we act surprised when machines are better at acting machine-like than humans.

  • What I always wondered: why is your automatic translation better than the browser's or the user's own auto translation?

    In particular, having it user side makes it fully opt-in, and the user has full control and will accept the quality as it is, whereas your service-side auto translate is your responsibility when shit hits the fan.

    • Historically, there are a couple of reasons why developers prefer to i18n their app instead of letting users do that.

      1. PostHog has a great tool that lets developers "watch the video" of how users interact with their app's UI. It turns out that automated Chrome plugins and built-in features often mess up the HTML so much that apps simply crash. I've seen devs adding translate="no" [0] in bulk to their apps because of this. Therefore, Chrome's built-in auto translation isn't the best solution (yet).

      2. Product/marketing folks want users to see content in their language immediately after landing on the website.

      3. App developers often want to control what users see, update it, and rephrase it.

      If I had to guess, I'd say the approach the Lingo.dev Compiler package uses today should end up being a natural part of frameworks like Remix, Next.js and Vue.

      [0] https://www.w3schools.com/tags/att_translate.asp
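
      For illustration, adding translate="no" in bulk might look like the sketch below. The `translate` attribute itself is standard HTML; the `AttributeTarget` type and the `excludeFromTranslation` helper are hypothetical names, and the narrow interface is only there so the sketch also runs outside a browser.

      ```typescript
      // Minimal structural type; real DOM Elements satisfy it.
      interface AttributeTarget {
        setAttribute(name: string, value: string): void;
      }

      // Marks every element with translate="no" and returns how many were marked.
      function excludeFromTranslation(elements: ArrayLike<AttributeTarget>): number {
        for (let i = 0; i < elements.length; i++) {
          elements[i].setAttribute("translate", "no");
        }
        return elements.length;
      }
      ```

      In a browser this could be called with the result of `document.querySelectorAll(...)` and whatever selector fits the app. Note that it switches off browser translation for those elements entirely, so it only makes sense when the app ships its own localized text.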


  • Do you speak more than one language? Because claiming "AI translates better than humans" is ludicrous. Anyone with a modicum of experience browsing the internet can immediately tell when a page was auto-translated, based on how awkward or outright nonsensical some of the text can be.

    Also, I doubt other translators work by localizing <p> elements one by one, without context. The entire HTML is localized, semantic and all. I fail to see how translating JSX instead of HTML can improve the situation much.

    • 1. I do speak more than one language. I agree with your point that perfect localization requires seeing a <p> element in the broader context of the parent component, parent page, the product, the industry, the audience and their expected level of tech savviness, the culture, and eventually preferences regarding tone of voice.

      Typically, a human would need to be educated about these aspects to translate perfectly. In the future, in my opinion, humans will be educating—or configuring—the AI to do that.

      The "localization compiler", which we've built to solve our own problem in the first place, is just a handy bunch of scripts aimed to help extract needed contextual hints that would then be passed on to the [preconfigured] LLM for translation, and it should go beyond just the names of the tags.

      FWIW, by saying AI translations I don't mean Google Translate or machine translation tech that browsers come with. I mean actual foundational AI models that OpenAI, Anthropic, Google, Meta, Mistral and others are developing.

      The difference is significant, and there's nothing worse than a half-assed robotic translation produced by MT.

      2. Regarding "AI translates better than humans." I think some commenters have already mentioned this, but the point is that outsourced translations can be worse than what LLMs can produce today, because when translations are outsourced, nobody seems to care about educating the native speaker about the product and the UI. And localizing the UI, which consists of thousands of chunks of text, is nontrivial for a human. On the flip side, a correctly configured LLM, when provided with enough relevant contextual tips, shows outstanding results.

And just say "sorry" to all the people asking you for a translation of your great product?

  • Whenever I see automatic translation into my language, I leave the page as most of the time it's unreadable. Microsoft docs is the worst offender.

    Yeah, I'd prefer no translation over bad translation.

  • They're asking for a reliable translation, otherwise they'd just let their browser auto-translate the page.

  • Hire a translator then, don't give us a garbage localization and call it a day.

    It's like if someone requested a feature and you gave them the first thing an LLM spewed out when asked to code it, without review.

    You should at least have someone on your team be able to understand the program's output and correct it when things inevitably sound off.

Would you literally rather have nothing than a poor translation?

  • If I want a machine translation of something, I can throw the text into DeepL myself. Getting text that was machine translated Japanese<->English with no access to the original is pretty much never what I want, and yet sites insist on doing it based on my IP address or system language.

    Also, if a website offers a language I take that as an indication that the organization is prepared to deal with speakers of that language/people from the country in question (customer support, shipping, regional/legal concerns). Whether the site offers a certain language is a useful signal to figure this out quickly, and if poking around reveals machine translation into dozens of languages, it's a signal that they're probably not prepared to provide reliable services/support.

  • In some cases, yes. A non-native but passable speaker/reader of English might prefer to struggle through the English UI themselves than deal with your bad AI-generated translation. If they do it themselves, at least they can skip the parts they know, see multiple possible translations, and take advantage of their partial knowledge of the UI language. If you dump everything into an LLM with no knowledge of your target languages at all, you’re setting yourself up for disaster when a critical string is mistranslated.

  • Yes, very much so.

    If you're bilingual you must know this feeling of reading an awful translation; of knowing someone wanted to offer their product to people speaking your language but couldn't be bothered to do it well, and so used Google Translate and called it a day, thinking those dumb users won't notice the slop they're feeding them. Fuck that.