Comment by maxpr

6 months ago

> half-assed translation that was obviously made by a machine

That's exactly what we want to solve.

Here's the thing:

It turns out that AI translates better than humans when provided with enough correct context: both macro context, like what the product does, and micro context, like what a component represents on screen and how it relates to other components.

As a result, algorithms extract the needed contextual hints, and a correctly configured LLM finishes the rest.
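
To make "contextual hints" concrete, here's a rough sketch of how such a prompt could be assembled. The names and shapes (ContextHints, buildTranslationPrompt) are made up for illustration, not our actual implementation:

    // Hypothetical sketch: combining macro and micro context into a
    // translation prompt for an LLM.
    interface ContextHints {
      product: string;      // macro: what the product does
      component: string;    // micro: what the component represents on screen
      neighbors: string[];  // micro: surrounding UI text it relates to
    }

    function buildTranslationPrompt(
      text: string,
      targetLang: string,
      hints: ContextHints,
    ): string {
      return [
        `You are translating UI copy for: ${hints.product}.`,
        `The string appears in the "${hints.component}" component,`,
        `next to: ${hints.neighbors.join(", ")}.`,
        `Translate into ${targetLang}, preserving tone and placeholders:`,
        text,
      ].join("\n");
    }

    // "Save" in a toolbar next to Undo/Redo translates differently than
    // "Save" in a pricing banner; the hints are what disambiguate it.
    buildTranslationPrompt("Save", "de", {
      product: "a photo-editing web app",
      component: "EditorToolbar",
      neighbors: ["Undo", "Redo", "Export"],
    });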

> AI translates better than humans when provided with enough correct context

This is definitionally untrue. Humans define human language; a "correct" translation is one that an experienced translator would write.

  • I assume that they mean that an LLM is better at translating than a high-rotation, contracted (i.e. not employed, no benefits, no stability) team of MTurk-like translators who are paid cents per translated token, are given little to no context of what they're translating beyond the individual sentence, and are dealing with 10 projects at once as quickly as possible because otherwise they wouldn't be able to make a decent wage.

    But that doesn't mean LLMs have become as good as human translators; rather, corporations have set up a system that treats translators as if they were machines, and then we act surprised when machines are better at acting machine-like than humans.

What I've always wondered: why is your automatic translation better than the browser's or the user's own auto-translation?

In particular, having it user-side makes it fully opt-in: the user has full control and will accept the quality as it is, whereas your service-side auto-translation is your responsibility when shit hits the fan.

  • Historically, there are a few reasons why developers prefer to i18n their apps themselves instead of leaving it to users.

    1. PostHog has a great tool that lets developers "watch the video" of how users interact with their app's UI. It turns out that automated Chrome plugins/built-in features often mess up the HTML so much that apps simply crash. I've seen devs adding translate="no" [0] in bulk to their apps because of this (see the sketch below). Therefore, Chrome's built-in auto-translation isn't the best solution (yet).

    2. Product/marketing folks want users to see content in their language immediately after landing on the website.

    3. App developers often want to control what users see, update it, and rephrase it.

    If I had to guess, I'd say the approach the Lingo.dev Compiler package uses today should end up being a natural part of frameworks like Remix, Next.js, and Vue.

    [0] https://www.w3schools.com/tags/att_translate.asp
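
    To illustrate the translate="no" workaround from point 1, here's a minimal React sketch (PriceTag is a made-up component, not from any real codebase):

        // Opting a subtree out of the browser's built-in translator.
        // Chrome's translator rewrites DOM text nodes, which can clash with
        // React's reconciliation and cause the crashes described above.
        import React from "react";

        export function PriceTag({ amount }: { amount: string }) {
          // translate="no" is a standard HTML attribute; browser
          // translators skip elements that carry it.
          return <span translate="no">{amount}</span>;
        }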

    • PostHog hadn't crossed my radar before, so it was an interesting discovery.

      I'm quite surprised that apps crash on translation, but then there's a whole user-action analytics engine running in parallel, so it sounds like a problem of having too many things running at the same time?

      Companies that want tight control over their translations already have the option to translate their i18n strings directly, AI or not. That sounds to me like a better choice, and not much more onerous than the half-baked post-filtering we're seeing in this article.

      I'd argue that if we're going the AI route, having it extract the user-facing text and push it into i18n resources could be a better approach?
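
      Roughly what I have in mind, as a sketch (all names here are made up):

          // Build-time sketch: pull user-facing strings out of markup and
          // write them into a standard i18n resource file that any pipeline
          // (human translators or an LLM) can then fill in per language.
          import { writeFileSync } from "node:fs";

          // A real implementation would walk the JSX AST (e.g. with
          // @babel/parser); a regex over text children keeps this short.
          function extractStrings(source: string): string[] {
            const matches = source.match(/>([^<>{}]+)</g) ?? [];
            return matches
              .map((m) => m.slice(1, -1).trim())
              .filter((text) => text.length > 0);
          }

          const source =
            `<button>Save changes</button><p>Your work is safe.</p>`;

          // Keyed resource file; translators (or an LLM) would produce
          // messages.fr.json, messages.de.json, and so on from it.
          const resources = Object.fromEntries(
            extractStrings(source).map((text, i) => [`msg_${i}`, text]),
          );
          writeFileSync("messages.en.json", JSON.stringify(resources, null, 2));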


Do you speak more than one language? Because claiming "AI translates better than humans" is ludicrous. Anyone with a modicum of experience browsing the internet can immediately tell when a page was auto-translated, based on how awkward or outright nonsensical some of the text can be.

Also, I doubt other translators work by localizing <p> elements one by one, without context. The entire HTML document is localized, semantics and all. I fail to see how translating JSX instead of HTML can improve the situation much.

  • 1. I do speak more than one language. I agree with your point that perfect localization requires seeing a <p> element in the broader context of the parent component, the parent page, the product, the industry, the audience and their expected level of tech savviness, the culture, and even preferences regarding tone of voice.

    Typically, a human would need to be educated about these aspects to translate perfectly. In the future, in my opinion, humans will be educating—or configuring—the AI to do that.

    The "localization compiler", which we've built to solve our own problem in the first place, is just a handy bunch of scripts aimed to help extract needed contextual hints that would then be passed on to the [preconfigured] LLM for translation, and it should go beyond just the names of the tags.

    FWIW, by "AI translations" I don't mean Google Translate or the machine translation tech that browsers come with. I mean the actual foundation models that OpenAI, Anthropic, Google, Meta, Mistral and others are developing.

    The difference is significant, and there's nothing worse than a half-assed robotic translation produced by an MT engine.

    2. Regarding "AI translates better than humans." I think some commenters have already mentioned this, but the point is that outsourced translations can be worse than what LLMs can produce today, because when translations are outsourced, nobody seems to care about educating the native speaker about the product and the UI. And localizing the UI, which consists of thousands of chunks of text, is nontrivial for a human. On the flip side, a correctly configured LLM, when provided with enough relevant contextual tips, shows outstanding results.