Comment by potatolicious

7 months ago

It's also a good lesson for the new AI cycle we're in now. Often inserting ML subsystems into your broader system just makes it go from "deterministically but fixably bad" to "mysteriously and unfixably bad".

I think that’ll define the industry for the coming decades. I used to work in machine translation and it was the same. The older rules-based engines that were carefully crafted by humans worked well on the test suite and if a new case was found, a human could fix it. When machine learning came on the scene, more “impressive” models that were built quicker came out - but when a translation was bad no one knew how to fix it other than retraining and crossing one’s fingers.

  • As someone who worked in rules-based ML before the recent transformers (and unsupervised learning in general) hype, I can say rules-based approaches were laughably bad. Only now are nondeterministic approaches to ML surpassing human-level performance on tasks, something which would not have been feasible, perhaps not even possible in a finite amount of human development time, via human-created rules.

    • The thing is that AI is completely unpredictable without human-curated results. Stable Diffusion made me relent and admit that AI is here now for real, but I no longer think so. It's more like artificial schizophrenia. It does have some results, often plausible-seeming results, but it's not real.

  • Yes, but I think the other lesson might be that those black box machine translations have ended up being more valuable? It sucks when things don't always work, but that is also kind of life and if the AI version worked more often that is usually ok (as long as the occasional failures aren't so catastrophic as to ruin everything)

    • > Yes, but I think the other lesson might be that those black box machine translations have ended up being more valuable?

      The key difference is how tolerant the specific use case is of a probably-correct answer.

      The things recent-AI excels at now (generative, translation, etc.) are very tolerant of "usually correct." If a model can do more, and is right most of the time, then it's more valuable.

      There are many other types of use cases, though.


    • Can’t help but read that and think of Tesla’s Autopilot and “Full Self Driving”. For some comparisons they claim to be safer per mile than human drivers … just don’t think too much about the error modes where the occasional stationary object isn’t detected and you plow into it at highway speed.


  • yes, who exactly looked at the 70% accuracy of "live automatic closed captioning" and decided Great! ship it boys!

    • My guess: They are hoping user feedback will help them to fix the bugs later -- iterate to 99%. Plus, they are probably under unrealistic deadlines to deliver _something_.

  • But rule-based machine translation, from what I've seen, is just so bad. ChatGPT (and other LLMs) are miles ahead. After seeing what ChatGPT does, I can't even call rule-based machine translation "translation".

    *Disclaimer: I'm not an AI researcher, but I have done quite a bit of human translation work.

  • Perhaps using ML to craft the deterministic rules and then having a human go over them is the sweet spot.

    • Rules could never work for translation unless the incoming text was formatted in a specific way. E.g., you just couldn't translate a conversation transcript in a pro-drop language like Japanese into English sentence-by-sentence, because the original text just wouldn't have sentences in it. So you need some "intelligence" to know who is saying what.

I think - I hope, rather - that technically minded people who are advocating for the use of ML understand the shortcomings and hallucinations... but we need to be frank about the fact that the business layer above us (with a few rare exceptions) absolutely does not understand the limitations of AI and views it as a magic box where they type in "Write me a story about a bunny" and get twelve paragraphs of text out. As someone working in a healthcare-adjacent field, I've seen the glint in executives' eyes when talking about AI, and it can provide real benefits in data summarization and annotation assistance... but there are limits to what you should trust it with, and if it's something big-i Important then you'll always want to have a human vetting step.

  • > I hope, rather - that technically minded people who are advocating for the use of ML understand the shortcomings and hallucinations.

    The people I see who are most excited about ML are business types who just see it as a black box that makes stock valuation go vroom.

    The people that deeply love building things, really enjoy the process of making itself, are profoundly sceptical.

    I look at generative AI as sort of like an army of free interns. If your idea of a fun way to make a thing is to dictate orders to a horde of well-meaning but untrained, highly caffeinated interns, then using generative AI to make your thing is probably thrilling. You get to feel like an executive producer who can make a lot of stuff happen by simply prompting someone/something to do your bidding.

    But if you actually care about the grit and texture of actual creation, then that workflow isn't exactly appealing.

    • They wouldn’t think this way if stock investors weren’t so often such naive lemmings ready to jump off yet another cliff with each other.

    • We get it, you're skeptical of the current hype bubble. But that's one helluva no true Scotsman you've got going on there. Because a true builder, one that deeply loves building things, wouldn't want to use text to create an image. Anyone who does is a business type or an executive producer. A true builder wouldn't think about what they want to do in such a nasty thing as words. Creation comes from the soul, which we all know machines, and business people, don't have.

      Using English, instead of C, to get a computer to do something doesn't turn you into a bureaucrat any more than using Python or JavaScript instead does.

      Only a person that truly loves building things, far deeper than you'll ever know, someone that's never programmed in a compiled language, would get that.


  • I’m not optimistic on that point: the executive class is very openly salivating at the prospect of mass layoffs, and that means a lot of technical staff aren’t quick to inject some reality – if Gartner is saying it’s rainbows and unicorns, saying they’re exaggerating can be taken as volunteering to be laid off first even if you’re right.

    • Yeah but what comes after the mass layoffs? Getting hired to clean up the mess that AI eventually creates? Depending on the business it could end up becoming more expensive than if they had never adopted GenAI at all. Think about how many companies hopped on the Big Data Bandwagon when they had nothing even coming close to what "Big Data" actually meant. That wasn't as catastrophic as what AI would do but it still was throwing money in the wrong direction.


  • > technically minded people who are advocating for the use of ML understand the shortcomings and hallucinations

    Really, my impression is the opposite. They are driven by doing cool tech things and building fresh product, while getting rid of "antiquated, old" product. Very little thought is given to the long-term impact of their work. Criticism of the use cases is often hand-waved away because you are messing with their bread and butter.

  • > but we need to be frank about the fact that the business layer above us (with a few rare exceptions) absolutely does not understand the limitations of AI and views it as a magic box where they type in

    I think we also need to be aware that this business layer above us often sees __computers__ as a magic box where they type in. There's definitely a large spectrum of how magical this seems to that layer, but the issue remains that there are subtleties that are often important but difficult to explain without detailed technical knowledge. I think there's a lot of good ML can do (being an ML researcher myself), but I often find it ham-fisted into projects simply to say that the project has ML. I think the clearest flag to any engineer that this layer above them has limited domain knowledge is how much importance they place on KPIs/metrics. Are they targets or are they guides? Because I can assure you, all metrics are flawed -- but some metrics are less flawed than others (and benchmark hacking is unfortunately the norm in ML research[0]).

    [0] There's just too much happening so fast and too many papers to reasonably review in a timely manner. It's a competitive environment, where gatekeepers are competitors, and where everyone is absolutely crunched for time and pressured to feel like they need to move even faster. You bet reviews get lazy. The problems aren't "posting preprints on twitter" or "LLMs giving summaries", it's that the traditional peer review system (especially in conference settings) scales poorly and is significantly affected by hype. Unfortunately I think this ends up railroading us in research directions and makes it significantly more challenging for graduate students to publish without being connected to big labs (aka, requiring big compute) (tuning is another common way to escape compute constraints, but that falls under "railroading"). There are still some pretty big and fundamental questions that need to be chipped away at but are difficult to publish given the environment. /rant

This is why hallucinations will never be fixed in language models. That's just how they work.