Comment by Prof_Sigmund

1 day ago

The authors talk about "a model's ability to align with human decisions" as if it were a settled question. The glaring omission in the paper is RLHF (Reinforcement Learning from Human Feedback). All these companies are "teaching machines to predict the preferences of people who click 'Accept All Cookies' without reading," by using low-paid human evaluators, the so-called "AI teachers."
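
For anyone who hasn't looked under the hood, "predicting the preferences" is meant quite literally: the reward model is fitted to the clicks of those evaluators. A minimal sketch, assuming a PyTorch setup and the standard pairwise (Bradley-Terry) loss; the names and numbers here are mine, not the paper's:

```python
import torch
import torch.nn.functional as F

# Toy sketch of the pairwise loss typically used to train an RLHF reward
# model (Bradley-Terry style). The tensors, numbers, and function name are
# illustrative; nothing here comes from the paper under discussion.
def preference_loss(reward_chosen: torch.Tensor, reward_rejected: torch.Tensor) -> torch.Tensor:
    # The reward model is pushed to score the evaluator-preferred response
    # higher than the rejected one, i.e. to predict whatever the labeler
    # happened to click.
    return -F.logsigmoid(reward_chosen - reward_rejected).mean()

# Scalar rewards for a batch of (chosen, rejected) response pairs.
chosen = torch.tensor([1.2, 0.3, 0.8])
rejected = torch.tensor([0.4, 0.9, -0.1])
print(preference_loss(chosen, rejected))  # lower loss = closer mimicry of the labelers
```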

If we go back to Google before its transformation into an AI powerhouse (a transformation in which it gutted its own SERPs, shoving the traditional blue links below AI-generated overlords that synthesize answers from the web’s underbelly and leave publishers starving for clicks in a zero-click apocalypse), what was happening back then?

The same kind of human "evaluators" were ranking pages, pushing garbage forward. The same thing is happening with AI: just as the human "evaluators" trained search engines to elevate clickbait, those very same humans now train large language models to mimic their own judgment. A feedback loop of mediocrity, supervised by... well, not the best among us.

And the machines still work the way Stephen Wolfram described: for any given sequence (e.g., “The cat sat on the...”), the model doesn’t just pick one word. It calculates a probability score for every single word in its vast vocabulary (e.g., “mat” = 40% chance, “floor” = 15%, “car” = 0.01%), and voilà! You have a “creative” text: one of a gazillion mindlessly produced, soulless “vile bile” sludge emissions that pollute our collective brains and render us a bunch of idiots, ready to swallow any corporate poison sent our way.
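
To be concrete about what that "probability method" is, here is a toy version in Python (the four-word vocabulary and the scores are invented for illustration; this is not any vendor's actual model):

```python
import torch
import torch.nn.functional as F

# Toy version of the next-word step Wolfram describes: the model scores
# every word in its vocabulary and turns the scores into probabilities.
vocab = ["mat", "floor", "car", "moon"]
logits = torch.tensor([3.0, 2.0, -4.0, -2.0])   # raw scores for "The cat sat on the..."

probs = F.softmax(logits, dim=-1)
for word, p in zip(vocab, probs):
    print(f"{word}: {p.item():.1%}")

# Greedy decoding always takes the top word; sampling from the distribution
# is what passes for "creativity".
print("greedy:", vocab[int(torch.argmax(probs))])
print("sampled:", vocab[int(torch.multinomial(probs, num_samples=1))])
```

Scale that from four words to a vocabulary of tens of thousands of tokens and you have the whole "magic".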

In my opinion, it is even worse: the corporations are pushing toward “safety” (likely to shield themselves from lawsuits), and the AI systems are trained to sell, soothe, and please, not to think or to enhance our collective experience.