Comment by protocolture

6 months ago

I like how the chosen terminology is perfectly picked to paint the concern as irrelevant.

"Since the end of atmospheric nuclear testing, background radiation has decreased to very near natural levels, making special low-background steel no longer necessary for most radiation-sensitive uses, as brand-new steel now has a low enough radioactive signature that it can generally be used."

I don't see that:

1. There will be a need for "uncontaminated" data. LLM data is probably slightly better than the natural background Reddit comment. Falsehoods and all.

2. "Uncontaminated" data will be difficult to find. What with archive.org, gutenberg etc.

3. That LLM output is going to infest everything anyway.

>2. "Uncontaminated" data will be difficult to find. What with archive.org, gutenberg etc.

But recent uncontaminated data is hard to find. https://github.com/rspeer/wordfreq/blob/master/SUNSET.md

  • >Now the Web at large is full of slop generated by large language models, written by no one to communicate nothing. Including this slop in the data skews the word frequencies.

    I really do just bail out whenever anyone uses the word slop.

    >As one example, Philip Shapira reports that ChatGPT (OpenAI's popular brand of generative language model circa 2024) is obsessed with the word "delve" in a way that people never have been, and caused its overall frequency to increase by an order of magnitude.

    Should run the same analysis against the word slop.

    • >I really do just bail out whenever anyone uses the word slop.

      I do when people try to pass slop around as organic content.

      > Should run the same analysis against the word slop.

      That sounds like a tautology: the more slop, the more people tend to call slop slop.

Umm… we stopped atmospheric nuclear testing, which is what allowed the background radiation to drop.

  • And cars replaced horses in London, rendering forecasts of London being buried under a mountain of horse manure irrelevant too.

    Change really is the only constant. The short-term predictive game is rigged against hard predictions.