Comment by SllX

3 months ago

Given the overwhelming amount of slop that has been plaguing search results, it’s about damn time. It’s bad enough that I don’t even downrank all of it, just the worst offenders that show up most often in the search results, and I skip over the rest.

3 comments

VHRanger 3 months ago

Yes, a fun fact about slop text is that it's very low-perplexity text (basically: text that is statistically likely from an LLM's point of view), so most ranking algorithms will tend to be biased toward it.

Since even classical machine learning uses BERT-based embeddings on the backend, this problem is likely more widespread than it seems if a search engine isn't proactively filtering slop out.

JumpCrisscross 3 months ago
> low perplexity text
Is this a term of art? (How is perplexity different from complexity, colloquially, or entropy, particularly?)
- VHRanger 3 months ago
  
  Perplexity is a term of art in LLM training, yes.
  A naive way of scoring how AI-laden a text is would be to run n-1 layers of a model and compare the text against the model's probability distribution over tokens.
  It works somewhat for detecting obvious cases, but it's not a strong enough method by itself.
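
To make the term concrete: perplexity is the exponential of the average negative log-probability a model assigns to each token, i.e. the exponential of the cross-entropy, which is how it relates to entropy. Below is a minimal sketch of that calculation. The per-token probabilities are made-up stand-ins for what a real language model would output, and the names `perplexity`, `predictable`, and `surprising` are only illustrative, not taken from any library.

```python
import math

def perplexity(token_log_probs):
    """Return exp(-mean(log p)) for a list of natural-log token probabilities."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_log_probs) / len(token_log_probs))

# Hypothetical values: in practice these come from a language model
# scoring each token of the text given the tokens before it.
predictable = [math.log(0.5)] * 4   # the model finds every token likely
surprising = [math.log(0.01)] * 4   # the model finds every token unlikely

print(perplexity(predictable))  # roughly 2
print(perplexity(surprising))   # roughly 100
```

Lower numbers mean the model found the text more predictable. A naive detector would flag text whose score falls under some threshold, and as noted above, that catches only the obvious cases.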