Comment by MarginalGainz

1 month ago

We are seeing this exact 'hostile information environment' play out aggressively in e-commerce search.

The 'Dead Internet' (specifically AI-generated SEO slop) has effectively broken traditional keyword search (BM25/TF-IDF). Bad actors can now generate thousands of product descriptions that mathematically match a user's query perfectly but are semantically garbage/fake.
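To illustrate why stuffing beats lexical scoring: a toy sketch (not a real BM25 implementation; the product texts and query are made up) showing how a pure term-frequency overlap score rewards a keyword-stuffed page over a coherent one.

```python
# Toy lexical scorer: sums query-term frequencies in the document.
# Real BM25 adds TF saturation and length normalization, but the
# stuffing incentive is the same in spirit.
from collections import Counter

def term_overlap_score(query: str, doc: str) -> int:
    """Score a document by raw query-term frequency (TF-only toy)."""
    tf = Counter(doc.lower().split())
    return sum(tf[t] for t in query.lower().split())

genuine = "Waterproof hiking boots with ankle support and a Vibram sole"
stuffed = ("hiking boots hiking boots waterproof hiking boots best "
           "hiking boots cheap hiking boots buy hiking boots")

query = "waterproof hiking boots"
print(term_overlap_score(query, genuine))   # 3
print(term_overlap_score(query, stuffed))   # 13 -- the slop wins
```

BM25's term-frequency saturation dampens this, but a generator that spreads stuffed terms across thousands of pages still floods the lexical index.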

We had to pivot our entire discovery stack to Semantic Search (Vector Embeddings) sooner than planned. Not just for better recommendations, but as an adversarial filter.

When you match based on intent vectors rather than token overlap, the 'synthetic noise' gets filtered out naturally because the machine understands the context, not just the string match. Semantic search is becoming the only firewall against the dead internet.

I'm not sure I agree. On the one hand, yes, it's trivial to generate pages stuffed with keywords. But on the other hand, Google is already interpreting search intent, and while that's fine for some queries, it is extraordinarily frustrating when you're looking for something specific.

Often I do want exact matches, and Google refuses to show them no matter which special characters you use to try to modify the search behaviour.

Personally, I'd rather search engines continue to return exact matches and simply de-rank content with poor reputation; if I want a more free-form experience, I'll use LLMs instead.