
Comment by williamcotton

2 years ago

> I think there’s a simpler explanation. Imagine what it would look like if ChatGPT were a lossless algorithm. If that were the case, it would always answer questions by providing a verbatim quote from a relevant Web page. We would probably regard the software as only a slight improvement over a conventional search engine, and be less impressed by it.

Tautologically, yes: ChatGPT works because it is, as the author defines it, a lossy algorithm. If it were a lossless algorithm, it wouldn't work the way it does now.

> The fact that ChatGPT rephrases material from the Web instead of quoting it word for word makes it seem like a student expressing ideas in her own words, rather than simply regurgitating what she’s read; it creates the illusion that ChatGPT understands the material. In human students, rote memorization isn’t an indicator of genuine learning, so ChatGPT’s inability to produce exact quotes from Web pages is precisely what makes us think that it has learned something. When we’re dealing with sequences of words, lossy compression looks smarter than lossless compression.

This is where the analogy to lossy and lossless compression algorithms breaks down. Yes, something loosely akin to the principal component analysis and dimensionality reduction used in lossy compression is being applied, and we can see that most directly in a technical sense in GPT's `embedding vector(1536)`, but there is a big difference: ChatGPT is also a translator, not just a synthesizer.
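Roughly, the parallel in code (a toy sketch: the vectors are random stand-ins for real 1536-dimensional embeddings, and PCA stands in for whatever reduction the model actually learns):

```python
# Project high-dimensional "embedding" vectors down to a few principal
# components, then reconstruct -- the reconstruction error is the "loss".
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(100, 1536))  # stand-in for real embeddings

pca = PCA(n_components=32)
compressed = pca.fit_transform(embeddings)         # 1536 dims -> 32 dims
reconstructed = pca.inverse_transform(compressed)  # 32 dims -> 1536, lossy

error = np.linalg.norm(embeddings - reconstructed) / np.linalg.norm(embeddings)
print(f"relative reconstruction error: {error:.2%}")  # nonzero: detail was lost
```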

This has nothing to do with "looking smarter". It has to do with being reliably proficient at both translating and synthesizing.

When given an analytic prompt like "turn this provided box score into an entertaining outline", ChatGPT proves itself to be a reliable translator, because it can reference all of the facts in the prompt itself.

When given a synthetic prompt like "give me some quotes from the broadcast", ChatGPT proves itself to be a reliable synthesizer, because it produces fictional quotes that sound right even though the facts are not present in the prompt itself.
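In code, the distinction looks something like this (a sketch: `chat` is a hypothetical stand-in for any chat-completion call, and the box score is made up):

```python
# Hypothetical helper standing in for any chat-completion API call.
def chat(prompt: str) -> str:
    ...

# Analytic prompt: every fact the model needs is in the prompt itself,
# so the output can be checked against the input.
box_score = "NYY 4, BOS 2; Judge: 2-4, HR; Cole: 7 IP, 9 K"  # made-up example
analytic = chat(f"Turn this provided box score into an entertaining outline:\n{box_score}")

# Synthetic prompt: the facts are NOT in the prompt, so the model has to
# synthesize them, and plausible-sounding fictional quotes are the result.
synthetic = chat("Give me some quotes from the broadcast of that game.")
```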

The synthetic prompts function in a similar manner to lossy compression algorithms. The analytic prompts do not. This lossy compression algorithm theory, also known as the bullshit generator theory, is an incomplete description of large language models.

https://williamcotton.com/articles/chatgpt-and-the-analytic-...

> This has nothing to do with "looking smarter". It has to do with being reliably proficient at both translating and synthesizing.

I think the author's point is about how people perceive lossy text output differently than they perceive lossy image output. Language is a fairly precise symbolic medium, and our perception of it rests in large part on both our education and what we believe makes humans unique, so we project our own bias about the "smartness" of language onto what ChatGPT generates, overlooking its blurriness.

A very blurry, lossy JPEG, however, draws more criticism, because we think of visual perception as an unimpressive, primordial ability.

  • I don't think "lossy text" is a useful term, because it gets conflated with th*s k*nd *f l*ss* t*xt as well. Lossy compression is designed to be as reversible as possible up to a given quality threshold; that's not how ChatGPT was designed, nor how it works in practice (see the sketch below). There are definitely a lot of mathematical similarities between the two, I won't deny that.

    Would "partial knowledge compression" be a better term? Partial knowledge of both English and French is a requirement to reliably translate from English to French. Partial knowledge of both baseball box scores and entertaining paragraph outlines in English is a requirement to reliably translate from a box score into an entertaining outline, right?
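    Here's what I mean about reversibility, in a toy sketch (zlib for the lossless case, vowel-stripping as a stand-in "lossy text" codec):

    ```python
    import re
    import zlib

    text = b"lossy compression is designed to be as reversible as it can be"

    # Lossless: the round trip recovers the input exactly.
    assert zlib.decompress(zlib.compress(text)) == text

    # "Lossy text": strip vowels, like th*s k*nd *f t*xt. There is no inverse
    # transform; you can only guess at the original. A real lossy codec at
    # least targets a chosen quality threshold.
    print(re.sub(rb"[aeiou]", b"*", text))
    ```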

    • To me, "lossy compression" vs. "partial knowledge compression" is six of one, half a dozen of the other. Whatever you call it, I think the author was writing more about how we perceive the results generated by a language-compression model vs. an image-compression model.
