Comment by majormajor

2 years ago

I don't think you can describe the math in this context as "generalize well to new data."

ChatGPT certainly can't generate new data. It's not gonna correctly tell you today who won the World Series in 2030. It's not going to write a poem in the style of someone who hasn't been born yet.

But it can interpolate between and through a bunch of existing data that's on the web to produce novel mixes of it. I find the "blurring those things together" analogy pretty compelling there, in the same way that blurring or JPEG-compressing something isn't going to give you a picture of a new event but it might change what you appear to see in the data you already had.

(Obviously it's not exactly the same, that's why it's an analogy and not a definition. As an analogy, it works much better if you ignore much of what you know about the implementation details of both of them. It's not trying to teach someone how to build it, but to teach a lay person how to think about the output.)

It absolutely can generate new data; it does so all the time. If you are claiming otherwise, I think we need a more formal definition of what you mean by "new data."

Are you suggesting because it can't predict the future it can't generate novel data?

  • It's not just the future, though the examples I gave were future oriented.

    But it's all very interpolation/summarization-focused.

    A "song lyrics in the style of Taylor Swift" isn't an actual song by Taylor Swift.

    A summary of the history of Texas isn't actually vetted by any historian to ensure accuracy.

    The answer to a math problem may not be correct.

    To me, those things don't qualify as "new data." They aren't suitable for future training as-is. Sometimes for a simple reason: they aren't facts, using the dictionary "facts and statistics collected together for reference or analysis" definition of data. So very simply "not new data."

    Sometimes in a blurrier way - the song lyrics, for instance, could be touching, or poignant, or "true" in a Keats sense[0] - but if the internet gets full of GPT-dreams and future models are trained on that, you could slide down further and further into an uncanny valley, especially since most of the time you don't get one of those amazing poignant ones. Most of the time I've gotten something bland.

    [0] "What the imagination seizes as beauty must be truth"

    • One way to think about prompting is as conditioning a probability distribution. There is the particular set of actual songs by Taylor Swift, but ChatGPT is particularly talented at sampling from the "set of all songs in the style of Taylor Swift".
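      To make that concrete, here's a toy sketch (the corpus and helper are invented for illustration, nothing like how ChatGPT actually works): conditioning means restricting the joint distribution over (style, line) pairs to one style, then sampling from what's left.

      ```python
      import random

      # Toy "corpus": (style, line) pairs standing in for training data.
      # Styles and lines are made up for this example.
      corpus = [
          ("swift", "we were dancing in the rain"),
          ("swift", "you said forever, I believed"),
          ("swift", "midnight calls and broken strings"),
          ("dylan", "the answer is blowing past"),
          ("dylan", "rolling down the highway stone"),
      ]

      def sample_conditional(style, corpus, rng=random):
          """Sample a line from P(line | style): filter the joint
          distribution down to the conditioning event, then draw."""
          lines = [line for (s, line) in corpus if s == style]
          return rng.choice(lines)

      print(sample_conditional("swift", corpus))
      ```

      A real model doesn't filter a finite table, of course; it learns a smooth distribution it can condition on styles it has never seen paired with a given topic, which is where the "novel mixes" come from.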

      One of the worst problems in the "Expert Systems" age of A.I. was reasoning under uncertainty. For instance, this system

      https://en.wikipedia.org/wiki/Mycin

      had a half-baked approach that worked well enough for a particular range of medical diagnoses. In general it is an awful problem because it involves sampling from a joint probability distribution: if you have 1000 variables, you have to sample a 1000-dimensional space, and to do it the brute-force way you'd have to sample the data in an outrageous number of hypercubes.
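      The blow-up is easy to see with arithmetic: partition each variable's range into a handful of bins and count samples per hypercube, and the number of hypercubes grows exponentially with the number of variables.

      ```python
      # Brute-force density estimation: bins_per_var bins along each of
      # n_vars axes gives bins_per_var ** n_vars hypercubes to populate.
      def hypercube_count(n_vars, bins_per_var=10):
          return bins_per_var ** n_vars

      for d in (2, 10, 1000):
          print(d, "variables ->", hypercube_count(d), "hypercubes")
      ```

      With a modest 10 bins per axis, 1000 variables already means 10^1000 hypercubes, vastly more than any dataset could ever populate.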

      Insofar as machine learning is successful, it is because we have algorithms that take a comparatively sparse sample and make a good guess at what the joint p.d. is. The success of deep learning is particularly miraculous in that respect.
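      One way such algorithms get away with sparse samples is by assuming structure in the distribution. A crude illustration of the savings (counting free parameters for n binary variables, with full independence as the extreme simplifying assumption; real models sit somewhere in between):

      ```python
      # Free parameters needed to specify a joint distribution over
      # n binary variables: the full table needs 2**n - 1, while a
      # fully factorized (independence) model needs only n.
      def full_joint_params(n):
          return 2 ** n - 1

      def independent_params(n):
          return n

      for n in (10, 1000):
          print(n, full_joint_params(n), independent_params(n))
      ```

      Deep nets make far weaker assumptions than full independence, but the same principle applies: structure turns an impossibly large estimation problem into one a finite sample can constrain.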