
Comment by naasking

1 year ago

I understand that's the context, but I'm not sure that it's unfair nitpicking. It's common to talk about training data and how poor LLMs are compared to humans despite training on an apparently larger dataset than any human could absorb in a lifetime. That argument is simply wrong because it doesn't properly quantify the dataset size; when you do, you conclude the opposite: it's astounding how good LLMs are despite their profound data disadvantage.

> I understand that's the context, but I'm not sure that it's unfair nitpicking.

The OP is about much more than that, and taken as a whole, suggests the author is well aware that human beings absorb a lot more data from multiple domains. It struck me as unfair to criticize one sentence out of context while ignoring the rest of the OP.

> It's common to talk about training data and how poor LLMs are compared to humans despite the apparently larger dataset than any human could absorb in a lifetime.

Thank you. Like I said, I agree. My sense is the author would agree too.

To overcome some of the limits we're starting to see, AI models may need to absorb a giant, endless, torrential stream of non-textual, multi-domain data, as people do.

At the moment, we don't know.