Comment by vidarh

3 years ago

If they're running into any limits in that respect, my bet would be that the limit is only on what is easily accessible to them without negotiating access, and that they can easily go another order of magnitude or two just with more incremental effort to strike deals, e.g. with newspaper archives, national libraries and the like. (I haven't looked at other languages, but GPT-3's Norwegian corpus could easily be scaled by at least two orders of magnitude with access to the Norwegian national library collection alone. I'm using GPT-3 because I don't know of any numbers for GPT-4.)