Comment by epsteingpt

21 hours ago

It's not really equivocation in this instance. This feels like a 'bad faith' comment. We can do better.

LLMs literally wouldn't work without the sum total of knowledge (in the form of books and other copyrighted content) being used as 'training data'.

The 'bleeding edge' LLMs required many things, including:
1. Tech innovation ('attention')
2. Lots of compute
3. Data
4. Pre + post training

#4 doesn't happen without #3.

It's pretty obvious at this point that the major providers have stolen vast amounts of #3; they have paid almost none of the creators.

We can argue about the impact (I'd lean net good) vs. the cost. But arguing there isn't a cost is a bit silly.

3 comments

nonethewiser 21 hours ago

All of this supports the fact that models aren't essentially just web crawling.

margalabargala 20 hours ago

Sure, but Alibaba is still building an LLM. The scraping of responses and the scraping of websites occupy the same location in the stack of each. It's very comparable.

bel8 18 hours ago

The tech is Google's invention, popularized by OpenAI, so Anthropic should still stfu in that case.