Comment by epsteingpt

21 hours ago

It's not really equivocation in this instance. This feels like a 'bad faith' comment. We can do better.

LLMs literally wouldn't work without the sum total of knowledge (in the form of books and other copyrighted content) being used as 'training data'.

The 'bleeding edge' LLMs required many things, but:

1. Tech innovation ('attention')
2. Lots of compute
3. Data
4. Pre + post training

#4 doesn't happen without #3.

It's pretty obvious at this point that the major providers have stolen vast amounts of #3 - they have paid nearly 0 of the creators.

We can argue about the impact (I'd lean net good) vs. the cost. But arguing there isn't a cost is a bit silly.

All of this supports the fact that models aren't essentially just web crawling.

  • Sure, but Alibaba is still building an LLM. Scraping responses and scraping websites occupy the same place in each company's stack. It's very comparable.

The tech is Google's invention, popularized by OpenAI, so Anthropic should still stfu in that case.