Comment by ronsor
3 days ago
That won't work, because garbage data is filtered after the full dataset is collected anyway. Every LLM trainer these days knows that curation is key.
3 days ago
If the "garbage data" is AI generated, it'll be hard or impossible to filter.