Comment by sejje
2 days ago
I bet that, from now on, they'll only train on internet snapshots from before LLMs.
Additional non-internet training material will probably be human-created, or at least curated.
This only makes sense if the percentage of LLM hallucinations is much higher than the percentage of things written online that are flat wrong (it's definitely not).
Nope. Pretraining runs have been moving forward with internet snapshots that include plenty of LLM content.
Sure, but not all of them are stupid enough to keep doing that while watching the model degrade, if it indeed does degrade.