Comment by pegasus
2 years ago
That part made the least sense to me. Since a more advanced version of an LLM would be better at extracting the truth of things from the same data, what could it possibly gain from ingesting the output of a less precise version of itself? It couldn't ever add anything useful, almost by definition.
What if the new version could learn by verifying various outputs of the old version for internal consistency (or lack thereof)?
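Concretely, that could look something like self-consistency filtering: sample the old model several times per prompt, keep only answers it agrees with itself on, and train the new model on those. Here's a minimal sketch under that assumption; `sample`, `k`, and the agreement threshold are all placeholders, not anyone's actual pipeline, and real setups would compare answers semantically rather than by exact string match:

```python
from collections import Counter
from typing import Callable

def consistency_filter(
    prompts: list[str],
    sample: Callable[[str], str],  # hypothetical: one stochastic answer from the old model
    k: int = 8,
    threshold: float = 0.75,
) -> list[tuple[str, str]]:
    """Keep (prompt, answer) pairs where the old model agrees with itself
    often enough; disagreement is treated as a sign it doesn't really
    'know' the answer, so those prompts produce no training signal."""
    dataset = []
    for prompt in prompts:
        answers = [sample(prompt) for _ in range(k)]
        best, count = Counter(answers).most_common(1)[0]
        if count / k >= threshold:
            dataset.append((prompt, best))  # consistent -> usable as a label
        # inconsistent prompts are simply dropped here; they could also
        # serve as negative examples or flags for further review
    return dataset
```

The signal here isn't the old model's raw output, it's the *pattern of agreement* across outputs, which the old model never sees about itself, so the new version does get information the old one didn't have in any single generation.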