Comment by xnx

21 hours ago

> I'm particularly annoyed by using LLMs to evaluate the output of LLMs

This does seem a little crazy on its face, but it is yielding useful and improving tools.

It's not about it being crazy, and it's not about personal opinions about AI. It's about chaos mathematics. Iterating with the same system like that has certain easy-to-understand failure states: the system's systematic errors show up in both the generation and the judgment, so they get reinforced rather than caught. That's why I phrased it specifically in terms of using the same architecture to validate itself. If we had two radically different AI architectures that were capable of evaluating each other, firing them at each other for evaluation purposes would be much, much less susceptible to this sort of problem than firing either of them at itself. That will never be a good idea.

See also a cousin comment of mine observing that human brains are absolutely susceptible to the same effect. We're just so used to it that it is the water we swim through. (And arguably human brains are more diverse than current AI systems functioning at this level. No bet on how long that will remain true, though.)

Such composite systems would still have failure modes of their own and certainly wouldn't be guaranteed to be perfect or anything, but at least they would not tend to iteratively magnify either architecture's individual flaws.
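
To make the dynamic concrete, here's a toy simulation of a generate-then-evaluate loop (made-up numbers and illustrative names like selection_loop, not a claim about any real pipeline). The generator proposes candidates with a slight lean toward a particular quirk; the evaluator keeps whichever candidate it scores highest; the next round starts from that output. If the evaluator shares the generator's taste for the quirk, selection ratchets it up round after round, while an evaluator indifferent to the quirk leaves only the generator's own slow drift.

    import random

    def selection_loop(eval_quirk_weight, rounds=30, n_candidates=16, seed=1):
        """Toy generate-then-evaluate loop with a shared or unshared bias."""
        rng = random.Random(seed)
        quirk = 0.0            # how strongly the flaw is baked into current outputs
        gen_quirk_bias = 0.05  # the generator's own slight pull toward the flaw
        for _ in range(rounds):
            candidates = [
                (rng.gauss(0, 1.0),                            # genuine quality (noise)
                 quirk + gen_quirk_bias + rng.gauss(0, 0.3))   # inherited quirk + drift
                for _ in range(n_candidates)
            ]
            # evaluator's score: real quality plus whatever weight it gives the quirk
            best = max(candidates, key=lambda c: c[0] + eval_quirk_weight * c[1])
            quirk = best[1]    # the selected output seeds the next round
        return quirk

    print("evaluator shares the generator's flaw:", round(selection_loop(1.0), 2))
    print("evaluator indifferent to the flaw:    ", round(selection_loop(0.0), 2))

The only difference between the two runs is the weight the evaluator puts on the quirk; sharing the bias turns ordinary selection pressure into an amplifier for it.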

Perhaps someday we will have such diverse architectures. Today, though, we don't have anything other than human brains that can evaluate LLMs.