Comment by orbital-decay

4 hours ago

>the key behavior is that LLMs are really good at fuzz testing, because they are probabilistic monkeys on typewriters

That's exactly what they're not. Models post-trained with current methods/datasets have pretty poor output diversity, and they're not that useful for fuzz testing unless you introduce input diversity (randomize the prompt), which is harder than it sounds because the variation has to be semantic, not just surface-level. Pre-trained models have good output diversity, but they perform much worse. Poor diversity is fixable in theory, but I don't see many model devs caring much.
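To make the "randomize the prompt" point concrete, here's a minimal sketch of what semantic input diversity could look like: instead of asking the model the same thing repeatedly and hoping sampling temperature varies the output, you vary the *meaning* of the request itself. The format names, mutation categories, and prompt wording below are all illustrative, not from any real fuzzing tool.

```python
import random

# Illustrative categories; a real harness would derive these from the
# target's actual input grammar.
FORMATS = ["JSON", "CSV", "XML", "YAML"]
MUTATIONS = [
    "deeply nested structures",
    "extremely long string values",
    "unusual Unicode characters",
    "boundary integer values",
    "duplicate keys",
]

def randomized_fuzz_prompt(rng: random.Random) -> str:
    """Build a semantically varied prompt for LLM fuzz-input generation."""
    fmt = rng.choice(FORMATS)
    mutation = rng.choice(MUTATIONS)
    count = rng.randint(1, 5)
    return (
        f"Generate {count} malformed {fmt} documents that exercise "
        f"{mutation}. Output only the documents."
    )

# Each call asks for something semantically different, so even a
# low-diversity post-trained model is pushed toward distinct outputs.
rng = random.Random(0)
prompts = {randomized_fuzz_prompt(rng) for _ in range(50)}
```

The key point: the diversity lives in the prompt distribution, not in the model's sampling, which is why it has to be built by hand against the target's input space.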