Comment by qnleigh

3 hours ago

I did mention RL as a valid counterargument in my comment.

I agree that in verifiable domains RL systems should be able to blow past human performance, and this might already be happening. A separate question is how much RL improves performance in non-verifiable domains. I'm not taking a stance either way; I just think it's an interesting question.