Comment by Hfuffzehn
1 day ago
I agree. But notice that you assume there is a metric against which you can measure improvement, which is fine if you are measuring against your personal taste.
But it might be that the optimization target itself has a ceiling. If you're training toward human approval ratings from a broad population, you converge toward what median preference selects for. The plateau is baked into what you're measuring against.