Comment by rustyhancock
20 days ago
It's a very interesting benchmark. Much more impressive than needle-in-a-haystack benchmarks or benchmarks that can simply be tuned for.
I wonder if it's somewhat incompatible with some domains.
For example, perhaps coding models need to stick rigidly to what they know and resist bad ideas in their context. I don't want my mistakes to be replicated by the model.
Still, I agree with the premise that learning within a session is what I want from a model.
Perhaps once models mature they will diverge into more than just sophisticated vs. simple, or coding vs. non-coding: creative, coding, rule-based, and other specialized models.