Comment by rustyhancock

20 days ago

It's a very interesting benchmark. Much more impressive than needle-in-a-haystack benches or just tuneable benches.

I wonder if it's somewhat incompatible with some domains.

I.e. perhaps coding models need to rigidly stick to what they know and resist bad ideas in their contexts - I don't want my mistakes to be replicated by the model.

Still, I agree with the premise that learning in session is what I want from a model.

Perhaps once models mature they will diverge further than just differing in sophistication and in whether they code or not, into separate creative, coding, rule-based, etc. models.