Comment by jackblemming

4 months ago

Seems cute, but ultimately not very valuable without benchmarks or some kind of evaluation. For all I know, this could make Claude worse.

jelling 4 months ago

Same. We've all fooled ourselves into believing that an LLM / stochastic process was finally solved based on a good result. But the sample size is always too low to be meaningful.

anuramat 4 months ago

Even if it works as described, I'm assuming it's extremely model-dependent (e.g. book prerequisites), so you'd have to re-run this for every model you use. This is basically poor man's fine-tuning.

Maybe explicit support from providers would make it feasible?