Comment by bwfan123

19 days ago

> Science should be about reproducibility, and almost nothing here is reproducible.

I can see your frustration. You are looking for reproducible "benchmarks". But you have to realize several things.

1) Research-level problems are those that bring the "unknown" into the "known", and as such are not reproducible. That is why "creativity" has no formula. There are no prescribed processes or rules for "reproducing" creative work. If there were, the work would not be considered "research".

2) Things that are learnt and trained are already in the realm of the "known", i.e., boilerplate, templated, and reproducible.

The problems in 2) above are where LLMs excel, but they have been hyped as excelling at 1) as well. This experiment is trying to test that hypothesis.