Comment by j_maffe

19 days ago

The timed-reveal aspect is also interesting.


data_maan 18 days ago

How is that interesting from a scientific point of view? This seems more like a social experiment dressed up as science.

Science should be about reproducibility, and almost nothing here is reproducible.

bwfan123 18 days ago

> Science should be about reproducibility, and almost nothing here is reproducible.
I can see your frustration. You are looking for reproducible "benchmarks". But you have to realize several things.
1) Research-level problems are those that bring the "unknown" into the "known", and as such they are not reproducible. That is why "creativity" has no formula. There are no prescribed processes or rules for "reproducing" creative work. If there were, it would not be considered "research".
2) Things that have already been learnt and trained on are in the realm of the "known", i.e., boilerplate, templated, and reproducible.
The problems in 2) above are where LLMs excel, but they have been hyped as excelling at 1) as well. This experiment is trying to test that hypothesis.
cowsandmilk 18 days ago
DeepMind’s Nobel Prize was primarily for its performance in CASP, which is pretty much exactly this. Labs solve protein structures but don’t publish them until after all the computational teams have submitted their predictions.
So I’m not sure where you’re coming from claiming that this isn’t scientific.
- data_maan 18 days ago
  
  It wasn't like this in any way.
  CASP relies on a robust benchmark (not just 10 random proteins) and has clear participation criteria, objective metrics for how the evaluation plays out, etc.
  So I stand by my claim: This isn't scientific. If CASP is Japan, a highly organized & civilized society, this is a banana republic.
thesmtsolver2 18 days ago
Reproducibility is just one aspect of science; logic and reasoning from principles and data are the major aspect.
There are some experiments which cannot be carried out more than once.
- data_maan 18 days ago
  
  > There are some experiments which cannot be carried out more than once
  Yes, in which case a very detailed methodology is required: which hardware, runtimes, token counts, etc.
  This does none of that.