Comment by antichronology
19 hours ago
I watched an interview with one of the co-founders of Anthropic whose point was that, although benchmarks saturate, they're still an important signal for model development.
We think the situation is similar here - one of the challenges is aligning the benchmark with the function of the models. Genomic benchmarks for gLMs and RNA foundation models have been very resistant to saturation.
I think in NLP the problem is that benchmarks are victims of their own success: models can be overfit to particular benchmarks very quickly.
In genomics we're a bit behind. A good paper on this is DART-Eval, which defines levels of task complexity: https://arxiv.org/abs/2412.05430
In RNA, the models work much better than they do for DNA prediction, but it's key to have benchmarks to measure progress.