
Comment by andy99

20 hours ago

> mRNABench

Just curious: in other areas of ML, I think it's widely acknowledged that benchmarks have pretty limited real-world value. They just end up getting saturated, and (my view) are all pretty correlated regardless of their ostensible specialty, and don't really tell you that much.

Do you think mRNABench is different, or where do you see the limitations? Do you imagine this or any benchmark will be useful for anything beyond comparing how different models do on the benchmark?

I watched an interview with one of the co-founders of Anthropic where his point was that although benchmarks saturate, they're still an important signal for model development.

We think the situation is similar here - one of the challenges is aligning the benchmark with the function of the models. Genomic benchmarks for gLMs and RNA foundation models have been very resistant to saturation.

I think in NLP the problem is that benchmarks are victims of their own success: models can be overfit to particular benchmarks really fast.

In genomics we're a bit behind. A good paper on this is DART-Eval, which defines levels of task complexity: https://arxiv.org/abs/2412.05430

In RNA, the models work much better than they do for DNA prediction, but it's key to have benchmarks to measure progress.

Here is the link for benchmarks and their utility: https://youtu.be/JdT78t1Offo?t=1444

"We have internal benchmarks. Yeah. But we don't we don't publish them."

"we have internal benchmarks that the team focuses on and improving and then we also have a bunch of tasks like I think that accelerating our own engineers is like a top top priority for us"

The equivalent for us would be ultimately looking to improve experimental results. Benchmarks are a good intermediate point, but not the ultimate goal.