Comment by jcattle

12 hours ago

Just a thought: This data engineering can only really occur in sciences with a significant "moat".

Expensive tools, expensive test setups, live, gene-altered animals, etc.

In fields such as deep learning or other more digital fields (my field uses a lot of freely available satellite data), replication is often cheaper and actual application of research outcomes is a lot more common.

I used to think that, but...

I've reviewed for a few "replication tracks" at ML conferences, and there are a surprising number of reports where people are simply unable to replicate published results. The reasons are all over the map: sometimes the original authors' code just needs to be fixed (new libraries, different environments), but other results simply don't seem to hold up.