
Comment by epwr

2 days ago

Could you elaborate or link something here? I think about this pretty frequently, so would love to read something!

Metric: time to run 100m

Context: track athlete

Does it cease to be a good metric? No. After this you can likely come up with many examples of target metrics which never turn bad.

  • If it were a good metric there wouldn't be a few phone books' worth of regulations on what you can do before and during running 100 meters. From banning rocket shoes, to steroids, to robot legs, the 100 meter run is a perfect example of a terrible metric, both intrinsically as a measure of running speed and extrinsically as a measure of fitness.

  • > Metric: time to run 100m

    > Context: track athlete

    > Does it cease to be a good metric? No.

    What do you mean? People start doping or showing up with creatively designed shoes, and you need to layer on a complicated system to decide if that's cheating. Some of the methods are harder to detect, so some people cheat anyway. Or you ban steroids or stimulants but allow them by prescription to treat an unrelated medical condition, and then people start getting prescriptions under false pretexts in order to get better times. Or worse, someone notices that the competition can't set a good time with a broken leg.

  • So what is your argument, that it doesn't apply everywhere therefore it applies nowhere?

    You're misunderstanding the root cause. Your example works because the metric is well aligned. I'm sure you can also think of many examples where the metric is not well aligned and maximizing it becomes harmful. How do you think we ended up with clickbait titles? Why was everyone so focused on clicks? Let's think about engagement metrics. Is that what we really want to measure? Do we have no preference over users being happy vs users being angry or sad? Or are those things much harder to measure, if not impossible, and thus we focus on our proxies instead? So what happens when someone doesn't realize it is a proxy and becomes hyperfixated on it? What happens if someone does realize it is a proxy but is rewarded via the metric so they don't really care?

    Your example works in the simple case, but a lot of things look trivial when you only approach them from a first order approximation. You left out all the hard stuff. It's kinda like...

    Edit: Looks like some people are bringing up metric limits that I couldn't come up with. Thanks!

    • > So what is your argument, that it doesn't apply everywhere therefore it applies nowhere?

      I never said that. Someone said the law collapses, someone asked for a link, and I gave an example to show that it does break down in at least some cases, and in many cases once you think more about it. I never said all cases.

      If it works sometimes and not others, it's not a law. It's just an observation of something that can happen or not.

  • > Does it cease to be a good metric?

    Yes if you run anything other than the 100m

  • Do you have an example that doesn't involve an objective metric? Of course objective metrics won't turn bad. They're more measurements than metrics, really.

    • > an objective metric

      I'd like to push back on this a little, because I think it's important to understanding why Goodhart's Law shows up so frequently.

      *There are no /objective/ metrics*, only proxies.

      You can't measure a meter directly, you have to use a proxy like a tape measure. Similarly, you can't measure time directly, you have to use a stopwatch. In a normal conversation I wouldn't be nitpicking like this, because those proxies are so well aligned with our intended measures and the lack of precision is generally inconsequential. But once you start measuring anything with precision you cannot ignore the fact that you're limited to proxies.

      When our goals get more abstract, the situation is not too dissimilar. Our measuring tools are just really imprecise. So we have to take great care to understand the meaning of our metrics and their limits, just like we would if we were doing high precision measurements of something more "mundane" like distance.

      I think this is something most people don't have to contend with because frankly, very few people do high precision work. And unfortunately we often use algorithms as black boxes. But the more complex a subject is the more important an expert is. It looks like they are just throwing data into a black box and reading the answer, but that's just a naive interpretation.
