Comment by markbao

3 days ago

“Bad” seems extreme. The only way to pass the litmus test you’ve described is for a tool to be 100% perfect, so on that scale a 99.99%-accurate tool still counts as a “bad tool” right up until it hits 100% perfection.

It’s not that binary imo. It can still be extremely useful and save a ton of time if it does 90% of the work and you fix the last 10%. Hardly a bad tool.

It’s only a bad tool if you spend more time fixing the results than you would have spent building it yourself, which sometimes used to be the case with LLMs but is happening less and less as they get more capable.

If you show me a tool that does a thing perfectly 99% of the time, I will eventually stop checking it. Now let me ask you: how do you feel about the people who manage the security for your bank using that tool, and eventually overlooking a security exploit?

I agree that there are domains where 90% good is very, very useful. But 99% isn't always better: the more reliable a tool is, the less its human reviewers check it, and the rare failure slips through unnoticed. In some limited domains, 99% is actually worse.

  • Counterpoint.

    Humans don't get it right 100% of the time.

    • That is a true and useful component of analyzing risk, but the point is that human behaviour isn't a simple risk calculation. We tend to over-guard against things that subjectively seem dangerous, and under-guard against things that subjectively feel safe.

      This isn't about whether AI is statistically safer; it's about the user experience of AI: if we can provide the same guidance without lulling the human backup into complacency, we will have an excellent augmented capability.