Comment by PeterFBell
1 day ago
It's complicated. Firstly, I don't love that this happens. But the fact that you're not willing to extend tolerance to a commercial tool that costs maybe a few hundred bucks a month, yet are willing to do so for a human who probably costs thousands of bucks a month, reveals a double standard we're all navigating.
It's like the fallout when a Waymo kills a "beloved neighborhood cat". I'm not against cats, and I'm deeply saddened at the loss of any life, but if it's true that, mile for (comparable) mile, Waymos reduce deaths and injuries, that is a good thing - even if they don't reduce them to zero.
And to be clear, I often feel the same way - but I am wondering why and whether it's appropriate!
For my part, I was just pointing out some interesting and noteworthy failure modes.
And it matters. If the models sometimes struggle with basic instruction following, they can quite possibly make insidious mistakes in large, complex tasks that you might not have the wherewithal or time to review.
The thing about good abstractions is that you should be able to trust them in a composable way. The simpler or more low-level the building blocks, the more reliable you should expect them to be. With LLMs you can't really make this assumption.
I mean, we typically architect systems that depend on humans around an assumption of human fallibility. But when it comes to automation, occasionally doing the exact opposite of what was asked, even if rare, is problematic and limits where, and at what scale, it can be safely deployed without ongoing human supervision.
For a coding tool it's not as problematic, since you hopefully vet the output to some degree, but it still means I don't feel comfortable using these models as expansively (like the mythical personal assistant doing my banking and replying to emails, etc.) as I might if their failure modes were more predictable.
I'm perfectly comfortable with Waymo, on the other hand, but that would probably change if I knew the cars were driven by even the newest and fanciest LLMs, as [toddler identified | action: avoid toddler] -> turns towards toddler is a fundamentally different sort of problem.