Comment by tpmoney

4 hours ago

> I am talking about reproducing the (perhaps erroneous) logic or thinking or motivations in cases of bugs

But "to what purpose" is where this all loses me. What do you gain from seeing what was said to the AI that generated the bug? To me it feels like these sorts of things will fall into 3 broad categories:

1) Underspecified design requirements

2) General design bugs arising from unconsidered edge cases

3) AI gone off the rails failures

For items in category 1, these are failures you already know how to diagnose with human developers. Your design docs should already be recorded and preserved as part of your development lifecycle, and you should be feeding those same human-readable design documents to the AI. The session output seems irrelevant to me here: you have the input, you have the output, and everything in between is not reproducible with an AI. At best, if you preserve the history, you can possibly get a "why" answer out of it, in the same way you might ask a dev "why did you interpret A to mean B". But you're preserving an awful lot of noise and useless data in the hopes that the AI dropped something in its output that shows you someplace your spec isn't specific or detailed enough, which a simple human review of the spec would catch anyway once the bug is known.

For category 2, again this is no different from the human operator case, and I see no value in confirming from the logs that the AI definitely didn't consider this edge case (or even that it did consider it and rejected it for some erroneous reason). AI models, in the forms folks are using them right now, are not (yet? ever?) capable of learning from a post-mortem discussion about something like that to improve their behavior going forward. And it's not even clear to me that even if they were, you would need the output of the session, as opposed to just telling the robot "hey, at line 354 in foo.bar you assumed that A would never be possible, but no place in the code before that point asserts it, so in the future you should always check for the possibility of A because our system can't guarantee it will never occur."

And as for category 3, since the AI has gone off the rails, the only real thing to learn is whether you need a new model entirely or whether it was a random fluke. But since you have the inputs used and you know they're "correct", I don't see what the session gives you here either. To decide whether you need a new model, just feeding in your input again and seeing if you get a similar "off the rails" result seems sufficient. And if you don't get another "off the rails" result, I sincerely doubt your model is capable of adequately diagnosing its own internal state to sort out why you got that result 3 months ago.