Comment by Piskvorrr
1 day ago
Except when they "extract" something that wasn't in the source. And now what, assuming you can even detect the tainted data at all?
How do you fix that, when the process is literally "we throw an illegible blob at it and data comes out"? This is not even GIGO; this is "anything in, synthetic garbage out".
> Except when they "extract" something that wasn't in the source. And now what, assuming you can even detect the tainted data at all?
You've got to watch for that, for sure, but no, that's not an issue we worry about anymore, at least not for how we're using it here. The text being extracted from is not a "BLOB". It's plain text at that point, and of a certain, expected kind, which makes it easier. In general, the more isolated and specific the use case, the better the chances of the whole thing working end to end. Open-ended chat is just a disaster. Operating on a narrow set of expectations is much more successful.
> Except when they "extract" something that wasn't in the source. And now what, assuming you can even detect the tainted data at all?
I mean, this is much less common than people make it out to be. Assuming the context is there, it's doable to run a bunch of calls and take the majority vote. It's not trivial, but it's definitely doable.
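A rough sketch of the "run a bunch of calls and take the majority vote" idea, in Python. Everything named here is an assumption for illustration: `extract` stands in for whatever call does the extraction, and `runs` and `min_agreement` are made-up knobs, not anything from a specific library.

```python
from collections import Counter
from typing import Callable, Optional


def majority_vote(
    extract: Callable[[str], str],  # hypothetical: one extraction call
    text: str,
    runs: int = 5,                  # hypothetical: how many calls to make
    min_agreement: float = 0.6,     # hypothetical: share of runs that must agree
) -> Optional[str]:
    """Call `extract` several times and keep the answer most runs agree on.

    Returns None when no answer reaches `min_agreement`, so the caller can
    send the item to manual review instead of trusting a shaky result.
    """
    # Normalize lightly so whitespace/case differences don't split the vote.
    answers = [extract(text).strip().lower() for _ in range(runs)]
    winner, count = Counter(answers).most_common(1)[0]
    return winner if count / runs >= min_agreement else None


if __name__ == "__main__":
    # Stand-in extractor that replays canned outputs, one per call.
    canned = iter(["2024-01-05", "2024-01-05", "2024-05-01",
                   "2024-01-05", "2024-01-05"])
    print(majority_vote(lambda _: next(canned), "Invoice dated 5 Jan 2024"))
```

Note this only catches made-up values that vary between runs. If every run invents the same thing, the vote agrees with itself and you learn nothing, which is why returning `None` on low agreement is the useful part rather than the winner itself.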