
Comment by hn-acct

1 day ago

How do you quantify or decide an acceptable failure rate for llm output?

Same way as for any other production model in ML, or in any field that requires quality control. Conceptually, this is not fundamentally different from adopting any other technology or body of knowledge, which is a near-verbatim definition of engineering.
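One standard quality-control way to make "acceptable failure rate" concrete is to hand-label a random sample of outputs as pass/fail. You then accept the system only if the upper confidence bound on its failure rate sits below a threshold you chose up front. Below is a minimal sketch. It assumes binary labels and uses the Wilson score interval, which is my choice and not something the comment specifies. The helpers `wilson_upper_bound` and `meets_threshold` are hypothetical names.

```python
import math


def wilson_upper_bound(failures: int, n: int, z: float = 1.96) -> float:
    """Upper end of the Wilson score interval for a binomial failure rate.

    z = 1.96 corresponds to a two-sided 95% interval.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0 <= failures <= n:
        raise ValueError("failures must be between 0 and n")
    p_hat = failures / n
    denom = 1 + z**2 / n
    centre = p_hat + z**2 / (2 * n)
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2))
    return (centre + margin) / denom


def meets_threshold(failures: int, n: int, max_rate: float, z: float = 1.96) -> bool:
    """Accept only if we are reasonably confident the true rate is <= max_rate."""
    return wilson_upper_bound(failures, n, z) <= max_rate


if __name__ == "__main__":
    # Hypothetical audit: 2 failures found in 1000 sampled outputs.
    print(wilson_upper_bound(2, 1000))
    print(meets_threshold(2, 1000, max_rate=0.01))
```

The point of using the upper bound rather than the raw rate is that a small sample with zero observed failures does not "prove" a zero failure rate. With 0 failures in 100 samples, the bound is still about 3.7%, so you need a larger audit before claiming you are under, say, 1%.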

It depends on the failure mode and the application. But as a first approximation, you decide it the same way you would for human output. E.g. process engineering for a support chatbot shares many of the same principles as process engineering for a human-staffed call center.
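Because "it depends on the failure mode," a single pass/fail rate can mislead. One way to compare against a human baseline is to weight each failure mode by what it costs you. Below is a minimal sketch under that assumption. The failure-mode names, per-item rates, and costs are all made up for illustration and are not measured data. `expected_cost_per_item` and `acceptable` are hypothetical helper names.

```python
from typing import Mapping


def expected_cost_per_item(
    failure_rates: Mapping[str, float], costs: Mapping[str, float]
) -> float:
    """Sum of (rate * cost) over failure modes; rates are per handled item."""
    missing = set(failure_rates) - set(costs)
    if missing:
        raise KeyError(f"no cost given for failure modes: {sorted(missing)}")
    return sum(rate * costs[mode] for mode, rate in failure_rates.items())


def acceptable(
    candidate: Mapping[str, float],
    baseline: Mapping[str, float],
    costs: Mapping[str, float],
    margin: float = 1.0,
) -> bool:
    """Accept the candidate if its expected cost is within margin x the baseline's."""
    return expected_cost_per_item(candidate, costs) <= margin * expected_cost_per_item(
        baseline, costs
    )


# Hypothetical support-desk numbers, for illustration only.
COSTS = {"wrong_answer": 20.0, "rude_tone": 5.0, "escalation_missed": 50.0}
human = {"wrong_answer": 0.04, "rude_tone": 0.01, "escalation_missed": 0.005}
bot = {"wrong_answer": 0.03, "rude_tone": 0.001, "escalation_missed": 0.01}

if __name__ == "__main__":
    print(expected_cost_per_item(human, COSTS), expected_cost_per_item(bot, COSTS))
    print(acceptable(bot, human, COSTS))
```

With these made-up numbers, the bot's raw failure rate (4.1%) is lower than the human baseline's (5.5%). It still fails the comparison, because it misses more of the expensive escalations. That is the kind of trade-off the failure-mode framing is meant to surface.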