Comment by wmf
21 hours ago
LLMs should use tool calling (which is 100% reliable) instead of doing math internally. But in general it would be nice to be able to teach a process and have the AI execute it deterministically. In some sense, reliability between 99% and 100% is the worst, because you still can't trust the output, yet verifying it feels like wasted effort. Maybe code gen and execution will get us there.
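A minimal sketch of what "tool calling instead of doing math internally" looks like: the model emits a structured call (tool name plus arguments), and ordinary code executes it deterministically. The `calculate`/`dispatch` names and the tool-call dict shape here are illustrative assumptions, not any particular vendor's API.

```python
import ast
import operator

# Illustrative deterministic "calculator" tool an LLM could call
# instead of doing arithmetic in its weights.
OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Pow: operator.pow,
    ast.USub: operator.neg,
}

def calculate(expression: str):
    """Safely evaluate an arithmetic expression by walking its AST."""
    def eval_node(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            return OPS[type(node.op)](eval_node(node.left), eval_node(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in OPS:
            return OPS[type(node.op)](eval_node(node.operand))
        raise ValueError(f"unsupported expression: {expression!r}")
    return eval_node(ast.parse(expression, mode="eval").body)

def dispatch(tool_call: dict):
    """Route a model-emitted tool call (name + arguments) to real code."""
    if tool_call["name"] == "calculate":
        return calculate(tool_call["arguments"]["expression"])
    raise KeyError(tool_call["name"])

result = dispatch({"name": "calculate",
                   "arguments": {"expression": "1234 * 5678"}})
print(result)  # 7006652
```

The model can still misphrase the call, but the arithmetic itself is no longer probabilistic, which is the reliability split the comment is pointing at.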
> reliability between 99% and 100% is the worst because you still can't trust the output

This is the exact problem CognOS was built to solve.