Comment by prng2021

1 day ago

How is anyone predicting timelines for AGI when these systems can’t do basic addition of 2 arbitrary numbers with 100% accuracy?

Can you do basic addition of 2 arbitrary numbers with 100% accuracy (no tools)? No, you can't. You will make mistakes for a sufficiently large N even with pen and paper, and for a very small N without them. Are you no longer generally intelligent?

LLMs should use tool calling (which is 100% reliable for arithmetic) instead of doing math internally. More generally, it would be nice to be able to teach a process and have the AI execute it deterministically. In some sense, reliability between 99% and 100% is the worst range: you still can't trust the output, yet verifying every answer feels like wasted effort. Maybe code gen and execution will get us there.
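The tool-calling idea above can be sketched in a few lines. This is a minimal illustration, not any specific provider's API: the model emits a structured tool call, and a deterministic calculator executes it. Python ints are arbitrary-precision, so the tool itself is exact for any N.

```python
# Hypothetical sketch: route arithmetic to a deterministic tool
# instead of letting the model compute it internally.

def add_tool(a: str, b: str) -> str:
    """Exact addition for integers of any size (Python ints are arbitrary-precision)."""
    return str(int(a) + int(b))

# A model response requesting a tool call (shape is illustrative only).
tool_call = {"name": "add", "arguments": {"a": "9" * 40, "b": "1"}}

if tool_call["name"] == "add":
    result = add_tool(**tool_call["arguments"])
    print(result)  # 40 nines plus 1, computed exactly
```

The point is that the model only has to produce the *call* correctly; the arithmetic itself never touches the model's weights.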

  • This is the exact problem CognOS was built to solve.

      99% reliable means you still can't remove the human from the loop, because you never know which 1% you're in. The only way to actually trust an output is to attach a verifiable confidence signal to each response, not just hope the aggregate accuracy holds.

      We built a local gateway that wraps every LLM output in a trust envelope: a decision trace, a risk score, and an explicit PASS/REFINE/ESCALATE/BLOCK classification. The point isn't to make LLMs more accurate; it's to make their uncertainty legible so the human knows when to step in.
    
      Open source if you want to look at the architecture: github.com/base76-research-lab/operational-cognos
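To make the envelope idea concrete, here is a hedged sketch of what such a wrapper might look like. The field names, thresholds, and `classify` function are invented for illustration; they are not CognOS's actual API (see the linked repo for that).

```python
# Hypothetical sketch of a trust envelope around an LLM output.
# Names and thresholds are invented, not taken from CognOS.
from dataclasses import dataclass

@dataclass
class TrustEnvelope:
    output: str
    risk_score: float          # 0.0 = lowest risk, 1.0 = highest
    decision_trace: list       # human-readable reasons for the classification
    classification: str        # PASS / REFINE / ESCALATE / BLOCK

def classify(output: str, risk_score: float) -> TrustEnvelope:
    trace = ["risk_score=%.2f" % risk_score]
    if risk_score < 0.1:
        label = "PASS"       # safe to use without review
    elif risk_score < 0.5:
        label = "REFINE"     # send back to the model for another pass
    elif risk_score < 0.9:
        label = "ESCALATE"   # require a human in the loop
    else:
        label = "BLOCK"      # do not release the output
    trace.append("classification=" + label)
    return TrustEnvelope(output, risk_score, trace, label)
```

The design choice here is that the envelope travels *with* the output, so a downstream consumer can branch on `classification` instead of trusting aggregate accuracy.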

  • "reliability between 99% and 100% is the worst because you still can't trust the output"