Comment by typpo

7 hours ago

Lately my company has been doing a lot of complex accounting and reporting in spreadsheets. Overall, I was surprised by how well both GPT and Claude handled some of these extremely tedious tasks. It's not uncommon for an hours-long task to be compressed to minutes.

My anecdotal experience is that GPT 5.2 Pro is decently ahead of Claude Opus 4.5 in this category when it gets to the tricky stuff, both in presentation and accuracy. The long reasoning seems to help a lot. But apparently the benchmarks do not agree.

Edit - noticed OpenAI specifically focuses on finance use cases in their gpt-5.3-codex blog as well https://openai.com/index/introducing-gpt-5-3-codex/

I feel like I'd be really skeptical of results from a non-deterministic model for something as precise as accounting....

  • The deterministic part (calculations) is done by Excel.

    The non-deterministic part is turning human instructions ("calculate the NPV over 10 years for X given Y") into Excel.

    This is already a non-deterministic process (humans are non-deterministic!). The question is whether an AI model can be more reliable than humans, and I can't see any reason why it wouldn't be.

    The correct path is pretty clear, so the logits for tokens on that path are going to be a long way from the logits for anything off-path, and sampling will rarely leave it.

    For something like this the real problem is training the model to use Excel (failures show up as the model being confused about which sheet it is on, trying to use the wrong window, or things like that), not the non-determinism.
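To make the split above concrete, here is a minimal sketch of the "calculate the NPV over 10 years" example. The function names, cell references, and numbers are all hypothetical and chosen only for illustration. The arithmetic follows Excel's NPV convention, where the first cash flow is discounted by one full period. The only step a model would have to get right is producing the formula string; the calculation itself gives the same answer every time.

```python
def npv(rate, cashflows):
    """Discount end-of-period cash flows.

    Mirrors Excel's NPV convention: the first cash flow is one period out,
    so it is divided by (1 + rate) ** 1, the second by (1 + rate) ** 2, etc.
    """
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cashflows, start=1))


def excel_npv_formula(rate_cell, outlay_cell, first_cell, last_cell):
    """Build the spreadsheet formula for the same calculation.

    The cell layout is a hypothetical one: a rate cell, an up-front outlay
    cell (a negative number), and a contiguous range of yearly cash flows.
    """
    return f"={outlay_cell}+NPV({rate_cell},{first_cell}:{last_cell})"


# Hypothetical inputs: 8% discount rate, 1000 paid up front,
# then 150 received at the end of each of 10 years.
rate = 0.08
initial_outlay = -1000.0
yearly_cashflows = [150.0] * 10

# Deterministic part: same inputs, same result, every run.
project_npv = initial_outlay + npv(rate, yearly_cashflows)

# The part a model (or a human) has to produce from the instruction.
formula = excel_npv_formula("B1", "B2", "B3", "B12")

print(formula)
print(round(project_npv, 2))
```

Checking the model's work then comes down to reading one formula and confirming the ranges point at the right cells, which is the same review you would give a human-built sheet.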

Don't use Excel for accounting....