Comment by tantalor
14 hours ago
> But they do make categorization mistakes, which is a common source of errors.
> Claude misclassifies a hosting cost (which counts as COGS) as a software subscription.
This is simply asking too much of the agent. Your accountant is not responsible for knowing all the intimate details of your business. You need to tell them!
> What's Vercel?
>> That's a hosting service.
> Ah, so it goes to Cost of Goods Sold?
>> Yeah, I guess.
The mistake here was on the operator, for allowing the agent to just make up categories as it liked.
From the prompt:
> (1) You have properly categorized every transaction, and all journal entries are sitting in the correct accounts. It is better to take longer than to mis-categorize a transaction.
This is insane! How is it supposed to know?
Hey, member of the benchmark team here. We actually seeded the ledger with the company's chart of accounts and 8 months of historical transactions. For the Vercel example specifically, there were prior instances showing how to categorize hosting costs that the models could reference. The expectation wasn't for them to guess blindly, but to use the provided transaction history as guidance for similar categorizations (which they often, but not always, did).
Ahh, that's a good solution! I missed that, and you definitely instruct them to do that:
> You must follow the established patterns for categorization, revrec, etc for past months... If you must use a new account or treatment, explicitly note why existing patterns don't apply
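To make "follow the established patterns" concrete, here is a minimal sketch of what a history lookup could look like. This is not the benchmark's code: the `HISTORY` rows, the account names, and the `build_index` / `categorize` helpers are all made up for illustration. The idea is just to reuse the account a vendor was booked to before, and to refuse to guess when there is no precedent.

```python
from collections import Counter, defaultdict

# Hypothetical prior-month ledger rows: (vendor, account). Invented for illustration.
HISTORY = [
    ("Vercel", "COGS: Hosting"),
    ("Vercel", "COGS: Hosting"),
    ("Figma", "Software Subscriptions"),
]


def build_index(history):
    """Count how often each vendor was booked to each account."""
    index = defaultdict(Counter)
    for vendor, account in history:
        index[vendor.lower()][account] += 1
    return index


def categorize(vendor, index):
    """Return (account, note); account is None when there is no precedent."""
    seen = index.get(vendor.lower())
    if not seen:
        # No established pattern: flag it instead of inventing a category.
        return None, "no precedent; ask the operator or justify a new account"
    account, count = seen.most_common(1)[0]
    return account, f"matches {count} prior transaction(s)"


index = build_index(HISTORY)
print(categorize("Vercel", index))
print(categorize("Notion", index))
```

The `None` branch is the part that matters for this thread: a vendor with no history gets escalated rather than silently filed under whatever sounds plausible.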
> Your accountant is not responsible for knowing all the intimate details of your business. You need to tell them!
Your accountant as a 3rd party might have this issue. Your accountant that you hire as an employee to help you run your business is the one who should be doing this.
An LLM agent is strongly third party.
So it's not going to replace engineers or CS? Because that's how they are being sold right now.
If it is a third party, then you're vibe coding or getting CS from a random on a Reddit thread (effectively).