Comment by shanktt

12 hours ago

Hey, I'm a member of the benchmark team. We actually seeded the ledger with the company's chart of accounts and 8 months of historical transactions. For the Vercel example specifically, there were prior transactions showing how hosting costs had been categorized, which the models could reference. The expectation wasn't for them to guess blindly, but to use the provided transaction history as guidance for similar categorizations (which they often, but not always, did).

Ahh, that's a good solution! I missed that, and you definitely instruct them to do that:

> You must follow the established patterns for categorization, revrec, etc for past months... If you must use a new account or treatment, explicitly note why existing patterns don't apply