
Comment by andai

1 day ago

Could you elaborate on the multi-level LLM workflow? Did you set up a benchmark, and you're having an LLM mutate prompts?

But yes, the runbook in the project gives the LLM instructions on how to use the scripts and what to modify. I let Claude Code read that and tell it to work on a province. It runs a small segment and analyzes the results, repeating until it hits 1-2% error with no systemic errors. If it can't get all the errors out, I have it switch to using gemini-flash-lite-latest instead of 2.5, which costs slightly more but performs much better. Basically, Claude Code runs a self-governing loop, with my oversight, mutating the prompts and data inputs to extract all the names.
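
For anyone who wants the shape of that loop in code, here's a rough sketch. This is not what's in the repo. The function name, the two callables, and the round limit are all made up for illustration, and the "no systemic errors" check is left out because that part is a judgment call made when reviewing the results.

```python
from typing import Callable, Sequence


def refine_until_clean(
    run_segment: Callable[[str, str], float],    # hypothetical: (model, prompt) -> error rate
    revise_prompt: Callable[[str, float], str],  # hypothetical: (prompt, error) -> new prompt
    prompt: str,
    models: Sequence[str],                       # ordered from first choice to fallback
    target_error: float = 0.02,                  # the "1-2% error" goal, taken as 2%
    max_rounds: int = 5,                         # assumed cap on retries per model
) -> tuple[str, str, float]:
    """Return (model, prompt, error) for the first run at or below the target,
    or the best run seen if no model gets there."""
    best = None
    current = prompt
    for model in models:
        for _ in range(max_rounds):
            error = run_segment(model, current)
            if best is None or error < best[2]:
                best = (model, current, error)
            if error <= target_error:
                return model, current, error
            # Still too many errors: mutate the prompt and try again.
            current = revise_prompt(current, error)
        # This model ran out of rounds, so fall through to the next one.
    if best is None:
        raise ValueError("models must not be empty")
    return best
```

The revised prompt is carried over when switching models, so the fallback model starts from whatever the last round produced rather than from scratch.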

EDIT: My instructions to the supervising LLM are here: https://github.com/metiscus/roman-names/blob/feature/webapp-...