Comment by Leynos

14 hours ago

Use evals

Coming soon: unit, behavioural and regression tests for your prompts and skills :P

How do you use evals when you’re using Claude Code, given that Claude Code also changes their prompts all the time?

You’ll have:

* Claude model version

* Claude Code prompts and tools

* Your own prompts and skills and whatnot

* Your repository’s source code (= the input)

All of those change constantly; it’s not like it’s some kind of fixed SWE benchmark.
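
To make the problem concrete, here is a rough sketch of tagging each eval run with the four things listed above, so you can at least see which of them moved between two runs. This is not something Claude Code provides; `EvalContext`, its field names, `changed_fields` and the placeholder values are all made up for illustration.

```python
import hashlib
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class EvalContext:
    """The four moving parts of an eval run (hypothetical names)."""
    model_version: str    # Claude model version
    harness_version: str  # Claude Code prompts and tools, e.g. the CLI version
    prompts_digest: str   # hash of your own prompts and skills
    repo_commit: str      # commit of the repository used as input

    def fingerprint(self) -> str:
        # Stable short hash over all four fields; equal fingerprints mean
        # the two runs had the same recorded inputs.
        payload = json.dumps(asdict(self), sort_keys=True).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()[:12]


def changed_fields(a: EvalContext, b: EvalContext) -> list[str]:
    """Return the names of the fields that differ between two runs."""
    da, db = asdict(a), asdict(b)
    return [name for name in da if da[name] != db[name]]


# Placeholder values, not real version strings.
last_week = EvalContext("model-A", "harness-1", "prompts-abc", "commit-111")
today = EvalContext("model-A", "harness-2", "prompts-abc", "commit-111")

print(changed_fields(last_week, today))
```

If more than one field shows up in that list, a change in eval score can’t be pinned on any single one of them, which is exactly the issue.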