Leynos 16 hours ago
Use evals. Coming soon: unit, behavioural and regression tests for your prompts and skills :P
stingraycharles 16 hours ago
How do you use evals when you’re using Claude Code, given that Claude Code also changes their prompts all the time?
You’ll have:
* Claude model version
* Claude Code prompts and tools
* Your own prompts and skills and whatnot
* Your repository’s source code (= the input)
All of those change constantly; it’s not like it’s some kind of fixed SWE benchmark.