Comment by jedwhite
2 days ago
In practice, after using this for real-world test suites and evaluations, the results with Claude Code are remarkably consistent if you do this sensibly. That's because you can still write the deterministic parts as a `./run_tests.sh` bash script (or `run_tests.py`, etc.).
So you're using the appropriate tool for the task at hand, embedded within both traditional scripts and markdown scripts.
Examples:
- A bash script summarizes text files from a path in a loop.
- A markdown script runs `./test/run_tests.py` and summarizes the results.
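A rough sketch of the first one, assuming the Claude Code CLI's `-p` print mode handles the summarization step (the path and prompt wording here are just illustrative):

```bash
#!/usr/bin/env bash
# Sketch: loop over text files and summarize each one.
# Assumes `claude -p` (Claude Code print mode) reads the piped file
# contents and writes the summary to stdout; adjust for your setup.
set -euo pipefail

for f in ./notes/*.txt; do
  echo "== $f =="
  claude -p "Summarize this file in three bullet points." < "$f"
done
```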
Tools like Claude Code, combined with executable scripts and pipes, open up a genuinely new way of doing tasks that are traditionally hard with scripting languages alone. I expect we will see a mix of both approaches, where each gets used based on its strengths, as we're seeing with application development too.
It is a new world and we're all figuring this out.
[Edit for style]
I mean, in that case it's equivalent to something like `do-something | llm "summarize the thing"`.
Personally, I see "prompt scripting" as strictly worse than code.
You can't modify one part of the prompt and be sure there won't be random side effects.
And from what I've seen, these prompts can (and tend to) grow into hundreds of lines as they become more specific and people try to "patch" the edge cases.
It ends up being like code but strictly worse.
One of the advantages of using executable Markdown files with pipe support is that it allows you to create composable building blocks that can be chained together.
So you can build individual prompt-based scripts (`format.md`, `summarize.md`, etc.) that are each small, simple, and focused on a single task. Then you can chain those prompt scripts together with regular command line tools and bash scripts.
I find that approach quite powerful, and it helps overcome the need for massive prompts. They can also be invoked from within Claude Code in interactive mode.
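As a rough sketch of that chaining, assuming each executable `.md` prompt script reads stdin and writes to stdout (the file names are just examples):

```bash
# Each .md file is an executable prompt script that reads stdin
# and writes its result to stdout, so they compose like any filter.
./test/run_tests.py 2>&1 \
  | ./summarize.md \
  | ./format.md \
  > test_report.txt
```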