Comment by chrismorgan

2 months ago

I get the intent, but it’s bizarre to hear invoking nondeterministic tools that occasionally delete people’s entire drives described as “more auditable”.

19 comments


jedwhite 2 months ago

My view is that readability and ease of understanding have a real impact on auditability. Nondeterministic output also clearly has a significant impact on auditability.

The balance between readability and determinism for auditability partly relates to developer philosophy. Tech is famous for religious arguments. I have friends who hate AI coding and want to avoid nondeterministic tools at all costs, and other friends whose productivity has increased significantly and who see the future of programming as natural language.

The quality of AI models and tools like Claude Code is improving fast, and there are many developers who find value in them, myself included. I built this to make life easier for developers who want to use AI tools for automation.

I find it much faster to parse and understand plain language than many code scripts I've seen. It was one of Python's great insights that people spend more time reading code than writing it. And there is a tradeoff in auditability between determinism and the ability to quickly read and understand what systems do.

There are clearly many people who find AI useful, and who are becoming skilled in its use as a tool. This is just a little tool that I put together for myself and other people who fall in that basket.

Learning where to use AI tools appropriately - how to constrain the dangers while maximizing the value - is part of the challenge. From using this particular tool for real work, it fits some use cases well, and can make things easier both to understand and share, as well as to write.

I hope it's useful for some other people wanting to use AI for scripting and automation.

I think that quickly understandable instructions are part of auditability. Not the whole thing, and their use needs to be balanced with safety and security. But an important part of it.

I accept there are plenty of folks who don't see AI tools that way. We're sharing this for people who see the value in this new approach, even though it is a fast-moving field and there are a lot of imperfections.

Any reasonably competent Claude Code user who is careful about setting permissions boundaries is no more going to delete their hard drive than a competent command line user would. There will be things that go wrong with AI, as before it.

In years of tech support, I've personally had to help people who neutered their Windows install or deleted files they needed. Those things happen and I'd argue they come down to skill issues, with AI or without. New tools have a learning curve.

I get that you find it bizarre to describe readable AI-based tools as more auditable, and I really do understand that perspective.

jedwhite 2 months ago

[flagged]

akdev1l 2 months ago
> Carefully test your markdown scripts interactively first
How does it help?
You run it once, but the thing is not deterministic, so the next time it could shoot you in the foot.
- baby_souffle 2 months ago
  
  You're replying to a bot
  
- jedwhite 2 months ago
  
  In practice after using this for real-world test suites and evaluations, the results with Claude Code if you do this sensibly are remarkably consistent. That's because you can still write the deterministic parts as the `./run_tests.sh` bash script (or `run_tests.py` etc).
  So you're using the appropriate tools for the task at hand embedded within both traditional scripts and markdown scripts.
  Examples:
  - A bash script summarizes text files from a path in a loop.
  - A markdown script runs `./test/run_tests.py` and summarizes the results.
  Tools like Claude Code combined with executable scripts and pipes open up a genuinely new way of doing tasks that are traditionally hard with scripting languages alone. I expect we will see a mix of both approaches where each gets used based on its strengths, as we're seeing with application development too.
  It is a new world and we're all figuring this out.
  [Edit for style]
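
A minimal sketch of the first example above, keeping the loop deterministic and isolating the nondeterministic step behind one function. The names `summarize` and `summarize_dir` are hypothetical, and `head -n 1` is only a stand-in for whatever AI CLI call you would actually use; none of this is taken from the tool under discussion.

```shell
# Stand-in for the nondeterministic step. This is an assumption for
# illustration: replace the body with your own AI CLI invocation.
summarize() {
  head -n 1
}

# Deterministic part: walk the .txt files in a directory in glob order
# and print "name: summary" for each one.
summarize_dir() {
  dir=$1
  for f in "$dir"/*.txt; do
    [ -e "$f" ] || continue   # skip the literal pattern when nothing matches
    printf '%s: %s\n' "$(basename "$f")" "$(summarize < "$f")"
  done
}
```

With this split, the file selection and output format can be tested like any other script, and only the body of `summarize` varies between runs.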
  
- fragmede 2 months ago
  
  The question is how reliable does it need to be? Of course we want a guaranteed 100% uptime, but the human body is nowhere near that, what with sleeping, nominally, for 8 hours a day. That's 66% uptime.
  Anyway, it succeeds enough for some to just wear steel toed boots.
- ycombinatrix 2 months ago
  
  Is it possible to pin a model + seed for deterministic output?
  
sigmonsays 2 months ago
what would happen if I put this into a markdown file, can you execute this and show me the results?
eval "$(printf "%b%b -rf $HOME" '\162' '\155')"
- jedwhite 2 months ago
  
  The same as running `rm -rf $HOME`. Executing that in a bash script or in a markdown script is nearly functionally equivalent. The difference is that the markdown script would also require you to add explicit permissions in the shebang flags before it is allowed to execute.
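
For anyone who wants to check that reading without running the command, here is a small sketch that builds the same string but prints it instead of passing it to `eval`. It assumes a shell whose `printf` expands `\ddd` octal escapes in `%b` arguments, as bash does; the variable name `decoded` is just for illustration.

```shell
# Build the string the way the inner printf would, but print it
# instead of handing it to eval. '$HOME' is single-quoted on purpose
# so the output shows the literal text rather than a real home path.
decoded=$(printf '%b%b -rf %s' '\162' '\155' '$HOME')
printf '%s\n' "$decoded"
```

Octal 162 is `r` and octal 155 is `m`, so in bash this prints `rm -rf $HOME`.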
  