Comment by pnocera

1 month ago

I've been playing with Brad Ross's AISP [1] to improve the quality of LLM outputs at strategic stages of our basic design / plan / implementation workflows.

A concrete example of this is our Adviser Skill experiment [2]. In most AI workflows, a "reviewer" agent just dumps markdown feedback. Our Adviser doesn't just "talk"; it outputs an AISP 5.1 document (a kind of "Assembly Language for AI Cognition").

This document forces the agent to define:

- Strict Type Definitions for the issues identified (e.g., distinguishing between a gap, an edge case, or a missing requirement).

- EARS Rules (Easy Approach to Requirements Syntax) that determine the verdict. For example, a rule might state: "If any issue has a severity of ⊘ (critical), then the workflow MUST halt."

- Formal Evidence: Every "approve" or "reject" verdict must include a confidence score (δ) and a grounding proof (π) that explains why the change matches the original specification.

By treating the agent's output as a proof-carrying protocol rather than just text, we can chain multiple specialized agents (Architect, Strategist, Auditor) that "triangulate" on the codebase. They must reach a formal consensus where the variance between their scores is low.
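The consensus gate can be sketched in a few lines. The thresholds below are placeholders, not values from our setup: the idea is simply that the agents' confidence scores must both agree (low variance) and be collectively high before a change passes.

```python
from statistics import pvariance

def consensus(scores: list[float],
              max_variance: float = 0.01,   # placeholder agreement threshold
              min_mean: float = 0.8) -> bool:
    # Formal consensus: Architect, Strategist, and Auditor scores must
    # cluster tightly AND average above the confidence floor.
    return pvariance(scores) <= max_variance and sum(scores) / len(scores) >= min_mean

consensus([0.90, 0.88, 0.91])  # tight agreement → passes
consensus([0.90, 0.30, 0.95])  # one dissenting agent → blocked
```

Blocking on disagreement is the useful part: a single low score forces another verification round instead of being averaged away.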

This shifts the agent's goal from "Finish the task at all costs" to "Prove that this change is safe and correct." It turns out that iterating on the verification logic is much more effective for building reliable systems than just increasing the number of agents running concurrently.

[1] Brad Ross AISP : https://github.com/bar181/aisp-open-core

[2] Adviser skill : https://github.com/pnocera/skilld