Comment by zambelli

17 days ago

Thanks for the thoughtful comment! Let me try to unpack some of what's there and what's missing.

Forge is at its core a mechanical reliability layer, whereas a lot of memory/skill management would be more of an orchestration component/element that that consumer would own.

That split that has forge stopping at the mechanical layer was an intentional design decision, but there's no reason it couldn't grow into more. I think a lot of what you're thinking about is a big model/small model split similar to how CC does it - but that's an orchestrator.

Now, where Forge can help with what you're suggesting - I think most of it is there, but needs some wiring from the consumer/orchestrator: - Forge surfaces information about which guardrails fired: InferenceResult.new_messages carries typed MessageMeta.type — RETRY_NUDGE, STEP_NUDGE, PREREQUISITE_NUDGE, CONTEXT_WARNING, SUMMARY. So every nudge that fired during a run is observable per-step. A consumer could capture that and compare to workflow steps to reconstruct what success looked like. - Combined with Guardrails.check() > CheckResult, you would have a lot of the journey the model took to get to the answer. - Forge lets you (actually, requires) you to define the system prompt, any workflow restrictions, and the tools. So if you know something about how your task will behave with a small model, you can include that in system prompt, or a tool that's a required step, etc.

For integrations into MCPs/etc that house memories and skills, those can be surfaced to the model with Forge in place. Prompt the model to search for tools in the MCP/surface an MCP tool, etc. I've built a consumer that follows this pattern: main agent gets task > main agent eyeballs whether it can be solved on its own > if not, sends to a subagent specialized on that topic (that has access to more tools related to that) - which allows me to keep context lean for each agent.

You could do something similar where the model is prompted to use its toolset, but if its unsure or needs a tool it doesn't have, to call the get_mcp() tool or something to look for better options.

Big model v small model now - a couple of ways I think about it. - You could use big models to go through your workflow a few times, see common patterns, and then use those to define prerequisite and required steps in Forge guardrails when using small models. - You could use small models the same way there's the ANTHROPIC_SMALL_FAST_MODEL env var in claude code (this is what Explore subagent uses I think). Big model is effectively an orchestrator, and when it recognizes a task is easy, it dispatches a small model to do it, where Forge might make it viable.

Hoepfully that helps! Forge could certainly elevate some of this to be more native - and I might do that - like a mode that packages up results for you so you don't need to reconstruct the nudge events from hooks firing. But everything should be there to integrate with a memory system with the information required, or with an API/MCP that has more tools or skills for the agent to read.

Would love to see the integration if you do it! You'd just need a consumer that captures the events forge returns and packages them up into whatever your memory system is looking for!

If you're looking for other ways of ingesting those memories/skills that isn't system prompt, message, or tool result, then that's something I can look into.