Comment by zambelli

17 days ago

Thanks for the thoughtful comment! Let me try to unpack some of what's there and what's missing.

Forge is at its core a mechanical reliability layer, whereas a lot of memory/skill management is more of an orchestration component that the consumer would own.

Having Forge stop at the mechanical layer was an intentional design decision, but there's no reason it couldn't grow into more. I think a lot of what you're thinking about is a big model/small model split similar to how Claude Code does it - but that's an orchestrator.

Now, where Forge can help with what you're suggesting - I think most of it is there, but it needs some wiring from the consumer/orchestrator:

- Forge surfaces information about which guardrails fired: InferenceResult.new_messages carries typed MessageMeta.type values: RETRY_NUDGE, STEP_NUDGE, PREREQUISITE_NUDGE, CONTEXT_WARNING, SUMMARY. So every nudge that fired during a run is observable per step. A consumer could capture that and compare it to the workflow steps to reconstruct what success looked like.
- Combined with Guardrails.check() > CheckResult, you would have a lot of the journey the model took to get to the answer.
- Forge lets you (actually, requires you to) define the system prompt, any workflow restrictions, and the tools. So if you know something about how your task will behave with a small model, you can include that in the system prompt, or as a tool that's a required step, etc.
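To make the capture step concrete, here's a rough sketch of a consumer-side helper that tallies which nudges fired in a run. The `Message`, `MessageMeta`, and `MetaType` classes below are local stand-ins I wrote for illustration, not Forge's actual classes; only the five type names come from the list above. The `meta` field and the `summarize_nudges` helper are assumptions.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Iterable, Optional


class MetaType(Enum):
    # Type names as listed in the comment; this enum is a local stand-in.
    RETRY_NUDGE = "retry_nudge"
    STEP_NUDGE = "step_nudge"
    PREREQUISITE_NUDGE = "prerequisite_nudge"
    CONTEXT_WARNING = "context_warning"
    SUMMARY = "summary"


@dataclass
class MessageMeta:
    # Hypothetical shape: only a `type` attribute is assumed.
    type: Optional[MetaType] = None


@dataclass
class Message:
    # Hypothetical message shape; real messages may carry more fields.
    role: str
    content: str
    meta: Optional[MessageMeta] = None


def summarize_nudges(new_messages: Iterable[Message]) -> Dict[str, int]:
    """Count how often each guardrail-related message type appears."""
    counts: Counter = Counter()
    for msg in new_messages:
        if msg.meta is not None and msg.meta.type is not None:
            counts[msg.meta.type.name] += 1
    return dict(counts)


# Example run with hand-built messages standing in for new_messages.
run = [
    Message("user", "book the trip"),
    Message("system", "try again", MessageMeta(MetaType.RETRY_NUDGE)),
    Message("system", "try again", MessageMeta(MetaType.RETRY_NUDGE)),
    Message("system", "history condensed", MessageMeta(MetaType.SUMMARY)),
]
nudge_summary = summarize_nudges(run)
```

A memory system could store `nudge_summary` next to the task description, so later runs can see which guardrails a given workflow tends to trip.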

For integrations into MCPs/etc that house memories and skills, those can be surfaced to the model with Forge in place. Prompt the model to search for tools in the MCP/surface an MCP tool, etc. I've built a consumer that follows this pattern: main agent gets task > main agent eyeballs whether it can be solved on its own > if not, sends to a subagent specialized on that topic (that has access to more tools related to that) - which allows me to keep context lean for each agent.

You could do something similar where the model is prompted to use its toolset, but if it's unsure or needs a tool it doesn't have, to call a get_mcp() tool or something similar to look for better options.
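As a rough illustration of that fallback, here's what a get_mcp()-style lookup tool could look like on the consumer side. Everything here is hypothetical: the `ToolSpec` type, the catalog contents, and the keyword matching are placeholders, and a real version would query an actual MCP server rather than an in-memory list.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class ToolSpec:
    # Hypothetical description of a tool the agent does not have loaded yet.
    name: str
    keywords: Tuple[str, ...]


# Placeholder catalog standing in for whatever an MCP server would expose.
CATALOG: List[ToolSpec] = [
    ToolSpec("lookup_invoice", ("invoice", "billing")),
    ToolSpec("issue_refund", ("refund", "billing")),
    ToolSpec("search_docs", ("docs", "manual")),
]


def get_mcp(query: str, catalog: Sequence[ToolSpec] = CATALOG) -> List[str]:
    """Return names of catalog tools whose keywords overlap the query words."""
    words = set(query.lower().split())
    return [tool.name for tool in catalog if words & set(tool.keywords)]


# The model would call this tool when its own toolset is not enough.
billing_tools = get_mcp("customer has a billing question")
```

The main agent keeps a small toolset plus this one lookup tool, which is what keeps its context lean.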

Big model vs. small model now - a couple of ways I think about it:

- You could use big models to go through your workflow a few times, see common patterns, and then use those to define prerequisite and required steps in Forge guardrails when using small models.
- You could use small models the same way Claude Code uses the ANTHROPIC_SMALL_FAST_MODEL env var (this is what the Explore subagent uses, I think). The big model is effectively an orchestrator, and when it recognizes a task is easy, it dispatches a small model to do it, where Forge might make it viable.
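For the first idea, a minimal sketch of mining big-model runs for patterns might look like the following. It assumes each trace is just an ordered list of tool names that you logged yourself; the function names are made up, and turning the output into actual Forge guardrail definitions is left to the consumer.

```python
from typing import List, Set, Tuple


def required_steps(traces: List[List[str]]) -> Set[str]:
    """Steps that appear in every recorded trace."""
    if not traces:
        return set()
    return set.intersection(*(set(t) for t in traces))


def prerequisite_pairs(traces: List[List[str]]) -> Set[Tuple[str, str]]:
    """Pairs (a, b) where a first occurs before b in every trace."""
    steps = required_steps(traces)
    pairs = set()
    for a in steps:
        for b in steps:
            if a != b and all(t.index(a) < t.index(b) for t in traces):
                pairs.add((a, b))
    return pairs


# Two hypothetical big-model runs of the same workflow.
traces = [
    ["search", "read", "write"],
    ["search", "lint", "read", "write"],
]
always = required_steps(traces)
ordering = prerequisite_pairs(traces)
```

Steps in `always` are candidates for required steps, and pairs in `ordering` are candidates for prerequisites. A handful of traces is thin evidence, so I'd review the candidates by hand before enforcing them on a small model.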

Hopefully that helps! Forge could certainly elevate some of this to be more native - and I might do that - like a mode that packages up results for you so you don't need to reconstruct the nudge events from hooks firing. But everything should be there to integrate with a memory system with the information required, or with an API/MCP that has more tools or skills for the agent to read.

Would love to see the integration if you do it! You'd just need a consumer that captures the events Forge returns and packages them up into whatever your memory system is looking for!

If you're looking for other ways of ingesting those memories/skills that aren't the system prompt, a message, or a tool result, then that's something I can look into.