
Comment by monkeydust

7 hours ago

I have an MA (multi-agent) system set up for personal use.

You give it a problem, then refine it: a fast, cheaper model asks you questions, and your answers produce a better input prompt. You then choose an MA strategy. For example, break the problem up into sections and have a final judge conclude, or run multi-turn, where agents debate and a judge then summarises the debate.
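A rough sketch of the two strategies mentioned above, in case it helps picture them. This is not the actual harness: `Model`, `echo_model`, `decompose_then_judge` and `debate_then_judge` are hypothetical names, and the model call is a stub so it runs offline.

```python
from typing import Callable, List

# Hypothetical stand-in for a real model call (assumption: prompt in, text out).
Model = Callable[[str], str]


def echo_model(prompt: str) -> str:
    """Stub model so the flow can be run without any API key."""
    return f"[answer to: {prompt[:40]}]"


def decompose_then_judge(problem: str, sections: List[str],
                         agent: Model, judge: Model) -> str:
    """Strategy 1: one agent call per section, then a final judge concludes."""
    parts = [agent(f"{problem}\nFocus on: {section}") for section in sections]
    return judge("Conclude from these section answers:\n" + "\n".join(parts))


def debate_then_judge(problem: str, agents: List[Model],
                      rounds: int, judge: Model) -> str:
    """Strategy 2: agents take turns replying to the running transcript,
    then a judge summarises the debate."""
    transcript: List[str] = []
    for _ in range(rounds):
        for i, agent in enumerate(agents):
            context = problem + "\n" + "\n".join(transcript)
            transcript.append(f"agent{i}: {agent(context)}")
    return judge("Summarise this debate:\n" + "\n".join(transcript))


if __name__ == "__main__":
    print(decompose_then_judge("Pick a school", ["cost", "commute"],
                               echo_model, echo_model))
    print(debate_then_judge("Pick a school", [echo_model, echo_model],
                            rounds=2, judge=echo_model))
```

Swapping `echo_model` for real model calls is the only change needed to try it for real; the control flow stays the same.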

The best approach is what I call 'all angles', where all these strategies run in parallel and a final meta-judge synthesises the response. The most useful part, which I recently added, is a view to see the variance in each strategy.
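One way 'all angles' could look in code, as a sketch only. It assumes each strategy returns an answer plus a numeric score (that score is my assumption, so that a variance can be computed), and that each strategy is run a few times. `run_all_angles`, `variance_view` and `meta_judge` are hypothetical names, and the meta-judge here is a trivial placeholder where a real one would be another model call.

```python
from concurrent.futures import ThreadPoolExecutor
from statistics import pvariance
from typing import Callable, Dict, List, Tuple

# Assumption: a strategy maps a problem to (answer, score between 0 and 1).
Result = Tuple[str, float]
Strategy = Callable[[str], Result]


def run_all_angles(problem: str, strategies: Dict[str, Strategy],
                   runs: int = 3) -> Dict[str, List[Result]]:
    """Run every strategy `runs` times in parallel and group results by name."""
    results: Dict[str, List[Result]] = {name: [] for name in strategies}
    with ThreadPoolExecutor() as pool:
        futures = [(name, pool.submit(fn, problem))
                   for name, fn in strategies.items()
                   for _ in range(runs)]
        for name, future in futures:
            results[name].append(future.result())
    return results


def variance_view(results: Dict[str, List[Result]]) -> Dict[str, float]:
    """Population variance of the scores for each strategy."""
    return {name: pvariance([score for _, score in outs])
            for name, outs in results.items()}


def meta_judge(results: Dict[str, List[Result]]) -> str:
    """Placeholder meta-judge: returns the single highest-scoring answer.
    A real meta-judge would synthesise across all answers."""
    best = max((pair for outs in results.values() for pair in outs),
               key=lambda pair: pair[1])
    return best[0]
```

A strategy with high variance across runs is a hint that its answer should be trusted less, which is presumably why the variance view is useful.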

Been using this for life stuff - housing search, schools, family challenges!

Perhaps I should make a video of it in action. If people in the HN community are interested, let me know.

I have also developed a similar system, though not one focused on exploratory refinement of prompts. It is more focused on cybernetic-style feedback loops: keeping prompt outputs stable through a growing library of deterministic checks and autofixes. Any "problem" that isn't covered by that library is surfaced to the human driving the process.
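A minimal sketch of what such a check-and-autofix library might look like. The `Check` class, `stabilise` function and the three example checks are all hypothetical, and this covers only one reading of the idea: a failed check with no working autofix is what gets surfaced to the human.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Check:
    name: str
    passes: Callable[[str], bool]                    # deterministic test
    autofix: Optional[Callable[[str], str]] = None   # optional repair


def stabilise(output: str, library: List[Check]) -> Tuple[str, List[str]]:
    """Apply each check in order, autofixing where possible.
    Returns the (possibly repaired) output and the names of checks
    that still fail, which are the ones to surface to a human."""
    unresolved: List[str] = []
    for check in library:
        if check.passes(output):
            continue
        if check.autofix is not None:
            output = check.autofix(output)
        if not check.passes(output):
            unresolved.append(check.name)
    return output, unresolved


# Example library (illustrative checks only).
LIBRARY = [
    Check("no_trailing_whitespace", lambda s: s == s.rstrip(), str.rstrip),
    Check("ends_with_period", lambda s: s.endswith("."), lambda s: s + "."),
    Check("under_200_chars", lambda s: len(s) <= 200),  # no autofix
]
```

The library grows over time: whenever the human resolves a surfaced problem by hand, that fix can be added as a new `Check` so it is handled automatically next time.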

You mention cost in one of the replies. Can you elaborate on the cost profile (ballpark) for various problem types? I would also be curious to understand the strategies employed and what the costs look like across each.

Definitely interested, would love to see a video :)

  • Sure, let me do that. Can I post this as a Show HN if it's just a video? The rules say people need to be able to try it out, but that would cost me a small fortune :) ... I could perhaps post it on GitHub and people can set up the repo themselves with their own OpenRouter key, if that works. I've never done a Show HN but it would be fun to try.

So what harness are you using? And what LLMs?

  • Homebrew harness and all the frontier ones plus DeepSeek. All via OpenRouter at the moment. It works well enough but can get expensive, so I use it for really high-value challenges. Interestingly, the refine feature has been the most useful to me and the people I have shown it to. Essentially, people are lazy when expressing the initial problem (me included!). Refine asks relevant questions about the initial problem, then refines the initial statement, and the user can accept/reject/edit before submitting.

    • I came to a similar conclusion. I think the default options in many IDEs (Ask/Plan/Agent) are limited... 'Refine' feels like an improved 'Plan' in that it doesn't just jump right into building a list of tasks based on the initial prompt, because who knows what sort of flaws or deficiencies were present in the initial prompt! Can't always get everything right on the first try. XP

      I don't think a specific harness is even necessary to get a boost from 'Refine'. Even a simple custom agent is portable enough... it's easy enough to take the existing 'Plan' agent definition present in VS Code and tweak it to be 'Refine' instead.
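For anyone wanting to try the 'Refine' step discussed in this thread without a full harness, here is a minimal sketch. All names are hypothetical, and the question-asking and rewriting steps are passed in as plain functions so that any model, or none, can sit behind them.

```python
from typing import Callable, List, Optional, Tuple

QA = Tuple[str, str]


def refine_problem(
    initial: str,
    ask_questions: Callable[[str], List[str]],   # cheap model: problem -> questions
    get_answer: Callable[[str], str],            # user: question -> answer
    rewrite: Callable[[str, List[QA]], str],     # cheap model: problem + Q&A -> draft
    review: Callable[[str], Optional[str]],      # user: accept, edit, or reject
) -> str:
    """Ask clarifying questions, rewrite the problem, then let the user decide.

    `review` returns the draft unchanged to accept it, a modified string
    to edit it, or None to reject it and keep the initial statement."""
    questions = ask_questions(initial)
    answers = [(q, get_answer(q)) for q in questions]
    draft = rewrite(initial, answers)
    decision = review(draft)
    return initial if decision is None else decision


if __name__ == "__main__":
    refined = refine_problem(
        "find a house",
        ask_questions=lambda p: ["Budget?", "Area?"],
        get_answer=lambda q: input(q + " "),
        rewrite=lambda p, qa: p + " | " + "; ".join(f"{q} {a}" for q, a in qa),
        review=lambda draft: draft,  # accept as-is
    )
    print(refined)
```

Keeping the user review as the last step matters: the refined statement is only a proposal until it is accepted or edited.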