Comment by everforward

6 hours ago

Fable can delegate tasks to Opus or Sonnet, so it has some agentic properties and I believe it does them in parallel.

The parallelism is where this starts to fall apart on a local PC. Like I can run some Qwen quants, but I can’t run a decent Qwen model while also running another model smart enough to actually implement it. I’d have to do them in series, and given how long Fable seems to take even with parallelism, I’d probably be waiting days for an answer.
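
To make the series-versus-parallel cost concrete, here is a minimal sketch. It assumes each subagent call can be modelled as a fixed wait; `run_subagent`, the task names, and the durations are all hypothetical stand-ins, not anything from Fable's or Pi's actual harness.

```python
import asyncio
import time

async def run_subagent(name: str, seconds: float) -> str:
    # Hypothetical stand-in for a model call; a real harness would
    # await an inference request here instead of sleeping.
    await asyncio.sleep(seconds)
    return f"{name}: done"

async def run_serial(tasks):
    # One model resident at a time: each subtask waits for the previous one.
    return [await run_subagent(name, secs) for name, secs in tasks]

async def run_parallel(tasks):
    # All subagents in flight at once; only possible if the hardware
    # (or a remote API) can serve them concurrently.
    return await asyncio.gather(*(run_subagent(name, secs) for name, secs in tasks))

def timed(coro_fn, tasks):
    # Run one strategy and return (results, elapsed wall-clock seconds).
    start = time.perf_counter()
    results = asyncio.run(coro_fn(tasks))
    return results, time.perf_counter() - start

# Made-up workload: three subtasks of equal length.
TASKS = [("plan", 0.05), ("implement", 0.05), ("review", 0.05)]

if __name__ == "__main__":
    _, serial_t = timed(run_serial, TASKS)
    _, parallel_t = timed(run_parallel, TASKS)
    print(f"serial: {serial_t:.2f}s, parallel: {parallel_t:.2f}s")
```

In this toy model the serial run takes roughly the sum of the subtask times and the parallel run roughly the longest one, which is the gap that grows painful when each subtask is a long local inference job.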

trollbridge 6 hours ago

oh-my-pi can delegate tasks to other models too. I usually use DS4 Flash for low priority subagent tasks.

If Fable is "delegating" tasks, then there's actually an agent front end of whatever you think the API is.

We have a local instance of Qwen-3.6 which is more than adequate for running agents. You can mix and match local and cloud-hosted models. (My biggest use case for local models right now is vision models, because they're quite small and running them locally avoids the data-locality issues my customers would have if I sent their data to a Chinese model.)

everforward 2 hours ago

> If Fable is "delegating" tasks, then there's actually an agent front end of whatever you think the API is.
I would say behind (I believe you use the API just like you do Opus), but yeah. I'm not claiming it's a property of the LLM itself; I also presume this is some variety of tool-calling agent harness.
> We have a local instance of Qwen-3.6 which is more than adequate for running agents. You can mix and match local and cloud-hosted models.
I'm presuming OP meant local as in the models run locally as well. I do know you can do subagents in Pi (probably others too), but the vast majority of people are going to hit hardware limitations trying to run them in parallel on local hardware.
I'm doubtful Fable's harness is unique in some way that you can't replicate with Pi. I'm mostly doubtful there are more than a handful of people with hardware sitting in their house that can execute more than one meaningfully smart model at a time.
If you're on local hardware, Deepseek v4 Flash is in the ballpark of 180GB of VRAM alone. Even on smaller models, Qwen + a dumber agent to execute is probably in the realm of 60GB of VRAM.
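
A rough way to sanity-check numbers like these is parameters times bits per weight, divided by 8 to get bytes. The sketch below is only that back-of-the-envelope rule. The parameter counts, quantization levels, and the 1.2x overhead factor for KV cache and activations are hypothetical placeholders, not measured figures for any model named in this thread.

```python
def weights_gb(params_billions: float, bits_per_weight: float) -> float:
    # Billions of parameters * bits per weight / 8 bits per byte = GB of weights.
    return params_billions * bits_per_weight / 8

def fits(models, vram_gb: float, overhead: float = 1.2):
    # models: list of (params_billions, bits_per_weight) pairs that must be
    # resident at the same time. `overhead` is an assumed fudge factor for
    # KV cache and activations; real usage depends on context length.
    total = sum(weights_gb(p, b) for p, b in models) * overhead
    return total, total <= vram_gb

if __name__ == "__main__":
    # Hypothetical pairing: a 32B planner and an 8B executor, both 4-bit.
    total, ok = fits([(32, 4), (8, 4)], vram_gb=48)
    print(f"~{total:.0f} GB needed, fits in 48 GB: {ok}")
```

The point of summing over the list is that running models in parallel means their weights have to be loaded simultaneously, so the requirement is additive rather than the size of the largest model.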
I do suspect you could get Deepseek to do Fable-level things with a good harness (or a bunch of models, really; I'm fairly convinced the magic of Fable is in the harness rather than the model).