Comment by 8note
5 hours ago
I really want a Qwen on one of these chips: https://chatjimmy.ai
15k tokens/s would get me feeling like it's actually worth splitting out worktrees to try several approaches to a problem.
Why is that? It seems like it should go the other way. I want to be sure I can complete a task within a certain amount of wall-clock time. If tokens per second are slow, I am risking more by running a single approach at a time, so I have an incentive to multiplex my attention across separate work streams. If generation is fast enough to occupy my full attention, there is nothing more to gain from running parallel threads.