Comment by mark_l_watson

11 hours ago

But, it is not all about cost: models like DeepSeek v4 flash (I use the US company Fireworks.ai and also buy tokens directly from DeepSeek) is very fast, very low latency while working.

Would you want to use a text editor that updates the screen very slowly? Kind of the same thing for using agentic systems as coding assistants: don’t want a ‘sluggish’ experience.

1 comment

mark_l_watson

erispoe 10 hours ago

I have, mostly, long running autonomous tasks, so it doesn't matter how slow inference is. If I optimize for latency it means I'm turning into the limiting factor.