Comment by cogman10

16 hours ago

> However, I think what they’re really worried about is that a person needs to design and implement that stuff… It throws a wet blanket on their insistence that this will replace entire people in entire workflows or even projects, and I just don’t buy it.

I think you are on to something. But I also think this sort of system lends itself to not needing really good LLMs to do impressive things. I've noticed that the quality of a lot of these LLMs just gets worse the more data points they need to track. But if you break the work up into smaller, easier-to-consume chunks, all of a sudden you need a much less capable LLM to get results comparable to or better than the SOTA.

Why pay extra money for Opus 4.7 when you could run Qwen 3.6 35b for free and get similar results?

5 comments


devin 11 hours ago

And then you realize that what you’re using the smaller models for is ALSO decomposable and part of it is just a few if statements, and then you realize that for this feature you don’t actually need or want a model, because plain code is cheaper and gives better performance, reliability, and reproducibility for you and your users.
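A rough sketch of that idea, assuming a hypothetical ticket-labeling feature: handle the cases that really are just if statements in plain code, and only call a model for whatever is left. The labels, keywords, and the `model_fallback` callable are all made up for illustration, not taken from any real system.

```python
import re
from typing import Callable, Optional


def classify_by_rules(text: str) -> Optional[str]:
    """Return a label when a plain rule is enough, else None."""
    lowered = text.lower()
    # Hypothetical keyword rules; a real feature would have its own.
    if re.search(r"\b(refund|charge|invoice)\b", lowered):
        return "billing"
    if re.search(r"\b(password|login|2fa)\b", lowered):
        return "account"
    return None


def classify(text: str, model_fallback: Callable[[str], str]) -> str:
    """Try the deterministic rules first; call the model only if none match."""
    label = classify_by_rules(text)
    if label is not None:
        return label
    # Assumption: model_fallback wraps whatever small model you run.
    return model_fallback(text)
```

The rule path is reproducible and costs nothing per call, so the model only sees the inputs the rules could not settle.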

jimbokun 10 hours ago

So you have the model write the if statements and put itself out of a job.

aleqs 8 hours ago

Indeed, I've been experimenting with agent workflows for complicated tasks, where I essentially have a graph of agents with different roles/capabilities, including roles such as breaking down complex tasks into simpler ones. There seems to be a point where a complex enough task is better performed by a group of cheaper agents/models than by one agent using one of the SOTA big models, in terms of both quality and cost.
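To make "a graph of agents" concrete, here is a minimal sketch, not the actual setup described above. It assumes each agent is just a callable (in practice a call to some cheap model) and that the dependencies form a DAG; `run_graph`, the role names, and the `deps` layout are hypothetical. It uses `graphlib.TopologicalSorter` from the standard library (Python 3.9+).

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List

# An agent takes the overall task plus the outputs of its upstream agents.
Agent = Callable[[str, Dict[str, str]], str]


def run_graph(
    task: str,
    agents: Dict[str, Agent],
    deps: Dict[str, List[str]],
) -> Dict[str, str]:
    """Run each agent after its dependencies, passing upstream outputs along."""
    results: Dict[str, str] = {}
    # deps maps each node to the nodes it depends on; static_order()
    # yields every node after all of its dependencies.
    for name in TopologicalSorter(deps).static_order():
        upstream = {d: results[d] for d in deps.get(name, [])}
        results[name] = agents[name](task, upstream)
    return results


# Stub agents standing in for model calls (hypothetical roles).
agents: Dict[str, Agent] = {
    "plan": lambda task, up: f"steps for: {task}",
    "code": lambda task, up: f"code from [{up['plan']}]",
    "review": lambda task, up: f"review of [{up['code']}]",
}
deps = {"plan": [], "code": ["plan"], "review": ["code"]}
outputs = run_graph("add endpoint", agents, deps)
```

Each node only sees the task and its direct upstream outputs, which is one way to keep the context per agent small enough for a cheaper model.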

zozbot234 23 minutes ago

The big SOTA models win in world knowledge; that's what all those parameters are for. But a huge fraction of agentic tasks are going to be plain clerical work that needs no special knowledge at all, and a much simpler model can do them in a straightforward way.

tempest_ 14 hours ago

It is also interesting because you get people with very different use cases arguing about the effectiveness of various models while doing very different things with them.

It's one thing for a model to be very clearly instructed to add a REST endpoint to an existing Django app and add a button connected to it on the front end, and quite another to be told "Design me a YouTube". The smaller models can pretty dependably do the first and fall flat on the second.