Comment by zulgin
5 days ago
I think you are broadly correct, but just to push back on a few points: (1) Ability to solve hard problems in days vs weeks has immense value (2) Back-end improvements (if done right) should improve platform speed, stability, scalability etc., which should have revenue implications (3) Ability to on-board a SWE-equivalent entity in minutes, have them work on a specific hard problem and then off-board them in minutes can have value
All of the above, of course, depends upon Fable consistently being a 2x-3x SWE at minimum.
You're not really solving problems, you're retrieving the best match of solved problems from a compressed corpus. And that corpus is available to many companies, meaning "hard" problems stop having "hard problem" value the moment they enter the weights of any model via the internet ... or are distilled from one model to another. Anthropic's business model is commoditising knowledge, but as we see with the Fable model card, they only want it done to the knowledge of other businesses; in their own field, they totally hate it.
I don’t think that’s an accurate or useful characterization of modern AI like Claude at all. It is not simply regurgitating knowledge. It applies its knowledge to create bespoke solutions to the problem you pose to it, and is able to self-evaluate its progress towards the completion criteria. If you don’t think that counts as “problem solving”, your definition would exclude nearly all knowledge work and engineering.
People underestimate the vastness of the training data (the internet) and overestimate their ability to recognize whether something is really bespoke. That is not to say that no problem solving is happening, because there are many problems that we inefficiently solve again and again, and the LLMs are making the solutions more accessible to everyone with a subscription.
> It applies its knowledge to create bespoke solutions to the problem you pose to it, and is able to self-evaluate its progress towards the completion criteria.
It imitates applying knowledge. The imitation may be uncanny, but assigning LLMs intentionality and theory of mind (ToM) is a category error.
It’s like saying you can’t make a unique sentence unless you first make unique words
> You're not really solving problems, you're retrieving the best match of solved problems from a compressed corpus.
This is not correct. LLMs interpolate in a high-dimensional space, so you're actually composing the best matches in a compressed corpus to find novel points/paths in that space. That is problem solving.
> Back-end improvements (if done right) should improve platform speed, stability, scalability etc., which should have revenue implications
Depends entirely on the domain. If you're selling enterprise software, this kind of stuff barely matters for sales.
It can reduce operational costs which is good but there's a limit to how much that's worth.
Yep, there are many, many, non-niche domains in which this doesn’t mean much at all.
The thing about AI-generated “solutions” is that they often go down bad rabbit holes and need to be re-run, or since they are so “cheap” to create they are often just thrown away and rebuilt when requirements evolve. Plus, just more stuff is created and needs to be maintained. So in the end, your efficiency gains go out the window.
In my experience, the challenge in software development is not to solve a problem, but to define the outcome, the scope, the acceptance criteria etc.
Exactly, this is the hardest part and the reason why many projects fail
20x the cost means you need Fable to be 20x better than the alternative, which is a tall order. And there are more options out there too; perhaps the 4x-cost option is enough.
This means that if the DeepSeek / under-1k alternative gives at least a x1.2 improvement, Fable needs to be x24, which I think is very, very unreasonable. It could be worth it if it can x2 a $20k SWE, though I doubt it can do that.
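The x24 figure above can be reproduced with a rough sketch. It assumes that value scales linearly with the productivity multiplier and that price is the only other factor, which is a simplification; the function name `breakeven_multiplier` is made up for illustration.

```python
def breakeven_multiplier(cost_ratio: float, alt_multiplier: float = 1.0) -> float:
    """Multiplier the pricier option needs to match the cheaper one per unit of spend.

    cost_ratio:     how many times more the pricier option costs.
    alt_multiplier: productivity multiplier of the cheaper alternative.
    Assumes value is linear in the multiplier (a simplifying assumption).
    """
    return cost_ratio * alt_multiplier


# 20x the cost, compared with an alternative that gives no improvement.
print(breakeven_multiplier(20))
# 20x the cost, compared with an alternative that gives a x1.2 improvement.
print(breakeven_multiplier(20, 1.2))
# The 4x-cost option, compared with the same x1.2 alternative.
print(breakeven_multiplier(4, 1.2))
```

Under these assumptions the break-even bar rises in direct proportion to whatever the cheap alternative already delivers, which is the point being made about the 20x option.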
“Ability to solve hard problems in days vs weeks has immense value”. Citation needed.
LLMs are incredible, don’t get me wrong, but they are good on tiny contexts (writing a script), not on software engineering (adding features to Chrome).
Honestly, LLMs have been OK at adding features to software since around Opus 4.5. From what I've tried of Fable, it's a decent step up from the Opus models and I can only see things getting better.
> push back on a few points
Claude keeps telling me this when I argue with it. LMAO.
“gently push back”