Comment by troupo

8 hours ago

> because it has business context

It doesn't, because it doesn't learn. Every time you run it, it's a new dawn with no knowledge of your business or your business context.

> better reasoning

It doesn't have better reasoning beyond very localized decisions.

> and can ask humans for clarification and take direction.

And yet it doesn't at crucial places in the code, no matter how many .md files you throw at it.

> We have clear scaling laws on true statistical performance that is monotonically related to any notion of what performance means.

This is just a bunch of words strung together, isn't it?

7 comments


aspenmartin 8 hours ago

> It doesn't because it doesn't learn. Every time you run it, it's a new dawn with no knowledge of your business or your business context

It does learn in context. And the lack of continuous learning is temporary; that is a quirk of the current stack, so expect this to change rather quickly. Also still not relevant: consider that agentic systems can be hierarchical and that they have no trouble being able to grok codebases or do internal searches effectively, and this will only improve.

> It doesn't have better reasoning beyond very localized decisions.

Do you have any basis for this claim? It contradicts a large amount of direct evidence, measurement, and theory.

> This is just a bunch of words strung together, isn't it?

Maybe to yourself? Chinchilla scaling laws and RL scaling laws are measured very accurately based on next-token test loss (Chinchilla). This scales very predictably. It is related to downstream performance; that relationship is noisy but clearly monotonic.

troupo 7 hours ago
> It does learn in context

It quite literally doesn't.

It also doesn't help that every new context is a new dawn with no knowledge of things past.

> Also still not relevant, consider that agentic systems can be hierarchical and that they have no trouble being able

A bunch of Memento guys directing a bunch of other Memento guys don't make a robust system, or a system that learns, or a system that maintains and retains things like business context.

> and this will only improve.

We've heard this mantra for quite some time now.

> Do you have any basis for this claim?

Oh. Just the fact that in every single coding session, even on a small 20 kLOC codebase, I need to spend time cleaning up large amounts of duplicated code, undoing quite a few wrong assumptions, and correcting the agent when it goes off on wild tangents and goose chases.

> Maybe to yourself? Chinchilla scaling laws a

yap yap yap. The result is anything but your rosy description of these amazing reasoning learning systems that handle business context.
- aspenmartin 7 hours ago
  
  > It quite literally doesn't.

  Awesome, you've backed this up with real literature. Let's just include this for now to easily refute your argument, whose origin I can't identify: https://transformer-circuits.pub/2022/in-context-learning-an...

  > It also doesn't help that every new context is a new dawn with no knowledge of things past.

  Absolutely true that it doesn't help, but agents like Claude have access to older sessions, they can grok impressive amounts of data via tool use, and they can be composed into hierarchical systems that effectively have much larger context lengths, at the expense of cost and coordination, which needs improvement. Again, this is a temporary and already partially solved limitation.

  > A bunch of Memento guys directing a bunch of other Memento guys don't make a robust system, or a system that learns, or a system that maintains and retains things like business context.

  I think you are not understanding: hierarchical agents have long-term memory maintained by higher-level agents in the hierarchy; that's the whole point. It's annoying to reset model context, and yet you have a knowledge base of the business context persisted, and the model can grok it...

  > We've heard this mantra for quite some time now.

  Yes, you have, and it has held true and will continue to hold true. Have you read the literature on scaling laws? Do you follow benchmark progression? Do you know how RL works? If you do, I don't think you will have this opinion.

  > yap yap yap. The result is anything but your rosy description of these amazing reasoning learning systems that handle business context.

  Well, it's fine to call an entire body of literature "yap", but don't pretend you have some intelligible argument. I don't see you backing up any argument you have here with any evidence, unlike the multitude of sources I have provided to you.

  Do you argue that reasoning systems have not improved in the last year? If so, I would really love to hear the evidence for this.
  

skydhash 8 hours ago

Almost every task that people are tackling with agents is either not worth doing, can be done better with scripts and software, or requires human oversight (that negates all the advantages).

aspenmartin 7 hours ago

I assume this is a troll, because it's so far removed from reality that there's not much to say. "Almost every task": I'm sure you have great data to back this up. "Not worth doing": well, sure, if you want to put your head in the sand and ignore even what systems can do today, let alone the improvement trajectory. "Can be done better with scripts and software": not sure if you realize this, but agents write scripts and software. "Or requires human oversight (that negates all the advantages)": it certainly does not; human oversight is pretty dramatically more efficient and productive than actual humans implementing the code.