Comment by aspenmartin
9 hours ago
> It quite literally doesn't.
Awesome, you've backed this up with real literature. Let me include this for now, to easily refute your argument, which I don't know where it comes from: https://transformer-circuits.pub/2022/in-context-learning-an...
> It also doesn't help that every new context is a new dawn with no knowledge of things past.
Absolutely true that it doesn't help, but: agents like Claude have access to older sessions, they can grok impressive amounts of data via tool use, and they can compose agents into hierarchical systems that effectively have much larger context lengths, at the expense of cost and coordination overhead that still needs improvement. Again, this is a temporary and already partially solved limitation.
> A bunch of Memento guys directing a bunch of other Memento guys don't make a robust system, or a system that learns, or a system that maintains and retains things like business context.
I think you are misunderstanding: hierarchical agents have long-term memory maintained by higher-level agents in the hierarchy; that's the whole point. It's annoying to reset model context, but you still have a knowledge base of the business context persisted, and the agent can grok it.
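To make the pattern concrete, here is a minimal sketch of the hierarchical-memory idea being described. All class and method names here are illustrative, not any real agent framework's API; the LLM calls are stubbed out.

```python
# Hedged sketch of the hierarchical-agent memory pattern: workers get a
# fresh context every run, but a long-lived supervisor persists what they
# learn. Names are illustrative, not a real framework's API.

class WorkerAgent:
    """Short-lived agent: starts with a fresh context each run."""
    def run(self, task: str, briefing: str) -> str:
        # A real system would call an LLM with `briefing` + `task` in the
        # prompt; here we just record what the worker was given.
        return f"result of {task!r} (informed by {len(briefing)} chars of memory)"

class SupervisorAgent:
    """Long-lived agent: owns the persistent knowledge base."""
    def __init__(self):
        self.knowledge_base: list[str] = []  # survives worker resets

    def delegate(self, task: str) -> str:
        briefing = "\n".join(self.knowledge_base)    # distilled business context
        result = WorkerAgent().run(task, briefing)   # worker starts from scratch...
        self.knowledge_base.append(f"{task}: {result}")  # ...but its output persists
        return result

sup = SupervisorAgent()
sup.delegate("model royalty rules")
sup.delegate("handle edge case X")
# The second worker's briefing already contains the first worker's findings.
```

The point is that the memory lives at the supervisor level, so resetting a worker's context does not lose what the hierarchy as a whole has learned.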
> We've heard this mantra for quite some time now.
Yes, you have, and it has held true and will continue to hold true. Have you read the literature on scaling laws? Do you follow benchmark progression? Do you know how RL works? If you do, I don't think you will hold this opinion.
> yap yap yap. The result is anything but your rosy description of these amazing reasoning learning systems that handle business context.
Well, it's fine to call an entire body of literature "yap", but don't pretend you have some intelligible argument. I don't see you backing up any argument you have here with any evidence, unlike the multitude of sources I have provided to you.
Do you argue things have not improved in the last year with reasoning systems? If so, I would really love to hear the evidence for this.
> Let's just include this for now to easily refute your argument which I don't know where it comes from: https://transformer-circuits.pub/2022/in-context-learning-an...
I love it when people include links to papers that refute their words.
So, Anthropic (which is heavily reliant on hype and on making models appear to be more than they are) authors a paper which clearly states: "tokens later in the context are easier to predict, and there's lower loss on them. For no reason at all we decided to give this a new name: in-context learning".
> agents like Claude have access to older sessions, they can grok impressive amounts of data via tool use
That is, they rebuild the world from scratch for every new session, and can't build on what was learned or built in the last one.
Hence the continuously repeating failure modes.
10 years ago I worked in a team implementing royalties for a streaming service. I can still give you a bunch of details about that, including references to multiple national laws. Agents would exhaust their context window just re-"learning" it from scratch, every time. And they would miss a huge amount of important context and business implications.
> Have you read the literature on scaling laws?
You keep referencing this literature as if it were the Holy Bible. Meanwhile, the one you keep referring to, Chinchilla, clearly shows the very hard limits of those laws.
> Do you argue things have not improved in the last year with reasoning systems?
I don't.
Frankly, I find your aggressiveness quite tiring.
> Frankly, I find your aggressiveness quite tiring
Having to answer for opinions with no basis in the literature is, I'm sure, very tiring for you. Having your aggression met is, I'm sure, uncomfortable.
> I love it when people include links to papers that refute their words.
> So, Anthropic (which is heavily reliant on hype and making models appear more than they are) authors a paper which clearly states: "tokens later in context are easier to predict and there's less loss of tokens. For no reason at all we decided to give this a new name, in-context learning".
Well, I don't really love it when people totally misread a paper because they have an agenda to push and can't accept that their opinions are contradicted by real evidence.
In-context learning is not "later tokens are easier to predict"; it's task adaptation from examples in the prompt, and I'm sure you realize this. Models can learn a mapping (e.g. word --> translation) from a few examples in the prompt and apply it to new inputs within the same forward pass. That is function learning at inference time, not just "predicting later tokens better".
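The distinction is easy to show with a sketch of what a few-shot prompt actually looks like. The prompt format and word pairs below are illustrative.

```python
# Hedged sketch of in-context learning: the model infers the task
# (English -> French here) purely from demonstrations in the prompt,
# with no weight updates. The format below is illustrative.

examples = [("sea", "mer"), ("dog", "chien"), ("cat", "chat")]

def few_shot_prompt(query: str) -> str:
    shots = "\n".join(f"{en} -> {fr}" for en, fr in examples)
    return f"{shots}\n{query} ->"

prompt = few_shot_prompt("house")
# A capable model completes this with "maison": it has learned the
# translation mapping from three demonstrations, inside a single
# forward pass, which is not the same thing as "later tokens are easier".
print(prompt)
```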
I'm also sure you're happy to chalk up any contradicting evidence to a grand conspiracy in which all AI companies are just gaming benchmarks, and this gaming somehow completely explains the progress.
> That is they rebuild the world from scratch for every new session, and can't build on what was learned or built in the last one.
That they rebuild the world from scratch (wrong, since they have priors from pretraining, but I accept your point here) does not mean they can't build on what was learned or built in the last one. They have access to the full transcript, the full codebase, the diff history, and whatever knowledge base is available. It's disingenuous to say otherwise. Your claim also (1) assumes there is no mitigation, though I have presented one twice already and you don't seem to understand it, and (2) ignores that this is a temporary limitation: continual learning is one of the most important and best-funded problems right now.
> 10 years ago I worked in a team implementing royalties for a streaming service. I can still give you a bunch of details, including references to multiple national laws, about that. Agents would exhaust their context window just re-"learning" it from scratch, every time. And they would miss a huge amount of important context and business implications.
Also not an accurate understanding of how agents and their context work: you can use multiple sessions to digest and distill information for use in other sessions, and in fact Claude does this automatically with subagents. It's a problem we have _already partially solved today_, and it will continue to improve.
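A minimal sketch of that digest-and-distill loop, to make the claim concrete. This is not Claude's actual internals; the file name, the `NOTE:` convention, and the fake summarizer are all stand-ins for what would really be a model-driven summarization step.

```python
# Hedged sketch of cross-session distillation (not any product's real
# internals): after each session, a digest of what was learned is written
# to disk, and the next session is seeded with that digest instead of
# replaying the raw transcript.

from pathlib import Path

NOTES = Path("project_notes.md")  # hypothetical persistent knowledge base

def end_session(transcript: str) -> None:
    # A real system would ask a model to summarize the transcript; here we
    # fake it by keeping only lines explicitly marked as worth remembering.
    digest = "\n".join(l for l in transcript.splitlines() if l.startswith("NOTE:"))
    with NOTES.open("a") as f:
        f.write(digest + "\n")

def start_session() -> str:
    # A new context window, but seeded with the distilled knowledge base.
    return NOTES.read_text() if NOTES.exists() else ""

end_session("did stuff\nNOTE: royalties use statutory rates per territory\nmore stuff")
print(start_session())  # the next session starts already knowing the NOTE
```

The context window still resets, but the distilled notes do not, which is the mitigation being described.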
> You keep referencing this literature as it was Holy Bible. Meanwhile the one you keep referring to, Chinchilla, clearly shows the very hard limits of those laws.
You keep dismissing this literature as if you had understood it and your opinion somehow held more weight. Can you elaborate on why you think Chinchilla shows the hard limits of the scaling laws? Perhaps you're referring to the term capturing the irreducible loss? Is that what you're saying?
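For readers following along, this is the parametric fit in question. The functional form and the fitted constants below are the ones reported by Hoffmann et al. (2022); treat the exact numbers as approximate.

```python
# The Chinchilla parametric loss fit (Hoffmann et al., 2022), which is
# presumably the "irreducible loss" under discussion: as parameter count N
# and training tokens D grow, loss falls toward E but never below it.

E, A, B, alpha, beta = 1.69, 406.4, 410.7, 0.34, 0.28  # reported fitted constants

def chinchilla_loss(N: float, D: float) -> float:
    return E + A / N**alpha + B / D**beta

# Scaling both N and D by 100x shrinks the two reducible terms...
print(chinchilla_loss(70e9, 1.4e12))   # roughly Chinchilla's own scale
print(chinchilla_loss(70e11, 1.4e14))  # ...but loss stays above E = 1.69
```

Whether E represents "a very real end of scalability" or just the entropy floor of the data distribution is exactly the disagreement in this thread.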
> Do you argue things have not improved in the last year with reasoning systems?
> I don't.
Then are you arguing this progress will stop? I'm just not sure I understand; you seem to contradict yourself.
> having to answer for opinions with no basis in the literature
Having only literature on your side must feel nice.
> They have access to the full transcript, and they have access to the full codebase, the diff history, whatever knowledge base is available.
Yes. And it means that they don't learn, and they always miss important details when rebuilding the world.
That's why even the tiniest codebases are immediately filled with duplications, architecturally unsound decisions, invalid assumptions, etc.
> also not an accurate understanding of how agents and their context work; you can use multiple session to digest and distill information useful in other sessions and in fact
I say: agents don't learn and have to rebuild the world from scratch
You: not an accurate understanding of how agents and their context work... they rebuild the world from scratch every time they run.
> You keep dismissing this literature as if you have understood it
No. I'm dismissing your flawed interpretation of purely theoretical constructs.
Chinchilla doesn't project unlimited amazing scalability. If anything, it shows a very real end of scalability.
Anthropic's paper adopts a nice marketable term for a process that has little to do with learning.
Etc.
Meanwhile you do keep rejecting actual real-world behaviour of these systems.
> Then are you arguing this progress will stop? I'm just not sure I understand, you seem to contradict yourself
I didn't say that either. Your opponents don't contradict themselves; they only seem to when you pretend they think or say things they don't.
Your unsubstantiated belief is that improvements are on a steep linear or even exponential progression. Because "literature" or something.
Looking past all the marketing bullshit, it could be argued that growth is at best logarithmic, and that most improvements come from the tooling around the models (harnesses, subagents, etc.). Meanwhile, all the failure modes from a year ago are still there: misunderstanding context, inability to maintain cohesion between sessions, context pollution, etc.
And providers are running into the problem of obtaining non-polluted training data.
---
At this point we're going around in circles, and I'm not interested in arguing with theorists.
Adieu