
Comment by simgt

1 month ago

> I replicated David Ng's RYS method [...] found something I didn't expect.

> Transformers appear to have discrete "reasoning circuits" — contiguous blocks of 3-4 layers that act as indivisible cognitive units. Duplicate the right block and the model runs its reasoning pipeline twice. No weights change. No training. The model just thinks longer.

How did you not expect that if you read his post? That's literally what he discovered, two years ago.

For anyone interested, there's more meat in the post and comments from last week: https://news.ycombinator.com/item?id=47322887

That's explicitly not the unexpected part. Read the rest of the post.

  • After reading both the original post and this submission, what do you think is new here?

    • > The weird part: different duplication patterns create different cognitive "modes" from the same weights. Double-pass boosts math. Triple-pass boosts emotional reasoning. Interleaved doubling (13,13,14,14,15,15,16) creates a pure math specialist. Same model, same VRAM, different routing.

      As far as I can see that's not implied by the original post.

      But that's beside the point: quoting the bit where the poster says "here's what I'm building on top of" and using that to imply they haven't done anything new is a bit pointless, no?
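For readers unfamiliar with the mechanics being debated, the quoted "interleaved doubling" is straightforward to sketch. This is a minimal toy illustration, not the poster's actual code: the blocks, the pattern, and the helper name are all made up for demonstration. The key point it shows is that re-routing the forward pass through a duplication pattern reuses the same module objects, so no weights change and no extra VRAM for parameters is needed.

```python
# Toy sketch of the "interleaved doubling" idea from the quoted claim.
# All names here are illustrative; real transformer blocks would have
# attention, layer norms, etc.
import torch
import torch.nn as nn

class TinyBlock(nn.Module):
    """Stand-in for one transformer decoder block (residual MLP only)."""
    def __init__(self, dim):
        super().__init__()
        self.ff = nn.Linear(dim, dim)

    def forward(self, x):
        return x + torch.relu(self.ff(x))

def duplicate_layers(blocks, pattern):
    """Rebuild the layer stack per `pattern` (a sequence of layer indices).

    The same module objects are reused, not cloned, so parameters are
    shared: the model "thinks longer" without any new weights.
    """
    return nn.ModuleList(blocks[i] for i in pattern)

dim = 8
blocks = nn.ModuleList(TinyBlock(dim) for _ in range(4))
# Interleaved doubling of layers 0..2, single pass over layer 3 --
# the toy analogue of the quoted (13,13,14,14,15,15,16) pattern.
pattern = (0, 0, 1, 1, 2, 2, 3)
routed = duplicate_layers(blocks, pattern)

h = torch.randn(1, dim)
for blk in routed:
    h = blk(h)

# Parameter count is unchanged: named_parameters() deduplicates shared
# tensors, so the routed stack reports the same total as the original.
n_orig = sum(p.numel() for p in blocks.parameters())
n_routed = sum(p.numel() for p in routed.parameters())
print(len(routed), n_orig == n_routed)
```

Whether repeating a block actually yields a coherent "second reasoning pass" (rather than degraded activations) is exactly what's contested in the thread; the sketch only shows the mechanism, not the effect.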
