
Comment by simgt

1 month ago

> I replicated David Ng's RYS method [...] found something I didn't expect.

> Transformers appear to have discrete "reasoning circuits" — contiguous blocks of 3-4 layers that act as indivisible cognitive units. Duplicate the right block and the model runs its reasoning pipeline twice. No weights change. No training. The model just thinks longer.

How did you not expect that if you read his post? That's literally what he discovered, two years ago.

For anyone interested, there's more meat in the post and comments from last week: https://news.ycombinator.com/item?id=47322887

That's explicitly not the unexpected part. Read the rest of the post.

  • After reading both the original post and this submission, what do you think is new here?

    • > The weird part: different duplication patterns create different cognitive "modes" from the same weights. Double-pass boosts math. Triple-pass boosts emotional reasoning. Interleaved doubling (13,13,14,14,15,15,16) creates a pure math specialist. Same model, same VRAM, different routing.

      As far as I can see that's not implied by the original post.

      But that's beside the point: quoting the bit where the poster says "here's what I'm building on top of" and using that to imply they haven't done anything new is a bit pointless, no?
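For readers unfamiliar with the mechanics being debated, the quoted "interleaved doubling" is straightforward to sketch. This is a minimal toy illustration, not the poster's actual code: the blocks, the pattern, and the helper name are all made up for demonstration. The key point it shows is that re-routing the forward pass through a duplication pattern reuses the same module objects, so no weights change and no extra VRAM for parameters is needed.

```python
# Toy sketch of the "interleaved doubling" idea from the quoted claim.
# All names here are illustrative; real transformer blocks would have
# attention, layer norms, etc.
import torch
import torch.nn as nn

class TinyBlock(nn.Module):
    """Stand-in for one transformer decoder block (residual MLP only)."""
    def __init__(self, dim):
        super().__init__()
        self.ff = nn.Linear(dim, dim)

    def forward(self, x):
        return x + torch.relu(self.ff(x))

def duplicate_layers(blocks, pattern):
    """Rebuild the layer stack per `pattern` (a sequence of layer indices).

    The same module objects are reused, not cloned, so parameters are
    shared: the model "thinks longer" without any new weights.
    """
    return nn.ModuleList(blocks[i] for i in pattern)

dim = 8
blocks = nn.ModuleList(TinyBlock(dim) for _ in range(4))
# Interleaved doubling of layers 0..2, single pass over layer 3 --
# the toy analogue of the quoted (13,13,14,14,15,15,16) pattern.
pattern = (0, 0, 1, 1, 2, 2, 3)
routed = duplicate_layers(blocks, pattern)

h = torch.randn(1, dim)
for blk in routed:
    h = blk(h)

# Parameter count is unchanged: named_parameters() deduplicates shared
# tensors, so the routed stack reports the same total as the original.
n_orig = sum(p.numel() for p in blocks.parameters())
n_routed = sum(p.numel() for p in routed.parameters())
print(len(routed), n_orig == n_routed)
```

Whether repeating a block actually yields a coherent "second reasoning pass" (rather than degraded activations) is exactly what's contested in the thread; the sketch only shows the mechanism, not the effect.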
