Comment by yorwba

1 month ago

After reading both the original post and this submission, what do you think is new here?

5 comments

yorwba

> The weird part: different duplication patterns create different cognitive "modes" from the same weights. Double-pass boosts math. Triple-pass boosts emotional reasoning. Interleaved doubling (13,13,14,14,15,15,16) creates a pure math specialist. Same model, same VRAM, different routing.

As far as I can see that's not implied by the original post.

But that's beside the point: quoting the bit where the poster says "here's what I'm building on top of" and using that to imply they haven't done anything new is a bit pointless, no?

simgt 1 month ago

You're right that my quote was misleading, I overlooked "the weird part" in the post because it didn't seem new to me either.
Here's the section in the original post that covers it: https://dnhkng.github.io/posts/rys/#the-brain-scanner All heatmaps are split by tasks and show an optimal point for each. The resulting routing he chose is a trade-off for both tasks, there isn't much else to do unless you intend to train a router anyway.
> So the ‘math organ’ has boundaries on both sides. Too few layers and you get nothing — you’ve cut into the circuit and it can’t complete its operation. Too many layers and you also get nothing — you’ve included tissue from a neighbouring circuit that doesn’t belong. Pre-training carved these structures out of the layer stack, and they only work whole. It also doesn’t translate to other tasks, as the heatmap for EQ scores doesn’t have this patch.

gavinray 1 month ago

This is stated in the original post as well, under "The Beginning of LLM Neuroanatomy?" section:

  > From end-position 43 to 46, we then see solid boosts in math scores (red = good, yay). But include layer 46 or beyond, and the benefits collapse again. The hypothesis: position 47 is where a different circuit begins. Including even one step of the next recipe messes up the current recipe.

  > So the ‘math organ’ has boundaries on both sides. Too few layers and you get nothing — you’ve cut into the circuit and it can’t complete its operation. Too many layers and you also get nothing — you’ve included tissue from a neighbouring circuit that doesn’t belong. Pre-training carved these structures out of the layer stack, and they only work whole. It also doesn’t translate to other tasks, as the heatmap for EQ scores doesn’t have this patch.

  > This is a much more specific claim than “middle layers do reasoning.” It’s saying the reasoning cortex is organised into functional circuits: coherent multi-layer units that perform complete cognitive operations. Each circuit is an indivisible processing unit, and the sweeps seen in the heatmap is essentially discovering the boundaries of these circuits.

regularfry 1 month ago

That's just saying there are circuits. It's not saying you get different effects by stacking the same circuit in different ways.

jstanley 1 month ago

It's all new to me.