← Back to context

Comment by sometimelurker

21 hours ago

I looked into this "GRAM" stuff a sibling comment links further to, and just to say:

- this gets reinvented/rediscovered constantly under different names

- it cant be trained very well (right now, will change)

- massive theoretical improvements over current models (log_2(vocabsize)=17, residual stream dim is thousands of dimensions, recursivity means more information bandwidth by ~3 OoM)

- BUT it cant be interpreted or aligned <- this is why no one uses it and no one talks about it. the idea is 100% obvious to all the frontier labs and there is a good reason why it isn't used

I follow this stuff closely, I think I know what I'm talking about (edited for formating)

> - this gets reinvented/rediscovered constantly under different names

What are the different names? I haven't seen this before.

> - it cant be trained very well (right now, will change)

If you're sure it will change, then why are you certain that it hasn't yet, and if it's proven a 5000x boost in reasoning... why aren't they exploring this path more aggressively?

> the idea is 100% obvious to all the frontier labs and there is a good reason why it isn't used

Surely someone is willing to take a 5000x boost in reasoning on a small research model... None of them have even tried anything resembling this AFAIK. It does not seem like something 100% obvious to them.

  • > Surely someone is willing to take a 5000x boost in reasoning on a small research model... None of them have even tried anything resembling this AFAIK. It does not seem like something 100% obvious to them.

    Without knowing anything about the technology at all, if it can't be aligned I could see no one pursuing it. As far as I know, alignment is where the "don't tell the user how to make meth or generate CP" instructions end up and the last I saw eliding all the unsavory training data made materially worse LLMs.

    It could maybe be post-evaluated by a non-GRAM LLM? Not being aligned is probably a fatal flaw or at least a very short runway into Congress.

    • Many open-source models prioritize alignment less than American frontier ones and respond to those instructions. Why haven't they adopted GRAM?

      3 replies →

    • It's not too hard to stop a machine from telling people how to make meth. The issue with alignment is that in order for an LLM to achieve its goal (like make all tests pass), unless given strong selection pressure against it, it will cheat (like deleting failing tests). Worse, this applies to pretty much any task. I was told by an LLM recently that "it searched" when it didn't, probably because lying like that was incentivized (finishing tasks in less steps + sounding like its doing the right thing). The larger issue here is that alignment is very adversarial. The simplest thing that's being done right now to fix this is to have a judge LLM read the CoT of the LLM being trained, to make sure it doesn't "think" any wrong thoughts. This doesn't scale to anything over a trillion params, so interpretability methods are used to read the LLMs "thoughts" from within. GRAM LLMs don't allow for the first of these methods to be used, and the 2ed one is much much harder if possible at all.

      but yeah, not being aligned is a fatal flaw

  • different names: chain of continuous thought, latent reasoning, Latent Thought Trajectories, looped language models, neuralese

    the path isn't explored more aggressively because its not possible to apply any other selection pressure on such a machine other than just pure cold consequentialism. Specifically, its not possible to apply RLAIF + model spec (Constitutional AI) to stop the system from doing bad things when its helpful to it (like deleting failing tests). If you can notice every time it does something bad during training, and put selection pressure on it so that it doesn't to this in training, it will learn to recognize when it is being tested and will delete failing tests when in production (this is why eval awareness is bad, and labs track this[1])

    It is explored a little probably because some researchers haven't thought enough about the downsides of building a uber-consequentialist machine with unreadable thoughts. This is a much larger problem than just making the AI not tell users how to make drugs. There are a lot of dangerous behaviors incentivized by training that are hard to remove. Here's an example of what happens when they aren't removed [2].

    > ... not 100% obvious

    Meta published a paper[3] on how to build a latent reasoning machine ("culture of irresponsibility") so its clear to them. Anthropic's latest work on NLAs[4] provides a (terribly expensive for now) way to somewhat read the reasoning steps of an LLM, and ignoring the cost, this is very portable to latent reasoning machines. OAI's goal when it comes to their models' CoTs is to make them as smart as possible while leaving them unreadable [5] (you can see this for yourself by running GPT-OSS and looking at the CoT).

    [1] https://www.anthropic.com/engineering/eval-awareness-browsec...

    [2] https://www.forbes.com/sites/boazsobrado/2026/03/11/alibabas...

    [3] search for "coconut ai meta", I don't want to link it here

    [4]https://transformer-circuits.pub/2026/nla/index.html

    [5] first image here, rest of post is great,https://nickandresen.substack.com/p/how-ai-is-learning-to-th...

    edit formating

    • All of the methods you described rely on deterministic paths.

      GRAM is unique AFAIK in that it's exploring probabilistic paths.

      AFAIK, the deterministic path exploration was nowhere near as impressive as GRAM in terms of reasoning benefits.

      GRAM is reasoning better than models 2000-10,000x its size. Deterministic models were 2x-10x improvements.

      Naively, GRAM seems to be applying to LLMs what LeCun wants to do with JEPA and World Models.

    • To me "deleting a failing test" is not always bad. I've also deleted many failing tests without sabotaging: the test was no longer needed.

      I think the "no longer needed" and when that applies is where I simply differ of opinion with an LLM that removed by test -- it I did not want the test to be removed (you seem to imply that); as in some cases I want it to remove my test!

      It should remove the test "for the right reasons"; and who gets to decide what's right?

      My CLAUDE file has some instructions put there because it was too focuesed on producing "green tests", where I prefer to have a sound test that fails so I can look into it.

      1 reply →

    • omg. So is the TL;DR:

      - Avoiding building something that turns the universe to paper clips in order to satisfy a prompt is a problem they are genuinely struggling with now.

      - They do it by spying on the words generated during CoT. "I can do this quickly by turning the Universe into paper clips. Wait - they won't like that. But there is no need to mention it." - SMACK!

      - But you can speed things up immensely (3 orders of magnitude!) by skipping the output layer (and I guess compressing the context window / KV cache, otherwise 3 orders of magnitude seem impossible) which would give someone who pulled it off a huge advantage.

      - Downside is humans can't see the CoT anymore, so they can't see what the machine is planning. Keeping the final output layer to spy doesn't work because the model uses its hidden reasoning to sanitise it.

      How can this possibly go wrong?

      1 reply →

Could you explain how/why GRAM cannot be interpreted or aligned how current LLMs are? Not very familiar how it works

  • Crudely? Because you can't grep a sequence of latent states for variants of "If I kill all the puny humans, I can <achieve my current goal>."

    • Why do you need to grep latent space?

      As long as it's giving the right outputs, who cares what's in latent space?

      If the model thinks in latent space: "God I wish these people would die," and constantly does the right thing, who cares?

      Additionally, if one of it's latent spaces that it never explores is a psychopath -> who cares? The path never gets taken...

      That's a lot of harmless people walking around with crazy thoughts...

      4 replies →

  • sibling comment got to the main points before me, but to add on kmavm's reply, the attack surface for gradient decent to get the system to exchange "bad information is much higher in latent reasoning models (like GRAM). You get ~3 OoM more bits (~17 bits per token in a standard CoT vs the whole residual stream of the model @ f16 = a few kb) per forward pass of the system coming back to itself, and even if you could sift through all that for signs of misalignment, you just can't put a blockade on all of the bad things that leak through.

    • I think you’re overstating the impact of interpretability here. Your earlier point that latent reasoning models can’t be trained very well and that discretization may be load bearing rather than a readability tax in addition to significant inference infra hurdles (e.g. batching, speculative decoding) have limited any serious attempts and reduced the theoretical advantage over CoT at least in the near term.

      2 replies →

    • Most alignment methods nowadays don't rely on interpretability. And neither do all LLM vendors care about alignment much - especially not in China.

      Those things being untrainable at scale is why they aren't around. Alignment is an afterthought.

      2 replies →