Comment by benjiro29

12 hours ago

GLM 5.2 Max = Opus 4.8 Max in thinking behavior. The thinking chain is so similar, and so is the amount of token usage on the output.

If you want reasonable token usage, you need to run GLM 5.2 at High. There is little drop in quality from Max to High (for most tasks), and it cuts token usage by 2 to 2.5x. GLM 5.2 Max is really something you only need for complex tasks.

In essence, GLM 5.2 is Opus 4.8's little brother, at a way, WAY cheaper price.

There has been really no training on Opus models going on, really, none, I tell you! /sarcasm

25 comments

matheusmoreira 4 hours ago

> GLM 5.2 Max = Opus 4.8 Max in thinking behavior

This is insane! I can't wait until technology progresses to the point we can run these things on consumer hardware!

muyuu 22 minutes ago
you need 8 x 96GB Blackwell or equivalent
so around US$150k which is Small/Medium-Enterprise territory already, but who knows when it will hit "reasonable" home consumer territory
I think there's hope future generations of unified memory machines may get this sort of memory availability when new fabs open in the next couple of years and then ramp up production for a few years afterwards. That makes the ~2030s credible at this point, but nobody can really predict the market that far ahead.
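
A quick back-of-the-envelope check on the figures above, as a minimal Python sketch. The 8 x 96GB and ~US$150k numbers come from the comment; the per-GPU and per-GB costs are only derived from them and assume the whole budget goes to the GPUs, which is an assumption made here and not something the comment states.

```python
# Figures quoted in the comment above.
GPU_COUNT = 8
VRAM_PER_GPU_GB = 96
TOTAL_COST_USD = 150_000  # "around US$150k", so treat this as approximate


def total_vram_gb(gpu_count: int, vram_per_gpu_gb: int) -> int:
    """Aggregate GPU memory across all cards, in GB."""
    return gpu_count * vram_per_gpu_gb


def cost_per_gpu_usd(total_cost_usd: float, gpu_count: int) -> float:
    """Implied cost per card, assuming the full budget is spent on GPUs."""
    return total_cost_usd / gpu_count


def cost_per_gb_usd(total_cost_usd: float, total_gb: int) -> float:
    """Implied cost per GB of GPU memory under the same assumption."""
    return total_cost_usd / total_gb


if __name__ == "__main__":
    total_gb = total_vram_gb(GPU_COUNT, VRAM_PER_GPU_GB)
    print(f"total GPU memory: {total_gb} GB")
    print(f"implied cost per GPU: ${cost_per_gpu_usd(TOTAL_COST_USD, GPU_COUNT):,.0f}")
    print(f"implied cost per GB: ${cost_per_gb_usd(TOTAL_COST_USD, total_gb):,.2f}")
```

The total works out to 768 GB of GPU memory, which is the scale a consumer machine would have to reach for this kind of setup.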
- matheusmoreira 3 minutes ago
  
  > I think there's hope future generations of unified memory machines may get this sort of memory availability
  I hope you're right. This is a very exciting idea. The weights are out there. The demand is astronomical. The manufacturers just need to make it happen.
chartpath 2 hours ago
Are there any indications that this will be possible? Consumer hardware will continue getting better but I can't see 512GB RAM in a MacBook Pro any time soon. I'm hoping linear attention techniques plus MoE will make breakthroughs in size/compression and throughput.
- nijave 25 minutes ago
  
  Well, we're probably not going to be running frontier models anytime soon, but I think the general assumption is smaller models will continue to improve until they're good enough that frontier models aren't needed.
  There's potentially also augmentation through tools, harnesses and RAG to help boost how well they work without tons of parameters.
- matheusmoreira 2 hours ago
  
  Certainly not any time soon, but I have faith it'll happen one day.

vitalyan123 12 hours ago

distillation of thinking models is not particularly effective - both "Open"AI and Misanthropic don't show you the real chain of thought, only its severely downscaled version. both do everything in their power to combat such outrageous copyright infringement, so the bulk of unethically scraped data the Chinese have is from several generations ago.

nyrikki 5 hours ago

It is quite likely that the intermediate tokens don’t have ‘semantic import’[0]
There are methods like Habitual Reasoning Distillation or Inverted Reasoning Traces [1] that can help.
While there are reasons to hide the intermediate tokens from an IP protection standpoint, there is also a need to hide more effective and efficient generation that doesn’t fit the R1 claims of an "aha moment", which has been debunked but is still a consumer expectation.
While hidden intermediate tokens do increase the difficulty, they are not a firm barrier in themselves, especially as they are billed, which gives away information about their length.
[0] https://arxiv.org/abs/2504.09762v4
[1] https://arxiv.org/abs/2603.07267
duskdozer 12 hours ago
>such outrageous copyright infringement
Sarcasm, considering the source of their own training data?
- margalabargala 9 hours ago
  
  Considering they called the company "Misanthropic", sarcasm is a safe bet.
- orphea 12 hours ago
  
  Narrator: it was sarcasm, indeed.
- baron3dl 10 hours ago
  
  IP for me, not thee.
Bolwin 5 hours ago
For Claude models at least, you can tell them to just manually think in the output and it works fine. I do it regularly because for creative writing and summarization, they seem to believe they don't need to think at all, and get way worse results.
- carterschonwald 4 hours ago
  
  this helps so much. i do it too. with some of the newer frontier models it's unclear if you can even turn it off in the first party chat apps. haven't compared api semantics yet.
overfeed 8 hours ago

FYI: model outputs are not protected by copyright.
mirekrusin 4 hours ago
I don’t understand why there isn’t a public dataset for reasoning that can be improved by humans/LLMs like Wikipedia (i.e. with auto judging of contributions etc).
- logicchains 3 hours ago
  
  For reasoning a manually-curated dataset is too small; you need to be able to automatically generate vast volumes of synthetic reasoning data with provably correct answers. That's presumably why Claude and GPT are so good at using Lean (the theorem prover), because they get fed a bunch of synthetic, verifiably correct training data.
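
  The idea of generating synthetic data whose answers can be checked mechanically can be illustrated with a toy Python sketch. This is only an illustration under stated assumptions: simple integer arithmetic stands in for a real verifier such as Lean, and the names `make_problem`, `verify` and `generate` are hypothetical, not taken from any lab's pipeline.

```python
import operator
import random

# Supported operations; exact integer arithmetic plays the role of the
# "prover" here (a toy stand-in for a real verifier such as Lean).
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def make_problem(rng: random.Random) -> dict:
    """Generate one problem together with its computed answer."""
    a, b = rng.randint(1, 99), rng.randint(1, 99)
    symbol = rng.choice(sorted(OPS))
    return {"question": f"{a} {symbol} {b}", "answer": OPS[symbol](a, b)}


def verify(problem: dict, candidate: int) -> bool:
    """Recompute the answer from the question text and compare it."""
    a, symbol, b = problem["question"].split()
    return OPS[symbol](int(a), int(b)) == candidate


def generate(n: int, seed: int = 0) -> list:
    """Produce n problems, keeping only those that pass verification."""
    rng = random.Random(seed)
    problems = [make_problem(rng) for _ in range(n)]
    return [p for p in problems if verify(p, p["answer"])]


if __name__ == "__main__":
    for item in generate(3):
        print(item["question"], "=", item["answer"])
```

  Because the checker is automatic, the volume of data is limited by compute and not by human curation, which is the point being made above.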
kmeisthax 3 hours ago

Chinese distillation attacks are about as unethical as Robin Hood stealing from the rich to give to the poor. The real unethical scraping was done by Anthropic to train Claude.
To be clear, if Anthropic was using totally licensed data, I'd be sympathetic to these claims. But if you're going to pirate the world's creativity you'd better be willing to gimme dat shit for free[0].
[0] As said by Hungry Santa.
BoorishBears 1 hour ago

Reasoning models can be coaxed to reason like they do in dedicated reasoning blocks, outside of those blocks: in normal parts of the response.
But Anthropic at least has openly admitted they try to detect that and interfere.
ComputerGuru 9 hours ago

Supposedly there are “jailbreaks” that expose considerably more of the thinking traces.
orbital-decay 3 hours ago

You can trivially leak the CoT of any current model, it's not a problem.
>outrageous copyright infringement
>unethically scraped data
Hahahahaha
mannanj 10 hours ago

The companies that did copyright infringement and unethically scraped data think that copyright infringement and unethically scraping data is wrong and needs to be stopped.
Though only in particular situations, like when it’s done to them and not when they do it. Cause they have the power and are morally right and know better than you. And if you question this at all, well you’re a threat to American values and a supporter of the Chinese and leading to the breakdown of democracy.
This isn’t a type of argument or manipulation tactic used by the rich throughout history to trick the naive and gullible masses or anything like that. Trust me, I’m rich and I’m morally right. /sarcasm

maxdo 8 hours ago

Looking at the scores, this is more of a Gemini 3.5 Flash competitor. Yes, it's cheaper, but the distance to Opus and Fable is as big as their price difference.

FooBarWidget 7 hours ago

With such ridiculously long thinking traces I'm surprised Max outperforms High. After all, performance falls off a cliff after a certain amount of context, and long thinking traces can fill that up really quickly.