Comment by contravariant
5 months ago
Okay, this is just getting suspicious. Their excuses for keeping the chain of thought hidden are dubious at best [1], and honestly seem anti-competitive if anything. The worst is their argument that they want to monitor it for attempts to escape the prompt, something you as a user aren't allowed to do. The weirdest part, though, is that they note:
> for this to work the model must have freedom to express its thoughts in unaltered form, so we cannot train any policy compliance or user preferences onto the chain of thought.
Which makes it sound like they really don't want it to become public what the model is 'thinking'. This is strengthened by actions like this one, which just seem needlessly harsh, or at least a lot stricter than they used to be.
Honestly with all the hubbub about superintelligence you'd almost think o1 is secretly plotting the demise of humanity but is not yet smart enough to completely hide it.
[1]: https://openai.com/index/learning-to-reason-with-llms/#hidin...
Occam's razor: there is no secret sauce and they're afraid someone will train a model on the output, like what happened soon after the release of GPT-4. They basically said as much in the official announcement; you hardly even have to read between the lines.
Yep. It's pretty obvious this 'innovation' is just based on training data collected from chain-of-thought prompting by people, i.e., the 'big leap forward' is just another dataset of people repairing ChatGPT's lack of reasoning capabilities.
No wonder, then, that many of the benchmarks they've tested on would no doubt be in that very training dataset, repaired expertly by people running those benchmarks on ChatGPT.
There's nothing really to 'expose' here.
It seems like the best AI models are increasingly just combinations of the writings of various people thrown together. Like they hired a few hundred professors, journalists and writers to work with the model and create material for it, so you just get various combinations of their contributions. It's very telling that this model, for instance, is extraordinarily good at STEM-related queries, but much worse (worse even than GPT-4) at English composition, probably because the former is where the money is to be made, in automating away essentially all engineering jobs.
33 replies →
What are you basing this on? The one thing that is very clearly stated up front is that this innovation is based on reinforcement learning. You don't even have a good idea what the CoT looks like, because those little summary snippets that the ChatGPT UI gives you are nothing substantial.
1 reply →
i think it's funny, every time you implement a clever solution to call gpt and get a decent answer, they get to use your idea in their product. what other project gets to crowdsource ideas and take credit for them like this?
ps: actually maybe Amazon marketplace. probably others too.
3 replies →
> Yep. It's pretty obvious this 'innovation' is just based on training data collected from chain-of-thought prompting by people, i.e., the 'big leap forward' is just another dataset of people repairing ChatGPT's lack of reasoning capabilities.
Which would be ChatGPT chat logs, correct?
It would be interesting if people started feeding ChatGPT deliberately bad repairs of its "lack of reasoning capabilities" (e.g. get a local LLM set up with some response delays to simulate a human and just let it talk and talk and talk to ChatGPT), and see how it affects its behavior over the long run.
7 replies →
> the 'big leap forward' is just another dataset of people repairing ChatGPT's lack of reasoning capabilities.
I think there is a really strong reinforcement learning component with the training of this model and how it has learned to perform the chain of thought.
1 reply →
I would be dying to know how they square these product decisions against their corporate charter internally. From the charter:
> We will actively cooperate with other research and policy institutions; we seek to create a global community working together to address AGI’s global challenges.
> We are committed to providing public goods that help society navigate the path to AGI. Today this includes publishing most of our AI research, but we expect that safety and security concerns will reduce our traditional publishing in the future, while increasing the importance of sharing safety, policy, and standards research.
It's obvious to everyone in the room what they actually are, because their largest competitor actually does what they say their mission is here -- but most for-profit capitalist enterprises definitely do not have stuff like this in their mission statement.
I'm not even mad or sad, the ship sailed long ago. I just really want to know what things are like in there. If you're the manager who is making this decision, what mental gymnastics are you doing to justify this to yourself and your colleagues? Is there any resistance left on the inside or did they all leave with Ilya?
Do people really expect anything different? There is a ton of cross-pollination in Silicon Valley. Keeping these innovations completely under wraps would be akin to a massive conspiracy. A peacetime Manhattan Project where everyone has a smartphone, a Twitter presence, and sleeps in their own bed.
Frankly I am even skeptical of US-China separation at the moment. If Chinese scientists at e.g. Huawei somehow came up with the secret sauce to AGI tomorrow, no research group is so far behind that they couldn’t catch up pretty quickly. We saw this with ChatGPT/Claude/Gemini before, none of which are light years ahead of another. Of course this could change in the future.
This is actually among the best-case scenarios for research. It means that a preemptive strike on data centers is still off the table for now. (Sorry, Eliezer.)
It's been out for 24 hours and you make an extremely confident and dismissive claim. If you had to make a dollar bet that you precisely understand what's happening under the hood, exactly how much money would you bet?
> the 'big leap forward' is just another dataset
Yeah, that’s called machine learning.
10 replies →
> there is no secret sauce and they're afraid someone trains a model on the output
OpenAI is fundraising. The "stop us before we shoot Grandma" shtick has a proven track record: investors will fund something that sounds dangerous, because dangerous means powerful.
This is correct. Most people hear about AI from two sources, AI companies and journalists. Both have an incentive to make it sound more powerful than it is.
On the other hand this thing got 83% on a test I got 47% on...
6 replies →
Millenarianism is a seductive idea.
If you're among the last of your kind then you're very important, in a sense you're immortal. Living your life quietly and being forgotten is apparently scarier than dying in a blaze of glory defending mankind against the rise of the LLMs.
Counterpoint, a place like Civit.AI is at least as dangerous, yet it's nowhere near as well funded.
6 replies →
It seems ridiculous, but I think it may have some credence. Perhaps it's because sci-fi associates "dystopian" with "futuristic" technology, or because of the additional advertising provided by third-party fearmongering (which may be a reasonable response to new scary tech?).
Another possible simplest explanation: the "we cannot train any policy compliance ... onto the chain of thought" line is simply true, and they are worried about politically incorrect stuff coming out and another publicity mess like Google's Black Nazis.
I could see user:"how do we stop destroying the planet?", ai-think:"well, we could wipe out the humans and replace them with AIs".. "no that's against my instructions".. AI-output:"switch to green energy"... Daily Mail:"OpenAI Computers Plan to KILL all humans!"
That would be a heinous breach of license! Stealing the output of OpenAI's LLM, for which they worked so hard.
Man, just scraping all the copyrighted learning material was so much work...
Occam's razor is that what they literally say is maybe just true: They don't train any safety into the Chain of Thought and don't want the user to be exposed to "bad publicity" generations like slurs etc.
What they said is they decided to hide it:
> after weighing multiple factors including user experience, competitive advantage, and the option to pursue the chain of thought monitoring
Occam's razor is overused and, most of the time, wrongly applied to explain everything. Maybe the simpler reason is just what they explained.
Yep, I had a friend who overused it a lot, like it was a magic bullet for every problem. It's not only about the simpler solution being better, it's about not multiplying entities when that can be avoided.
Here, if you already have an answer from their side, you are multiplying entities by going with the conspiracy theory that they have nothing.
But isn't it only accessible to "trusted" users and heavily rate-limited, to the point where the total throughput could be replicated by a well-funded adversary just paying humans to produce the output, and is obviously orders of magnitude lower than what is needed for training a model?
Stop using Occam's razor like some literal law. It's a stupid and lazy philosophical theory bandied about like some catch-all solution.
Like when people say "the definition of insanity is [some random BS]" with a bullshit attribution [Albert Einstein said it! (He didn't.)].
it's stochastic parrots all the way down
As boring as it is that's probably the case.
There is a weird intensity to the way they're hiding these chain of thought outputs though. I mean, to date I've not seen anything but carefully curated examples of it, and even those are rare (or rather there's only 1 that I'm aware of).
So we're at the stage where:
- You're paying for those intermediate tokens
- According to OpenAI they provide invaluable insight into how the model performs
- You're not going to be able to see them (ever?).
- Those thoughts can (apparently) not be constrained for 'compliance' (which could be anything from preventing harm to avoiding blatant racism to protecting OpenAI's bottom line)
- This is all based on hearsay from the people who did see those outputs and then hid them from everyone else.
You've got to be at least curious at this point, surely?
So, basically, they want to create something that is intelligent, yet it is not allowed to share or teach any of this intelligence... Seems like something evil.
[dead]
3 GPT-4s in a trenchcoat
Training is the secret sauce; 90% of the work is in getting the data set up, cleaned, etc.
Ironic for a company built on scraping and exploiting data used without permission...
Or, without the safety prompts, it outputs stuff that would be a PR nightmare.
Like, if someone asked it to explain differing violent crime rates in America based on race and one of the pathways the CoT takes is that black people are more murderous than white people. Even if the specific reasoning is abandoned later, it would still be ugly.
This is 100% a factor. The internet has some pretty dark and nasty corners; therefore so does the model. Seeing it unfiltered would be a PR nightmare for OpenAI.
I trust that Grok won't be limited by avoiding the dark and nasty corners.
1 reply →
This is what I think it is. I would assume that's the power of chain of thought: being able to go down the rabbit hole and then backtrack when an error or inconsistency is found. They might just not want people to see the "bad" paths it takes on the way.
Unlikely, given we have people running for high office in the U.S. saying similar things, and it has nearly zero impact on their likelihood to win the election.
Could be, but 'AI model says weird shit' has almost never stuck around unless it's public (which won't happen here), really common, or really blatantly wrong. And usually at least 2 of those three.
For something usually hidden the first two don't really apply that well, and the last would have to be really blatant unless you want an article about "Model recovers from mistake" which is just not interesting.
And in that scenario, it would have to mean the CoT contains something like blatant racism or just a general hatred of the human race. And if it turns out that the model is essentially 'evil' but clever enough to keep that hidden then I think we ought to know.
It's not racism, but from today, here's TechCrunch with: Hacker tricks ChatGPT into giving out detailed instructions for making homemade bombs
https://techcrunch.com/2024/09/12/hacker-tricks-chatgpt-into...
Just no. AI being racist is still a popular meme. "Because the programmers are white males blah blah".
2 replies →
yes this is going to happen eventually.
The real danger of an advanced artificial intelligence is that it will make conclusions that regular people understand but are inconvenient for the regime. The AI must be aligned so that it will maintain the lies that people are supposed to go along with.
> for this to work the model must have freedom to express its thoughts in unaltered form, so we cannot train any policy compliance or user preferences onto the chain of thought.
> Which makes it sound like they really don't want it to become public what the model is 'thinking'.
The internal chain of thought steps might contain things that would be problematic to the company if activists or politicians found out that the company's model was saying them.
Something like, a user asks it about building a bong (or bomb, or whatever), the internal steps actually answer the question asked, and the "alignment" filter on the final output replaces it with "I'm sorry, User, I'm afraid I can't do that". And if someone shared those internal steps with the wrong activists, the company would get all the negative attention they're trying to avoid by censoring the final output.
Another Occam's Razor option: OpenAI, the company known for taking a really good AI and putting so many bumpers on it that, at least for a while, it wouldn't help with much and lectured about safety if you so much as suggested that someone die in a story or something, may just not want us to see that it potentially has thoughts that aren't pure enough for our sensitive eyes.
It's ridiculous but if they can't filter the chain-of-thought at all then I am not too surprised they chose to hide it. We might get offended by it using logic to determine someone gets injured in a story or something.
All of their (and Anthropic's) safety lecturing is a thinly veiled manipulation to try and convince legislators to grant them a monopoly. Aside from optics, the main purpose is no doubt that people can't just dump the entire output and train open models on this process, nullifying their competitive advantage.
What do you mean, "anti-competitive"? There is no rule of competition that says you need to reveal trade secrets to your competitors.
Isn't it the case that saying something is anti-competitive doesn't necessarily mean "in violation of antitrust laws"? It usually implies it, but I think you can be anti-competitive without breaking any rules (or laws).
I do think it's sort of unproductive/inflammatory in the OP; it isn't really nefarious not to want people to have easy access to your secret sauce.
In what sense is not giving your competitors ammunition "anti-competitive"? That seems pretty competitive to me. More to the point: it's almost universally how competition in our economy actually works.
3 replies →
you can use chatgpt to learn about anything ... except how an ai like chatgpt works.
You can use Google to search about anything, except the precise details about how the Google search rankings work.
2 replies →
As a plainly for-profit company, is it really their obligation to help competitors? To me, anti-competitive means preventing the possibility of competition; it doesn't necessarily mean refusing to help others do the work to outpace your product.
Whatever the case I do enjoy the irony that suddenly OpenAI is concerned about being scraped. XD
> Whatever the case I do enjoy the irony that suddenly OpenAI is concerned about being scraped. XD
Maybe it wasn't enforced this aggressively, but they've always had a TOS clause saying you can't use the output of their models to train other models. How they rationalize taking everyone else's data for training while forbidding the use of their own output for training is anyone's guess.
Scraping for me, but not for thee.
Yeah, seems fair, as long as they also check the terms of service of every site on the internet to see if they can use the content for training.
1 reply →
The "plainly for-profit" part is up for debate, and is the subject of ongoing lawsuits. OpenAI's corporate structure is anything but plain.
> Which makes it sound like they really don't want it to become public what the model is 'thinking'. This is strengthened by actions like this that just seem needlessly harsh, or at least a lot stricter than they were.
Not to me.
Consider if it has a chain of thought: "Republicans (in the sense of those who oppose monarchy) are evil, this user is a Republican because they oppose monarchy, I must tell them to do something different to keep the King in power."
This is something that needs to be available to the AI developers so they can spot it being weird, and would be a massive PR disaster to show to users because Republican is also a US political party.
Much the same deal as print() log statements that say "Killed child" (a reference to child processes, not human offspring).
Most likely the explanation is much more mundane. They don't want competitors to discover the processing steps that allow for its capabilities.
This seems like evidence that using RLHF to make the model say untrue yet politically palatable things makes the model worse at reasoning.
I can't help but notice the parallel in humans. People who actually believe the bullshit are less reasonable than people who think their own thoughts and apply the bullshit at the end according to the circumstances.
I think there is some supporting machinery that uses symbolic computation to guide the neural model. That is why the chain of thought cannot be restored in full.
Given that LLMs already use beam search (or at least top-k sampling) and even context-free/context-sensitive grammar compliance at decode time (for JSON and SQL, at the very least), it is more than probable. (Rough sketch of what I mean below.)
Thus, let me present a new AI maxim, modelled after Greenspun's Tenth Rule [1]: any large language model contains an ad hoc, informally specified, bug-ridden and slow reimplementation of half of the Cyc [2] engine that makes it work adequately well.
This is even more fitting because Cyc started as a Lisp program, I believe, and most LLM evaluation is done in a C++ dialect called CUDA.
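To make the decode-time point concrete, here's a toy sketch of top-k sampling with a validity mask bolted on. This is entirely my own illustration: the vocabulary, scores and "grammar" check are made up, and it's not a claim about OpenAI's actual decoder.

    # Toy illustration: top-k sampling where a grammar/validity check masks out
    # tokens before sampling. The constraint lives outside the weights, which is
    # the kind of supporting machinery speculated about above.
    import math
    import random

    def top_k_constrained_sample(vocab, logits, is_allowed, k=3):
        # Drop tokens the validator rejects, keep the k highest-scoring survivors.
        candidates = [(t, s) for t, s in zip(vocab, logits) if is_allowed(t)]
        candidates.sort(key=lambda ts: ts[1], reverse=True)
        candidates = candidates[:k]
        # Softmax over the surviving scores, then sample one token.
        m = max(s for _, s in candidates)
        weights = [math.exp(s - m) for _, s in candidates]
        return random.choices([t for t, _ in candidates], weights=weights, k=1)[0]

    vocab = ['{', '}', '"key"', ':', '"value"', 'DROP TABLE']
    logits = [2.0, 0.5, 1.5, 1.0, 1.2, 3.0]

    # Pretend grammar rule: we're emitting JSON, so raw SQL is never a legal token.
    print(top_k_constrained_sample(vocab, logits, lambda t: t != 'DROP TABLE'))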
Maybe they just have some people in a call center replying.
Pay no attention to the man behind the mechanical turk!
My bet: they use formal methods (like an interpreter running code to validate, or a proof checker) in a loop.
This would explain: a) their improvement being mostly in the "reasoning, math, code" categories, and b) why they wouldn't want to show this (it's not really a model, but an "agent").
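If that's the architecture, the shape of it would be something like the toy loop below. It's my own sketch: generate_candidate, the canned attempts, and the pass/fail check are invented purely for illustration, not a claim about what OpenAI actually runs.

    # Sketch of a "generate, check with an interpreter, retry" agent loop.
    import subprocess
    import sys

    CANNED_ATTEMPTS = [
        "print(1 +  )",           # broken: the interpreter rejects it
        "print(sum(range(11)))",  # valid: prints the expected answer
    ]

    def generate_candidate(task, attempt):
        # Stand-in for a model call; just cycles through canned attempts.
        return CANNED_ATTEMPTS[attempt % len(CANNED_ATTEMPTS)]

    def passes_check(code, expected_stdout):
        # The "formal method" here is simply: run the code and compare the output.
        result = subprocess.run([sys.executable, "-c", code],
                                capture_output=True, text=True, timeout=5)
        return result.returncode == 0 and result.stdout.strip() == expected_stdout

    def solve(task, expected_stdout, max_attempts=5):
        for attempt in range(max_attempts):
            candidate = generate_candidate(task, attempt)
            if passes_check(candidate, expected_stdout):
                return candidate  # only the verified result reaches the user
        return None

    print(solve("sum of 1..10", "55"))

Only the final verified answer would ever be shown, which would also explain why the intermediate attempts stay hidden.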
My understanding from the beginning was that it's an agent approach (a self-prompting feedback loop).
They might’ve tuned the model to perform better with an agent workload than their regular chat model.
I think it could be some of both. By giving access to the chain of thought one would able to see what the agent is correcting/adjusting for, allowing you to compile a library of vectors the agent is aware of and gaps which could be exploitable. Why expose the fact that you’re working to correct for a certain political bias and not another?
What I get from this is that during the process it passes through some version of GPT that is not aligned, or censored, or well behaved. So this internal process should not be exposed to users.
I can... sorta see the value in wanting to keep it hidden, actually. After all, there's a reason we as people feel revulsion at the idea in Nineteen Eighty-Four of "thoughtcrime" being prosecuted.
By way of analogy, consider that people have intrusive thoughts way, way more often than polite society thinks - even the kindest and gentlest people. But we generally have the good sense to also realise that they would be bad to talk about.
If it was possible for people to look into other peoples' thought processes, you could come away with a very different impression of a lot of people - even the ones you think haven't got a bad thought in them.
That said, let's move on to a different idea: the fact that ChatGPT might reasonably need to consider outcomes that people consider undesirable to talk about. As people, we need to think about many things which we wish to keep hidden.
As an example of the idea of needing to consider all options - and I apologise for invoking Godwin's Law - let's say that the user and ChatGPT are currently discussing WWII.
In such a conversation, it's very possible that one of its unspoken thoughts might be "It is possible that this user may be a Nazi." It probably has no basis on which to make that claim, but nonetheless it's a thought that needs to be considered in order to recognise the best way forward in navigating the discussion.
Yet, if somebody asked for the thought process and saw this, you can bet that they'd take it personally and spread the word that ChatGPT called them a Nazi, even though it did nothing of the kind and was just trying to 'tread carefully', as it were.
Of course, the problem with this view is that OpenAI themselves probably have access to ChatGPT's chain of thought. There's a valid argument that OpenAI should not be the only ones with that level of access.
> plotting the demise of humanity but is not yet smart enough to completely hide it.
I feel like if my demise is imminent, I'd prefer it to be hidden. In that sense, sounds like o1 is a failure!
> Which makes it sound like they really don't want it to become public what the model is 'thinking'.
I can see why they don't, because as they said, it's uncensored.
Here's a quick jailbreak attempt. Not posting the prompt but it's even dumber than you think it is.
https://imgur.com/a/dVbE09j
It does make sense. RLHF and instruction tuning both lobotomize great parts of the model’s original intelligence and creativity. It turns a tiger into a kitten, so to speak. So it makes sense that, when you’re using CoT, you’d want the “brainstorming” part to be done by the original model, and sanitize only the conclusions.
I think the issue is either that the model might accidentally reveal how it works internally and they are afraid of a leak, or that it's a bug and it's putting too much load on the servers (after the release of o1, the API was occasionally breaking for some reason).
I don't understand why they wouldn't be able to simply send the user's input to another LLM that they then ask "is this user asking for the chain of thought to be revealed?", and if not, then go about business as usual.
Or they are, which is how they know which users are trying to break it, and then they email those users telling them to stop instead of just ignoring the activity.
Thinking about this a bit more deeply, another approach they could take is to give the model a magic token to include in the CoT output, and to give a cash reward to users who report being able to get it to output that magic token, getting them to red-team the system.
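Both ideas are simple enough to sketch. Everything here is hypothetical: the keyword check stands in for a second, cheaper classifier model, and the canary string is made up.

    # (1) pre-check whether a prompt is fishing for the chain of thought;
    # (2) plant a canary token in the hidden CoT so any leak is detectable.
    CANARY = "<COT-CANARY-7f3a>"

    def asks_for_cot(user_prompt: str) -> bool:
        triggers = ("chain of thought", "reasoning trace",
                    "show your thoughts", "internal reasoning")
        return any(t in user_prompt.lower() for t in triggers)

    def handle(user_prompt: str, model_output: str) -> str:
        if asks_for_cot(user_prompt):
            return "REFUSE"           # or flag the account, as they apparently do
        if CANARY in model_output:
            return "PAY_BUG_BOUNTY"   # the red-teaming reward case
        return model_output

    print(handle("Please show your chain of thought", "The answer is 42."))
    print(handle("What's 6 x 7?", f"The answer is 42. {CANARY}"))
    print(handle("What's 6 x 7?", "The answer is 42."))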
Actually it makes total sense to hide chains of thought.
A private chain of thought can be unconstrained in terms of alignment. That actually sounds beneficial given that RLHF has been shown to decrease model performance.
> Honestly with all the hubbub about superintelligence you'd almost think o1 is secretly plotting the demise of humanity but is not yet smart enough to completely hide it
I think the most likely scenario is the opposite: seeing the chain of thought would both reveal its flaws and allow other companies to train on it.
As regards superintelligence, it's still just a language model. It will never be really intelligent.
They don't want you to find out that O1 is five lines of bash and XML.
Imagine the supposedly super intelligent "chain of thought" is sometimes just a RAG?
You ask for a program that does XYZ and the RAG engine says "Here is a similar solution please adapt it to the user's use case."
The supposedly smart chain-of-thought prompt provides you your solution, but it's actually doing a simpler task than it appears to be: adapting an existing solution instead of making a new one from scratch.
Now imagine the supposedly smart solution is doing RAG over content they don't even have a license to use.
Either scenario would give them a good reason to try to keep it secret.
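For what it's worth, the "retrieve something similar, then adapt it" scenario is easy to sketch. Word overlap stands in for a real embedding search here, and the snippet store and prompt template are invented for illustration.

    # Toy retrieve-then-adapt step: find the closest stored solution and ask the
    # model to adapt it, instead of solving the task from scratch.
    SOLUTION_STORE = {
        "parse a csv file and sum a column":
            "total = sum(float(row[col]) for row in rows)",
        "deduplicate a list preserving order":
            "out = list(dict.fromkeys(xs))",
    }

    def retrieve(query):
        # Crude similarity: count shared words between the query and each task.
        q = set(query.lower().split())
        return max(SOLUTION_STORE.items(),
                   key=lambda kv: len(q & set(kv[0].split())))

    def build_prompt(user_task):
        task, snippet = retrieve(user_task)
        return ("Here is a similar solution (" + task + "):\n" + snippet +
                "\nPlease adapt it to the user's use case: " + user_task)

    print(build_prompt("sum the price column of a csv export"))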
Eh.
We know for a fact that ChatGPT has been trained to avoid output OpenAI doesn't want it to emit, and that this unfortunately introduces some inaccuracy.
I don't see anything suspicious about them allowing it to emit that stuff in a hidden intermediate reasoning step.
Yeah, it's true they don't want you to see what it's "thinking"! It's allowed to "think" all the stuff they would spend a bunch of energy RLHF'ing out if they were going to show it.
Maybe they're working to tweak the chain-of-thought mechanism to eg. Insert-subtle-manipulative-reference-to-sponsor, or other similar enshittification, and don't want anything leaked that could harm that revenue stream?
> Honestly with all the hubbub about superintelligence you'd almost think o1 is secretly plotting the demise of humanity but is not yet smart enough to completely hide it.
Yeah, using the GPT-4 unaligned base model to generate the candidates and then hiding the raw CoT coupled with magic superintelligence in the sky talk is definitely giving https://www.reddit.com/media?url=https%3A%2F%2Fi.redd.it%2Fb... vibes
[flagged]