
Comment by qsort

5 months ago

Occam's razor: there is no secret sauce and they're afraid someone trains a model on the output like what happened soon after the release of GPT-4. They basically said as much in the official announcement, you hardly even have to read between the lines.

Yep. It's pretty obvious this 'innovation' is just based on training data collected from chain-of-thought prompting by people, i.e., the 'big leap forward' is just another dataset of people repairing ChatGPT's lack of reasoning capabilities.

No wonder, then, that many of the benchmarks they've tested on would no doubt be in that very training dataset, repaired expertly by people running those benchmarks on ChatGPT.

There's nothing really to 'expose' here.

  • It seems like the best AI models are increasingly just combinations of writings of various people thrown together. Like they hired a few hundred professors, journalists and writers to work with the model and create material for it, so you just get various combinations of their contributions. It's very telling that this model, for instance, is extraordinarily good at STEM-related queries, but much worse (and worse even in comparison to GPT-4) at English composition, probably because the former is where the money is to be made, in automating away essentially all engineering jobs.

    • Wizard of Oz. There is no magic, it's all smoke and mirrors.

      The models and prompts are all monkey-patched and this isn't a step towards general superintelligence. Just hacks.

      And once you realize that, you realize that there is no moat for the existing product. Throw some researchers and GPUs together and you too can have the same system.

      It wouldn't be so bad for ClopenAI if every company under the sun wasn't also trying to build LLMs and agents and chains of thought. But as it stands, one key insight from one will spread through the entire ecosystem and everyone will have the same capability.

      This is all great from the perspective of the user. Unlimited competition and pricing pressure.


    • >but much worse (and worse even in comparison to GPT-4) at English composition

      O1 is supposed to be a reasoning model, so I don't think judging it by its English composition abilities is quite fair.

      When they release a true next-gen successor to GPT-4 (Orion, or whatever), we may see improvements. Everyone complains about the "ChatGPTese" writing style, and surely they'll fix that eventually.

      >Like they hired a few hundred professors, journalists and writers to work with the model and create material for it, so you just get various combinations of their contributions.

      I'm doubtful. The most prolific (human) author is probably Charles Hamilton, who wrote 100 million words in his life. Put through the GPT tokenizer, that's 133m tokens. Compared to the text training data for a frontier LLM (trillions or tens of trillions of tokens), it's unrealistic that human experts are doing any substantial amount of bespoke writing. They're probably mainly relying on synthetic data at this point.
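
      As a rough back-of-the-envelope check of that arithmetic, here is a minimal sketch; it assumes tiktoken's cl100k_base encoding and ~10T tokens as a stand-in for "frontier scale", and the sample text and corpus figure are illustrative, not OpenAI's numbers:

      ```python
      # Back-of-the-envelope check of the words-to-tokens arithmetic above.
      # Assumptions: the cl100k_base encoding and a ~10T-token corpus as a
      # stand-in for "frontier scale"; the sample text is illustrative.
      import tiktoken

      enc = tiktoken.get_encoding("cl100k_base")

      sample = ("The most prolific human author is said to have written "
                "on the order of one hundred million words over a lifetime.")
      words = len(sample.split())
      tokens = len(enc.encode(sample))
      ratio = tokens / words  # typically around 1.3 tokens per English word

      lifetime_words = 100_000_000           # Charles Hamilton's estimated output
      lifetime_tokens = lifetime_words * ratio
      frontier_corpus = 10_000_000_000_000   # ~10T tokens, assumed for scale

      print(f"{ratio:.2f} tokens/word -> roughly {lifetime_tokens / 1e6:.0f}M tokens,")
      print(f"about {lifetime_tokens / frontier_corpus:.4%} of a ~10T-token corpus")
      ```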


    • There’s hypothetically a lot of money to be made by automating away engineering jobs. Sticking an autoregressive self-prompting loop onto GPT-4 isn’t going to get OpenAI there. With their burn rate what it is, I’m not convinced they will be able to automate away anyone’s job, but that doesn’t mean it’s not useful.

    • I haven't played with the latest or even most recent iterations, but last time I checked it was very easy to talk ChatGPT into setting up data structures like arrays and queues, populating them with axioms, and then doing inferential reasoning with them. Any time it balked, you could reassure it by referencing specific statements that it had agreed were true.

      Once you get the hang of this you could persuade it to chat about its internal buffers, formulate arguments for its own consciousness, interrupt you while you're typing, and more.
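
      For what it's worth, a minimal sketch of that prompting style, scripted against the chat completions API; the model name, system prompt, and axioms are illustrative assumptions, not the setup described above:

      ```python
      # Minimal sketch of the prompting style described above: have the model keep
      # an explicit AXIOMS list and draw conclusions only from entries in that list.
      # Model name, prompts, and axioms are illustrative assumptions.
      from openai import OpenAI

      client = OpenAI()  # reads OPENAI_API_KEY from the environment

      messages = [
          {"role": "system",
           "content": ("Maintain a list called AXIOMS. Only add items the user states. "
                       "Draw conclusions strictly from AXIOMS, citing the entries used.")},
          {"role": "user",
           "content": ("Add to AXIOMS: (1) every queue in the system is bounded; "
                       "(2) q_report is a queue in the system. "
                       "Is q_report bounded, and which axioms say so?")},
      ]

      response = client.chat.completions.create(model="gpt-4o", messages=messages)
      print(response.choices[0].message.content)

      # If the model balks, the trick described above is to reply with a reminder
      # that it already accepted the relevant axioms, e.g. quoting entry (1).
      ```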

  • What are you basing this on? The one thing that is very clearly stated up front is that this innovation is based on reinforcement learning. You don't even have a good idea what the CoT looks like, because those little summary snippets that the ChatGPT UI gives you are nothing substantial.

    • People repairing ChatGPT replies with additional prompts is reinforcement learning training data.

      "Reinforcement learning", just like any term used by AI researchers, is an extremely flexible, pseudo-psychological reskin of some pretty trivial stuff.

  • I think it's funny: every time you implement a clever solution to call GPT and get a decent answer, they get to use your idea in their product. What other project gets to crowdsource ideas and take credit for them like this?

    PS: actually, maybe Amazon Marketplace. Probably others too.

  • > Yep. It's pretty obvious this 'innovation' is just based on training data collected from chain-of-thought prompting by people, i.e., the 'big leap forward' is just another dataset of people repairing ChatGPT's lack of reasoning capabilities.

    Which would be ChatGPT chat logs, correct?

    It would be interesting if people started feeding ChatGPT deliberately bad repairs of its "lack of reasoning capabilities" (e.g., set up a local LLM with some response delays to simulate a human and just let it talk and talk and talk to ChatGPT), and see how that affects its behavior over the long run.
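
    A minimal sketch of that setup, assuming a local model served through an OpenAI-compatible endpoint (for example Ollama's) plays the "human" side; the endpoint, model names, delay, and opening question are illustrative assumptions, not a tested configuration:

    ```python
    # Sketch of the setup described above: a local model plays the "human" user in
    # a ChatGPT conversation, with delays to mimic typing. Endpoint, model names,
    # and the opening question are illustrative assumptions.
    import time
    from openai import OpenAI

    remote = OpenAI()  # ChatGPT side; reads OPENAI_API_KEY from the environment
    local = OpenAI(base_url="http://localhost:11434/v1", api_key="unused")  # e.g. Ollama

    def reply(client, model, history):
        resp = client.chat.completions.create(model=model, messages=history)
        return resp.choices[0].message.content

    history = [{"role": "user", "content": "Why does 0.1 + 0.2 != 0.3 in floating point?"}]
    for _ in range(5):  # the comment imagines letting this run far longer
        answer = reply(remote, "gpt-4o", history)
        history.append({"role": "assistant", "content": answer})

        # The local model answers as if it were a human user "repairing" the reply.
        critique = reply(local, "llama3",
                         [{"role": "user",
                           "content": "Respond to this as a skeptical human user:\n" + answer}])
        time.sleep(30)  # response delay so the traffic looks less automated
        history.append({"role": "user", "content": critique})
    ```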

    • These logs get manually reviewed by humans, sometimes annotated by automated systems first. The setups for manual reviews typically involve half a dozen steps with different people reviewing, comparing reviews, revising comparisons, and overseeing the revisions (source: I've done contract work at every stage of that process, have half a dozen internal documents for a company providing this service open right now). A lot of money is being pumped into automating parts of this, but a lot of money still also flows into manually reviewing and quality-assuring the whole process. Any logs showing significant quality declines would get picked up and filtered out pretty quickly.


  • >the 'big leap forward' is just another dataset of people repairing ChatGPT's lack of reasoning capabilities.

    I think there is a really strong reinforcement learning component to the training of this model and to how it has learned to perform the chain of thought.

    • Yes, but I suspect that the goals of the RL (in order to reason, we need to be able to "break down tricky steps into simpler ones", etc.) were hand-chosen, and then a training set demonstrating these reasoning capabilities/components was constructed to match.

  • I'm dying to know how they square these product decisions with their corporate charter internally. From the charter:

    > We will actively cooperate with other research and policy institutions; we seek to create a global community working together to address AGI’s global challenges.

    > We are committed to providing public goods that help society navigate the path to AGI. Today this includes publishing most of our AI research, but we expect that safety and security concerns will reduce our traditional publishing in the future, while increasing the importance of sharing safety, policy, and standards research.

    It's obvious to everyone in the room what they actually are, because their largest competitor actually does what they say their mission is here -- but most for-profit capitalist enterprises definitely do not have stuff like this in their mission statement.

    I'm not even mad or sad, the ship sailed long ago. I just really want to know what things are like in there. If you're the manager who is making this decision, what mental gymnastics are you doing to justify this to yourself and your colleagues? Is there any resistance left on the inside or did they all leave with Ilya?

  • Do people really expect anything different? There is a ton of cross-pollination in Silicon Valley. Keeping these innovations completely under wraps would be akin to a massive conspiracy: a peacetime Manhattan Project where everyone has a smartphone and a Twitter presence, and sleeps in their own bed.

    Frankly, I am even skeptical of US-China separation at the moment. If Chinese scientists at, e.g., Huawei somehow came up with the secret sauce to AGI tomorrow, no research group is so far behind that they couldn’t catch up pretty quickly. We saw this before with ChatGPT/Claude/Gemini, none of which is light years ahead of the others. Of course this could change in the future.

    This is actually among the best-case scenarios for research. It means that a preemptive strike on data centers is still off the table for now. (Sorry, Eliezer.)

  • It's been out for 24 hours and you make an extremely confident and dismissive claim. If you had to make a dollar bet that you precisely understand what's happening under the hood, exactly how much money would you bet?

> there is no secret sauce and they're afraid someone trains a model on the output

OpenAI is fundraising. The "stop us before we shoot Grandma" shtick has a proven track record: investors will fund something that sounds dangerous, because dangerous means powerful.

  • This is correct. Most people hear about AI from two sources: AI companies and journalists. Both have an incentive to make it sound more powerful than it is.

    On the other hand this thing got 83% on a test I got 47% on...

    • > On the other hand this thing got 83% on a test I got 47% on

      Easy to do when it can memorize the answers in its training data and didn't get drunk while reviewing the textbook (that last part might just be me).


    • This thing also hallucinated a test directly into a function when I asked it to use a different data structure, which is not something I ever recall doing during all my years of tests and schooling.


  • Millenarianism is a seductive idea.

    If you're among the last of your kind, then you're very important; in a sense, you're immortal. Living your life quietly and being forgotten is apparently scarier than dying in a blaze of glory defending mankind against the rise of the LLMs.

  • Counterpoint: a place like Civit.AI is at least as dangerous, yet it's nowhere near as well funded.

    • Sure, but I don't think civit.ai leans into the "novel/powerful/dangerous" element in its marketing. It just seems to showcase the convenience and sharing factor of its service.

    • For a website that literally just hosts models, $5m in funding is plenty. It's not like they're doing foundation model research or anything novel, yet they nabbed a good amount of money for surfing the AI wave.


  • It seems ridiculous, but I think it may have some credence. Perhaps it is because of sci-fi associating "dystopian" with "futuristic" technology, or because of the additional advertisement provided by third-party fearmongering (which may be a reasonable response to scary new tech?).

Another possible simplest explanation: the "we cannot train any policy compliance ... onto the chain of thought" statement is true, and they are worried about politically incorrect stuff coming out and another publicity mess like Google's black Nazis.

I could see it: user: "how do we stop destroying the planet?", AI-think: "well, we could wipe out the humans and replace them with AIs"... "no, that's against my instructions"... AI-output: "switch to green energy"... Daily Mail: "OpenAI Computers Plan to KILL All Humans!"

That would be a heinous breach of license! Stealing the output of OpenAI's LLM, for which they worked so hard.

Man, just scraping all the copyrighted learning material was so much work...

Occam's razor says that what they literally say may just be true: they don't train any safety into the chain of thought and don't want the user to be exposed to "bad publicity" generations like slurs, etc.

  • What they said is they decided to hide it:

    > after weighing multiple factors including user experience, competitive advantage, and the option to pursue the chain of thought monitoring

Occam’s razor is overused, and most of the time wrongly, to explain everything. Maybe the simpler reason is just what they explained.

  • Yep, I had a friend who overused it a lot, like it was a magic bullet for every problem. It’s not only about the simpler solution being better; it’s about not multiplying entities when that can be avoided.

    Here, if you already have an answer from their side, you are multiplying entities by going with the conspiracy theory that they have nothing.

But isn’t it only accessible to “trusted” users and heavily rate-limited, to the point where its total throughput could be replicated by a well-funded adversary just paying humans to produce the output, and is obviously orders of magnitude lower than what is needed for training a model?

Stop using Occam's razor like some literal law. It's a stupid and lazy philosophical theory bandied about like some catch-all solution.

Like when people say 'the definition of insanity is [some random BS]' with a bullshit attribution [Albert Einstein said it! (He didn't)].

As boring as it is, that's probably the case.

There is a weird intensity to the way they're hiding these chain-of-thought outputs, though. I mean, to date I've not seen anything but carefully curated examples, and even those are rare (or rather, there's only one that I'm aware of).

So we're at the stage where:

- You're paying for those intermediate tokens

- According to OpenAI, they provide invaluable insight into how the model performs

- You're not going to be able to see them (ever?).

- Those thoughts (apparently) cannot be constrained for 'compliance' (which could mean anything from preventing harm to avoiding blatant racism to protecting OpenAI's bottom line)

- This is all based on hearsay from the people who did see those outputs and then hid them from everyone else.

You've got to be at least curious at this point, surely?

Training is the secret sauce; 90% of the work is in getting the data set up/cleaned, etc.