Comment by adamgordonbell
12 hours ago
Here is the chat:
don't search the internet. This is a test to see how well you can craft non-trivial, novel and creative proofs given a "number theory and primitive sets" math problem. Provide a full unconditional proof or disproof of the problem.
{{problem}}
REMEMBER - this unconditional argument may require non-trivial, creative and novel elements.
Then "Thought for 80m 17s"
https://chatgpt.com/share/69dd1c83-b164-8385-bf2e-8533e9baba...
What I find fascinating about the shared chat isn’t just the result, but the visible thinking process. Math papers usually skip all the messy parts and just present the polished proof, but here you get something closer to the researcher’s notepad. I also find it oddly endearing when the AI says things like “Interesting!” It almost feels like a researcher encouraging themselves after a small bit of progress. It gives me the rare feeling of watching the search itself, not just the final result.
> the AI says things like “Interesting!”
My experience of those utterances is that they’re purely phatic mimicry: they lack genuine intuitive surprise and just mark a very odd shift in direction. The problem isn’t the lack of a path; it’s that the rhetorical follow-up to such leaps is usually a relevant result, so the stream of tokens ends up rapidly over-playing its own conviction. That’s why it’s necessary (and often ineffective) to tell them to validate their findings thoroughly: too much of their training is “That’s odd” followed by “Eureka!” and not “Nevermind…”
I think a lot of models have to sprinkle in a lot of "fluff" in their thinking to stay within the right distribution. They have language as their only medium; the way we annotate context is via brackets, and then we train them to hopefully respect those brackets. I'd imagine that top labs either explicitly train the models, or the models implicitly learn through the RL process, to spam tokens that keep them 'within distribution', since everything goes through the same channel and there's no fine-grained separation between things.
Philosophically, it's not like you're a detached observer who simply reasons over all possible hypotheses. Ever get stuck in a dead end and find it hard to dig yourself out? If you were a detached observer, it'd be pretty easy to just switch gears. But it isn't that easy (for humans).
It’s funny that this is probably due to bias in the training texts, right? Humans are way more likely to publish their “Eureka!” moments than their screwups… if they published the screwups too, maybe models wouldn’t exhibit this behavior.
Now that AI labs have all these “Nevermind” texts to train on, maybe it’s getting easier to correct? (Would require some postprocessing to classify the AI outputs as successful or not before training)
I've somehow managed to train mine out of trying to fluff me up the whole time; it's become very factual.
Overall it saves me a lot of time reading when it's just focusing on the details.
Interestingly this is strikingly similar to how my mind would process something I find genuinely interesting.
This is another underrated benefit of working with LLMs. When I work I don't take detailed notes about my thinking, decisions, context, etc. I just focus on code. If I get interrupted it takes me a while to get back into the flow.
With LLMs I just read back a few turns and I'm back in the loop.
I'd probably find the actual iteration through various learned approaches to problems fascinating if I understood the maths! Especially if I knew it well enough to tell which approaches were conventional and which weren't.
I find the AI pronouncing things "interesting!" less interesting, because even though in this case it crops up in the thinking rather than in flattery of the user in the chat, it's almost as much of an AI affectation as the em-dash.
I always assumed the "interesting!" markers were actual markers. A kind of tag for the system to annotate its context.
The simulacrum of a thing is not the thing! Not only is the "interesting!" unrelated to any "thought process", the whole """thinking""" output is not a representation of a thought process but merely a post-facto confabulation that sounds appropriately human-like.
Can't help but think of this passage I re-read recently from Nietzsche:
> When I analyze the process that is expressed in the sentence, "I think," I find a whole series of daring assertions that would be difficult, perhaps impossible, to prove; for example, that it is I who think, that there must necessarily be something that thinks, that thinking is an activity and operation on the part of a being who is thought of as a cause, that there is an "ego," and, finally, that it is already determined what is to be designated by thinking—that I know what thinking is.
Yes, I recently got access to an annotations platform for LLMs, and I've found many projects associated with generating chain-of-thought outputs.
These CoT outputs are the same sort of illusion as the general output. Someone is feeding the models scripts of what it looks like to solve problems, so they generate outputs that look like problem solving.
I can't remember if I mentioned it previously on here, but an LLM seems to be an extremely powerful synthesis machine. If you give it all of the individual components to solve a complex problem that humans might find intractable due to scope or bias, it may be able to crack the problem.
I don't have ChatGPT, only Gemini and Claude. How do you make a language model think for 80 minutes?
I have Gemini and ChatGPT and keep them on the highest thinking settings. ChatGPT will regularly think 40-60 minutes on the same problem that Gemini will think 10-15 minutes on. The quality of ChatGPT’s response is usually a little higher, but not that much higher. My takeaway is that Gemini is better at thinking fast, maybe has better or more dedicated hardware behind it, so I use Gemini if I want a faster answer and ChatGPT if I want to push the quality of the answer a little higher.
It has a “high effort” mode that makes it think really long.
In my experience, you can tell them "Don't stop working on this until complete" and they'll go for an hour or more.
Give it hard enough problems?
Tried w/ 5.5 Pro, Extended Thinking. 17 minutes:
-----------------------------
Yes. In fact the proposed bound is true, and the constant 1 is sharp.
Let w(a) = 1/(a log a). I will prove that, uniformly for every primitive A ⊂ [x, ∞), ∑_{a∈A} w(a) ≤ 1 + O(1/log x), which is stronger than the requested 1 + o(1).
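In LaTeX, the claimed bound reads as follows (my transcription of the output above, with the summation range made explicit; the full problem statement isn't quoted in this thread):

    \sum_{a \in A} \frac{1}{a \log a} \;\le\; 1 + O\!\left(\frac{1}{\log x}\right)
    \quad \text{uniformly over primitive sets } A \subset [x, \infty)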
https://chatgpt.com/share/69ed8e24-15e8-83ea-96ac-784801e4a6...
Tried the same prompt in DeepSeek 4
https://chat.deepseek.com/share/nyuz0vvy2unfbb97fv
Comes up with a proof.
Are these proofs equivalent? Pretty cool if so.
I am curious if there is a “harness” for maths out there (like the system prompt and tool collection in Claude Code, but for maths instead of coding)?
Something that asks the LLM to structure its response into a plan and an implementation, and allows it to call tools like Python, Sage, Lean, etc.
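A minimal sketch of what I mean, in Python; the two-phase prompting, the ask() stub, and the shell-out to Lean are all my own guesses at how such a harness could look, not an existing tool:

    # Hypothetical "maths harness": plan first, then implement, then
    # verify any emitted Lean by actually compiling it.
    import subprocess
    import tempfile

    def ask(prompt: str) -> str:
        """Stub for whatever LLM API is available; returns the reply."""
        raise NotImplementedError

    def solve(problem: str) -> str:
        plan = ask(f"Outline a proof plan, no details yet:\n{problem}")
        draft = ask(f"Problem:\n{problem}\nPlan:\n{plan}\n"
                    "Write the full proof, plus a Lean 4 version "
                    "inside <lean>...</lean> tags.")
        if "<lean>" in draft:
            lean_src = draft.split("<lean>")[1].split("</lean>")[0]
            with tempfile.NamedTemporaryFile(
                    suffix=".lean", mode="w", delete=False) as f:
                f.write(lean_src)
            # `lean file.lean` elaborates the file; a nonzero exit code
            # means the proof did not check.
            res = subprocess.run(["lean", f.name], capture_output=True)
            if res.returncode != 0:
                draft = ask("Your Lean failed to compile:\n"
                            f"{res.stderr.decode()}\nPlease fix it.")
        return draft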
I am not part of the scene, but I am sure there is; Tao himself talks a lot about this type of thing.
Also curious about this; it seems like it would be important to guide these tools more specifically based on the domain of expertise.
Why wouldn't you just use coding agents and ensure you have e.g. Lean and Mathlib in the environment?
Mine took 20 min. Pro. https://chatgpt.com/share/69ed83b1-3704-8322-bcf2-322aa85d7a... But I wish I were math-smart enough to know whether it worked or not.
Weirdly enough, Pro + extended thinking with the same prompt just output directly without thinking: https://chatgpt.com/s/t_69edd2d9dc048191b1476db92c0dedf8 . Does this mean the result was cached, or does it simply route to a different model silently based on the user?
The link you provided is for a canvas, I think, rather than the convo.
Ask it to formalize it in Lean.
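Even a skeleton would help. A guess at what the statement might look like in Lean 4 with Mathlib (the names, and the ε-phrasing of the o(1), are mine; the proof is left as sorry):

    import Mathlib

    -- A set of naturals is primitive if no element divides another.
    def Primitive (A : Set ℕ) : Prop :=
      ∀ a ∈ A, ∀ b ∈ A, a ∣ b → a = b

    -- The sum ∑_{a ∈ A} 1/(a log a) discussed above.
    noncomputable def primSum (A : Set ℕ) : ℝ :=
      ∑' a : A, 1 / (((a : ℕ) : ℝ) * Real.log ((a : ℕ) : ℝ))

    -- The claimed bound, with the o(1) phrased via ε: past some x,
    -- every primitive set inside [x, ∞) has sum at most 1 + ε.
    theorem primitive_tail_bound (ε : ℝ) (hε : 0 < ε) :
        ∃ x : ℕ, ∀ A : Set ℕ, Primitive A → (∀ a ∈ A, x ≤ a) →
          primSum A ≤ 1 + ε := by
      sorry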
If they aren't "smart enough" to know whether it worked, they are most likely also unable to verify that the Lean formalization actually matches the problem they were trying to solve.
That's great if it works. But it's way harder to produce a formal proof. So my expectation is that this will fail for most difficult problems, even when the non-formal proof is correct.
Formalize this in the form of an Iranian Lego Trump diss rap video.
The total FLOPs it consumed during those 80 minutes is crazy.
When using the web interface for ChatGPT like this, is there any way to tell which model is actually being used?
> "Thought for 80m 17s"
Is there any good rule of thumb for how many kWh of electricity this is?
>don't search the internet.
I think this was key. Otherwise the LLM could think it can't be done.
But it was trained on the internet.
"Knowing" (guessing really) what is possible and not is a huge deciding factor in if you can do that thing or not, meaning if you "know" it isn't possible you'll probably never be able to do it, but if you didn't know it wasn't possible, it is possible :)
Tried the same prompt and ended up nowhere close on the free plan.
Is there a known lag before the Pro plan's abilities migrate to the free plans?
GPT-5.5 Pro is not available on any plan outside the ChatGPT Pro ($100 or $200) tier or the API, as far as consumer access goes.
Tangential, but I learned today that GPT-5.5 in ChatGPT (Plus) has a smaller context window than the one in the API. (Or at least it thinks it does.)
I'd guess / hope the Pro one has the full context window.
Do not use the free plan. It is not good.
Does the free plan even have access to thinking models?
Technically yes, gpt-5.4-mini is available on the free plan
Was this a surprise?