Comment by adamgordonbell
11 hours ago
Here is the chat:
don't search the internet. This is a test to see how well you can craft non-trivial, novel and creative proofs given a "number theory and primitive sets" math problem. Provide a full unconditional proof or disproof of the problem.
{{problem}}
REMEMBER - this unconditional argument may require non-trivial, creative and novel elements.
Then "Thought for 80m 17s"
https://chatgpt.com/share/69dd1c83-b164-8385-bf2e-8533e9baba...
What I find fascinating about the shared prompt isn't just the result, but the visible thinking process. Math papers usually skip all the messy parts and just present the polished proof. But here you get something closer to their notepad. I also find it oddly endearing when the AI says things like "Interesting!" It almost feels like a researcher encouraging themselves after a small step of progress. It gives me the rare feeling of watching the search itself, not just the final result.
> the AI says things like “Interesting!”
My experience of those utterances is that they're purely phatic mimicry: they lack genuine intuitive surprise and just mark a very odd shift in direction. The problem isn't the lack of genuine feeling; it's that the rhetorical follow-up to those leaps is usually presented as a relevant result, so the stream of tokens ends up rapidly overplaying its own conviction. That's why it's necessary (and often ineffective) to tell them to validate their findings thoroughly: too much of their training is "That's odd" followed by "Eureka!" and not "Nevermind…"
This is another underrated benefit of working with LLMs. When I work I don't take detailed notes about my thinking, decisions, context, etc. I just focus on code. If I get interrupted it takes me a while to get back into the flow.
With LLMs I just read back a few turns and I'm back in the loop.
I'd probably find the actual iteration through various learned approaches to the problem fascinating if I understood the maths! Especially if I knew it well enough to tell which approaches were conventional and which weren't.
I find the AI pronouncing things "interesting!" less interesting on the basis that even though in this case it crops up in the thinking rather than flattering the user in the chat, it's almost as much of an AI affectation as the emdash.
I always assumed the "interesting!" markers were actual markers. A kind of tag for the system to annotate its context.
1 reply →
I don't have ChatGPT, only Gemini and Claude. But how do you make a language model think for 80 minutes?
It has a “high effort” mode that makes it think for a really long time
Give it hard enough problems?
Tried w/ 5.5 Pro, Extended Thinking. 17 minutes:
-----------------------------
Yes. In fact the proposed bound is true, and the constant 1 is sharp.
Let w(a) = 1/(a log a). I will prove that, uniformly for every primitive A ⊂ [x, ∞),

∑_{a∈A} w(a) ≤ 1 + O(1/log x),

which is stronger than the requested 1 + o(1).
https://chatgpt.com/share/69ed8e24-15e8-83ea-96ac-784801e4a6...
Tried the same prompt in DeepSeek 4
https://chat.deepseek.com/share/nyuz0vvy2unfbb97fv
Comes up with a proof.
I am curious if there is a “harness” for maths out there (like the system prompt and tool collection in Claude code but for maths instead of coding)?
Asking the llm to structure its response in plan and implementation, allowing it to call tools like python, sage, lean etc.
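To sketch what such a harness might look like: the loop below separates "plan" and "implementation" turns and runs generated code in a Python subprocess as the tool step. Everything here is hypothetical (no such harness exists in the thread); `fake_llm` is a stand-in for a real model call, and the canned implementation just computes a partial sum of 1/(a log a) over a few primes, which form a primitive set.

```python
import subprocess
import sys

def run_python_tool(code: str) -> str:
    """Execute candidate code in a subprocess and return its output (the 'tool call')."""
    result = subprocess.run(
        [sys.executable, "-c", code], capture_output=True, text=True, timeout=30
    )
    return result.stdout.strip() or result.stderr.strip()

def fake_llm(prompt: str) -> str:
    """Stand-in for a real model API. Returns a canned plan or implementation."""
    if prompt.startswith("PLAN"):
        return "Compute the partial sum of 1/(a*log a) over a small primitive set."
    # The primes form a primitive set: no prime divides another.
    return "import math; print(sum(1/(p*math.log(p)) for p in [2, 3, 5, 7, 11]))"

def harness(problem: str) -> str:
    plan = fake_llm(f"PLAN: {problem}")        # turn 1: ask for a plan
    code = fake_llm(f"IMPLEMENT: {plan}")      # turn 2: ask for tool code
    return run_python_tool(code)               # turn 3: run it, feed result back

if __name__ == "__main__":
    print(harness("Is the sum of 1/(a log a) over a primitive set at most 1 + o(1)?"))
```

A real harness would loop these turns, feed the tool output back into the model's context, and presumably add Lean/Sage alongside Python.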
Also curious about this, it seems like it would be important to guide these tools more specifically based on the domain of expertise.
Mine took 20min. Pro. https://chatgpt.com/share/69ed83b1-3704-8322-bcf2-322aa85d7a... But I wish I were math-smart enough to know whether it worked.
Weirdly enough, Pro + extended with the same prompt just output directly without thinking: https://chatgpt.com/s/t_69edd2d9dc048191b1476db92c0dedf8 . Does this mean the result was cached, or that it silently routes to a different model based on the user?
The link you provided is for a canvas I think rather than the convo
Ask it to formalize it in Lean.
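For what it's worth, even stating the problem formally takes some care. A minimal sketch of how "primitive set" might be phrased in Lean 4 (my own inline definition, not Mathlib's, and with the analytic bound left out entirely):

```lean
-- Sketch only: a primitive set of naturals, i.e. no element divides another
-- (distinct elements, that is — a ∣ b forces a = b within the set).
def Primitive (A : Nat → Prop) : Prop :=
  ∀ a b, A a → A b → a ∣ b → a = b

-- The interesting part — the bound on ∑ 1/(a log a) over A — would need
-- Mathlib's real analysis and infinite-sum machinery, and is not stated here.
```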
If they aren't "smart enough" to know whether it worked, they most likely are also unable to verify that the Lean formalization actually matches the problem they were trying to solve.
5 replies →
That's great if it works. But it's way harder to produce a formal proof. So my expectation is that this will fail for most difficult problems, even when the non-formal proof is correct.
Formalize this in the form of a Iranian Lego Trump Dis Rap video.
When using the web interface for ChatGPT like this, is there any way to tell which model is actually being used?
>don't search the internet.
I think this was key. Otherwise the LLM could think it can't be done.
But it was trained on the internet.
"Knowing" (guessing, really) what is and isn't possible is a huge deciding factor in whether you can do a thing: if you "know" it isn't possible you'll probably never be able to do it, but if you didn't know it wasn't possible, it is possible :)
Tried the same prompt and ended up nowhere close on the free plan.
Is there a known lag that it takes the Pro plan's abilities to migrate to the free plans?
GPT 5.5 Pro is not available outside the ChatGPT Pro ($100 or $200) tier or the API, as far as consumer access goes.
9 replies →
Tangential but I learned today that GPT-5.5 in ChatGPT (Plus) has a smaller context window than the one in the API. (Or at least it thinks it does.)
I'd guess / hope the Pro one has the full context window.
1 reply →
Do not use the free plan. It is not good.
Does the free plan even have access to thinking models?
Technically yes, gpt-5.4-mini is available on the free plan
Was this a surprise?
[dead]