Comment by outlace

5 days ago

The headline may make it seem like AI just discovered some new result in physics all on its own, but reading the post: humans started off trying to solve some problem, it got complex, and GPT simplified it and found a solution with the simpler representation. It took 12 hours for GPT Pro to do this. In my experience LLMs can make new things when they are some linear combination of existing things, but I haven't been able to get them to do something totally out of distribution from first principles yet.

This is the critical bit (paraphrasing):

Humans have worked out the amplitudes for integer n up to n = 6 by hand, obtaining very complicated expressions, which correspond to a “Feynman diagram expansion” whose complexity grows superexponentially in n. But no one has been able to greatly reduce the complexity of these expressions, providing much simpler forms. And from these base cases, no one was then able to spot a pattern and posit a formula valid for all n. GPT did that.

Basically, they used GPT to refactor a formula and then generalize it for all n. Then verified it themselves.

I think this was all already figured out in 1986 though: https://journals.aps.org/prl/abstract/10.1103/PhysRevLett.56... see also https://en.wikipedia.org/wiki/MHV_amplitudes

  •   > I think this was all already figured out in 1986 though
    

    They cite that paper in the third paragraph...

      Naively, the n-gluon scattering amplitude involves order n! terms. Famously, for the special case of MHV (maximally helicity violating) tree amplitudes, Parke and Taylor [11] gave a simple and beautiful, closed-form, single-term expression for all n.
    

    It also seems to be a main talking point.

    I think this is a prime example of how easy it is to think something is solved when looking at it from a high level, drawing an erroneous conclusion due to lack of domain expertise. Classic "Reviewer 2" move. Though I'm not a domain expert myself, so if there really was no novelty over Parke and Taylor, I'm pretty sure this will get thrashed in review.
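
    For reference, the Parke-Taylor expression they quote is, schematically (the color-ordered partial amplitude with gluons i and j carrying negative helicity, suppressing couplings and the momentum-conserving delta function):

      A_n(1^+, \dots, i^-, \dots, j^-, \dots, n^+) \;=\; \frac{\langle i\,j\rangle^4}{\langle 1\,2\rangle \langle 2\,3\rangle \cdots \langle n\,1\rangle}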

  • It bears repeating that modern LLMs are incredibly capable, and relentless, at solving problems that have a verification test suite. It seems like this problem did (at least for some finite subset of n)!

    This result, by itself, does not generalize to open-ended problems, though, whether in business or in research in general. Discovering the specification to build is often the majority of the battle. LLMs aren't bad at this, per se, but they're nowhere near as reliably groundbreaking as they are on verifiable problems.

    • > modern LLMs are incredibly capable, and relentless, at solving problems that have a verification test suite.

      Feels like it's a bit of what I tried to express a few weeks ago https://news.ycombinator.com/item?id=46791642 namely that we are just pouring computational resources into verifiable problems and then claiming that, astonishingly, it sometimes works. Sure, LLMs even have a slight bias, namely they rely on statistics, so it's not purely brute force, but the approach is still pretty much the same: throw stuff at the wall, see what sticks, and once something finally does, report it as grandiose and claim it to be "intelligent".

      11 replies →

    • Yes, this is where I just cannot imagine completely AI-driven software development of anything novel and complicated without extensive human input. I'm currently working in a space where none of our data models are particularly complex, but the trick is all in defining the rules for how things should work.

      Our actual software implementation is usually pretty simple; often writing up the design spec takes significantly longer than building the software, because the software isn't the hard part - the requirements are. I suspect the same folks who are terrible at describing their problems are going to need help from expert folks who are somewhere between SWE, product manager, and interaction designer.

    • Even more generally than verification, just being tied to a loss function that represents something we actually care about: e.g. compiler and test errors, LEAN verification in Aristotle, basic physics energy configs in AlphaFold, or win conditions in RL, such as in AlphaGo.

      RLHF is an attempt to push LLMs pre-trained with a dopey reconstruction loss toward something we actually care about: imagine if we could find a pre-training criterion that actually cared about truth and/or plausibility in the first place!

      1 reply →

  • That paper from the 80s (which is cited in the new one) is about "MHV amplitudes" with two negative-helicity gluons, so "double-minus amplitudes". The main significance of this new paper is to point out that "single-minus amplitudes" which had previously been thought to vanish are actually nontrivial. Moreover, GPT-5.2 Pro computed a simple formula for the single-minus amplitudes that is the analogue of the Parke-Taylor formula for the double-minus "MHV" amplitudes.

  • You should probably email the authors if you think that's true. I highly doubt they didn't do a literature search first though...

    • You should be more skeptical of marketing releases like this. This is an advertisement.

    • It's hard to get someone to do a literature search first when they get free publicity by not doing one and claiming some major AI-assisted breakthrough...

      Heck, it's hard to get authors to do a literature search, period: never mind not thoroughly looking for prior art, even well-known disgraced papers continue to get positive citations all the time...

  • > But no one has been able to greatly reduce the complexity of these expressions, providing much simpler forms.

    Slightly OT, but wasn't this supposed to be largely solved with amplituhedrons?

  • Still pretty awesome though, if you ask me.

    • I think even a "non-intelligent" solver like Mathematica is cool - so hell yes, this is cool.

    • Big difference between “derives new result” and “reproduces something likely in its training dataset”.

  • Sounds somewhat similar to the groundbreaking application of a computer to prove the four color theorem. There, the researchers wrote a program to find and formally check the numerous particular cases. Here, the computer finds a simplifying pattern.

  • I'm not sure whether GPT's ability goes beyond a formal math package's in this regard, or whether it's just way more convenient to ask ChatGPT than to use that software.

> but I haven't been able to get them to do something totally out of distribution from first principles yet

Can humans actually do that? Sometimes it appears as if we have made a completely new discovery. However, if you look more closely, you will find that many events and developments led up to this breakthrough, and that it is actually an improvement on something that already existed. We are always building on the shoulders of giants.

  • > Can humans actually do that?

    From my reading yes, but I think I am likely reading the statement differently than you are.

    > from first principles

    Doing things from first principles is a known strategy, so is guess and check, brute force search, and so on.

    For an LLM to follow a first-principles strategy I would expect it to take in a body of research, come up with some first principles or guess at them, then iteratively construct a tower of reasonings/findings/experiments.

    Constructing a solid tower is where things are currently improving for existing models, in my mind, but when I try the OpenAI or Anthropic chat interfaces, neither does a good job for long, not independently at least.

    Humans also often have a hard time with this; in general it is not a skill that everyone has, and I think you can be a successful scientist without ever heavily developing first-principles problem solving.

    • "Constructing a solid tower" from first principles is already super-human level. Sure, you can theorize a tower (sans the "solid") from first principles; there's a software architect at my job that does it every day. But the "solid" bit is where things get tricky, because "solid" implies "firm" and "well anchored", and that implies experimental grounds, experimental verification all the way, and final measurable impact. And I'm not even talking particle physics or software engineering; even folding a piece of paper can give you surprising mismatches between theory and results.

      Even the realm of pure mathematics and elegant physics theories, where you are supposed to take a set of axioms ("first principles") and build something with them, has cautionary tales such as the Russell paradox or the non-measurability of Feynman path integrals, and let's not talk about string theory.

  • Yes. That's how all advancement in human knowledge has happened: small and incremental forays out of our training distribution.

    These have been identified as various things. Eureka moments, strokes of genius, out of the box thinking, lateral thinking.

    LLMs have not been shown to be capable of this. They might be in the future, but they haven't been yet.

  • Relativity comes to mind.

    You could nitpick a rebuttal, but no matter how many people you give credit, general relativity was a completely novel idea when it was proposed. I'd argue for special relativity as well.

    • I am not a historian of science, or even a physicist, but IMO relativity has a weak case for being a completely novel discovery. Critique of the absolute time and space of Newtonian physics was already well underway, and much of the methodology for exploring this relativity (by way of gyroscopes, inertial reference frames, and synchronized mechanical clocks) was already in use. Many of the phenomena that relativity would later explain under a consistent framework already had independent quasi-explanations hinting at the more universal theory. Poincaré probably came the closest to unifying everything before Einstein:

      > In 1902, Henri Poincaré published a collection of essays titled Science and Hypothesis, which included: detailed philosophical discussions on the relativity of space and time; the conventionality of distant simultaneity; the conjecture that a violation of the relativity principle can never be detected; the possible non-existence of the aether, together with some arguments supporting the aether; and many remarks on non-Euclidean vs. Euclidean geometry.

      https://en.wikipedia.org/wiki/History_of_special_relativity

      Now, if I had to pick a major idea that seemed to drop fully-formed from the mind of a genius with little precedent to have guided him, I might personally point to Galois theory (https://en.wikipedia.org/wiki/Galois_theory). (Ironically, though, I'm not as familiar with the mathematical history of that time and I may be totally wrong!)

      12 replies →

    • Agreed.

      General relativity was a completely novel idea. Einstein took a purely mathematical object (now known as the Einstein tensor) and realized that since its covariant derivative was zero, it could be equated (apart from a constant factor) to a conserved physical object, the energy-momentum tensor. It didn't just fall out of Riemannian geometry and what was known about physics at the time.

      Special relativity was the work of several scientists as well as Einstein, but it was also a completely novel idea - just not the idea of one person working alone.

      I don't know why anyone disputes that people can sometimes come up with completely novel ideas out of the blue. This is how science moves forward. It's very easy to look back on a breakthrough and think it looks obvious (because you know the trick that was used), but it's important to remember that the discoverer didn't have the benefit of hindsight that you have.

    • Even if I grant you that, surely we’ve moved the goal posts a bit if we’re saying the only thing we can think of that AI can’t do is the life’s work of a man whose last name is literally synonymous with genius.

    • Not really. Pretty sure I read recently that Newton appreciated that his theory was non-local and didn't like what Einstein later called "spooky action at a distance". The Lorentz transform was also known from 1887. Time dilation was understood from 1900. Poincaré figured out in 1905 that it was a mathematical group. Einstein put a bow on it all by figuring out that you could derive it from the principle of relativity and keeping the speed of light constant in all inertial reference frames.

      I'm not sure about GR, but I know that it is built on the foundations of differential geometry, which Einstein definitely didn't invent (I think that's the source of his "I assure you whatever your difficulties in mathematics are, that mine are much greater" quote because he was struggling to understand Hilbert's math).

      And really Cauchy, Hilbert, and those kinds of mathematicians I'd put above Einstein in building entirely new worlds of mathematics...

      4 replies →

  • Depends on what you think is valid.

    The process you’re describing is humans extending our collective distribution through a series of smaller steps. That’s what the “shoulders of giants” means. The result is we are able to do things further and further outside the initial distribution.

    So it depends on if you’re comparing individual steps or just the starting/ending distributions.

  • Go enough shoulders down, and someone had to have been the first giant.

    • A discovery by a giant is in some sense a new base vector in the space of discoveries. The interesting question is whether a statistical machine can only perform a linear combination in the space of discoveries, or whether it can discover a new base vector in the space of discoveries... whatever that is.

      1 reply →

  • Arguably it's precisely a paradigm shift. Continuing whatever worked until now is within the paradigm: our current theories and tools work, we find a few problems that don't fit, but that's fine, the rest is still progress. We keep hitting more problems, or those few pesky unsolved problems turn out to be important. We then go back to the theory and its foundations and finally challenge them. We break from the old paradigm and come up with new theories and tools because the first principles are now better understood, and we iterate.

    So those are actually two different regimes for how to proceed. Both are useful, but arguably breaking out of the current paradigm is much harder and thus rarer.

  • The tricky part is that LLMs aren't just spewing outputs from the distribution (or "near" learned manifolds), but also extrapolating / interpolating (depending on how much you care about the semantics of these terms https://arxiv.org/abs/2110.09485).

    There are genuine creative insights that come from connecting two known semantic spaces in a way that wasn't obvious before (e.g., a novel isomorphism). It is very conceivable that LLMs could make this kind of connection, but we haven't really seen a dramatic form of this yet. This kind of connection can lead to deep, non-trivial insights, but whether or not it is "out-of-distribution" is harder to answer in this case.

  • I mean, there’s just no way you can take the set of publicly known ideas from all human civilizations, say, 5,000 years ago, and say that all the ideas we have now were “in the distribution” then. New ideas actually have to be created.

  •   > Can humans actually do that? 
    

    Yes

    Seriously, think about it for a second...

    If discovery were just recombining existing knowledge, then science should have accelerated a lot faster. Science would have happened differently, and researchers would have optimized for ingesting as many papers as they could.

    Dig deep into things and you'll find that there are often leaps of faith that need to be made. Guesses, hunches, and outright conjectures. Remember, there are paradigm shifts that happen. There are plenty of things in physics (including classical) that cannot be determined from observation alone. Or more accurately, cannot be differentiated from alternative hypotheses through observation alone.

    I think the problem is that when teaching science we generally teach it very linearly, as if things easily follow. But in reality there are generally constant iterative improvements that look more like a plateau, and then there are these leaps. They happen for a variety of reasons, but no paradigm shift would be contentious if it were obvious and clearly in distribution. It would always be met with the same response that typical iterative improvements get: "well that's obvious, is this even novel enough to be published? Everybody already knew this" (hell, look at the response to the top comment and my reply... that's classic "Reviewer #2" behavior). If it were always in distribution, progress would be nearly frictionless.

    Again, in how we teach the history of science we make an error with things like Galileo, as if The Church was the only opposition. There were many scientists who objected, and on reasonable grounds. It is also a mistake we continually make in how we view the world. If you're sticking with "it works" you'll end up with a geocentric model rather than a heliocentric model. It is true that the geocentric model had limits, but so did the original heliocentric model, and that's the reason it took time to be adopted.

    By viewing things at too high of a level we often fool ourselves. While I'm criticizing how we teach I'll also admit it is a tough thing to balance. It is difficult to get nuanced and in teaching we must be time effective and cover a lot of material. But I think it is important to teach the history of science so that people better understand how it actually evolves and how discoveries were actually made. Without that it is hard to learn how to actually do those things yourself, and this is a frequent problem faced by many who enter PhD programs (and beyond).

      > We are always building on the shoulders of giants.
    

    And it still is. You can still lean on others while presenting things that are highly novel. These are not in disagreement.

    It's probably worth reading The Unreasonable Effectiveness of Mathematics in the Natural Sciences. It might seem obvious now but read carefully. If you truly think it is obvious that you can sit in a room armed with only pen and paper and make accurate predictions about the world, you have fooled yourself. You have not questioned why this is true. You have not questioned when this actually became true. You have not questioned how this could be true.

    https://www.hep.upenn.edu/~johnda/Papers/wignerUnreasonableE...

      You are greater than the sum of your parts

When chess engines were first developed, they were strictly worse than the best humans. After many years of development, they became helpful to even the best humans even though they were still beatable (1985–1997). Eventually they caught up and surpassed humans but the combination of human and computer was better than either alone (~1997–2007). Since then, humans have been more or less obsoleted in the game of chess.

Five years ago we were at Stage 1 with LLMs with regard to knowledge work. A few years later we hit Stage 2. We are currently somewhere between Stage 2 and Stage 3 for an extremely high percentage of knowledge work. Stage 4 will come, and I would wager it's sooner rather than later.

  • There's a major difference between chess and scientific research: setting the objectives is itself part of the work.

    In chess, there's a clear goal: beat the game according to this set of unambiguous rules.

    In science, the goals are much more diffuse, and setting them in the first place is what makes a scientist more or less successful, not so much technical ability. It's a very hierarchical field where permanent researchers direct staff (postdocs, research scientists/engineers), who in turn direct grad students. And it's at the bottom of the pyramid that technical ability is the most relevant/rewarded.

    Research is very much a social game, and I think replacing it with something run by LLMs (or other automatic process) is much more than a technical challenge.

  • The evolution was also interesting: first the engines were amazing tactically but pretty bad strategically so humans could guide them. With new NN based engines they were amazing strategically but they sucked tactically (first versions of Leela Chess Zero). Today they closed the gap and are amazing at both strategy and tactics and there is nothing humans can contribute anymore - all that is left is to just watch and learn.

  • With a chess engine, you could ask any practitioner in the 90's what it would take to achieve "Stage 4" and they could estimate it quite accurately as a function of FLOPs and memory bandwidth. It's worth keeping in mind just how little we understand about LLM capability scaling. Ask 10 different AI researchers when we will get to Stage 4 for something like programming and you'll get wild guesses or an honest "we don't know".

    • That is not what happened with chess engines. We didn’t just throw better hardware at it, we found new algorithms, improved the accuracy and performance of our position evaluation functions, discovered more efficient data structures, etc.

      People have been downplaying LLMs since the first AI-generated buzzword garbage scientific paper made its way past peer review and into publication. And yet they keep getting better and better to the point where people are quite literally building projects with shockingly little human supervision.

      By all means, keep betting against them.

    • Chess grandmasters are living proof that it’s possible to reach grandmaster level in chess on 20 W of compute. We’ve got orders of magnitude of optimizations to discover in LLMs and/or future architectures, both software and hardware, and with the amount of progress we’ve seen basically every month, those ten people will answer ‘we don’t know, but it won’t be too long’. Of course they may be wrong, but the trend line is clear; Moore’s law faced similar issues and they were successively overcome for half a century.

      IOW respect the trend line.

    • And their predictions about Go were wrong, because they thought the algorithm would forever be α-β pruning with a weak value heuristic.

    • > With a chess engine, you could ask any practitioner in the 90's what it would take to achieve "Stage 4" and they could estimate it quite accurately as a function of FLOPs and memory bandwidth.

      And the same practitioners said right after deep blue that go is NEVER gonna happen. Too large. The search space is just not computable. We'll never do it. And yeeeet...

  • We are already at Stage 3 for software development, and arguably Stage 4.

    • We are at level 2.5 for software development, IMO. There is a clear skill gap between experienced humans and LLMs when it comes to writing maintainable, robust, concise and performant code and balancing those concerns.

      The LLMs are very fast but the code they generate is low quality. Their comprehension of the code is usually good but sometimes they have a weightfart and miss some obvious detail and need to be put on the right path again. This makes them good for non-experienced humans who want to write code and for experienced humans who want to save time on easy tasks.

      1 reply →

I don't want to be rude but like, maybe you should pre-register some statement like "LLMs will not be able to do X" in some concrete domain, because I suspect your goalposts are shifting without you noticing.

We're talking about significant contributions to theoretical physics. You can nitpick but honestly go back to your expectations 4 years ago and think — would I be pretty surprised and impressed if an AI could do this? The answer is obviously yes, I don't really care whether you have a selective memory of that time.

  • I don't know enough about theoretical physics: what makes it a significant contribution there?

    • It's a nontrivial calculation valid for a class of forces (e.g. QCD) and apparently a serious simplification to a specific calculation that hadn't been completed before. But for what it's worth, I spent a good part of my physics career working in nucleon structure and have not run across the term "single minus amplitudes" in my memory. That doesn't necessarily mean much as there's a very broad space work like this takes place in and some of it gets extremely arcane and technical.

      One way I gauge the significance of a theory paper is by the measured quantities and physical processes it would contribute to. I see none discussed here, which should tell you how deep into the math it is. I personally would not have stopped to read it on my arxiv catch-up:

      https://arxiv.org/list/hep-th/new

      Maybe to characterize it better, physicists were not holding their breath waiting for this to get done.

      1 reply →

  • I never said LLMs will not be able to do X. I gave my summary of the article and my anecdotal experiences with LLMs. I have no LLM ideology. We will see what tomorrow brings.

  • > We're talking about significant contributions to theoretical physics.

    Whoever wrote the prompts and guided ChatGPT made significant contributions to theoretical physics. ChatGPT is just a tool they used to get there. I'm sure AI-bloviators and pelican bike-enjoyers are all quite impressed, but the humans should be getting the research credit for using their tools correctly. Let's not pretend the calculator doing its job as a calculator at the behest of the researcher is actually a researcher as well.

    • If this worked for 12 hours to derive the simplified formula along with its proof, then it guided itself and made significant contributions by any useful definition of the word, hence OpenAI having an author credit.

      15 replies →

    • If a helicopter drops someone off on the top of Mount Everest, it's reasonable to say that the helicopter did the work and is not just a tool they used to hike up the mountain.

      5 replies →

"GPT did this". Authored by Guevara (Institute for Advanced Study), Lupsasca (Vanderbilt University), Skinner (University of Cambridge), and Strominger (Harvard University).

Probably not something that the average GI Joe would be able to prompt their way to...

I am skeptical until they show the chat log leading up to the conjecture and proof.

  • I'm a big LLM sceptic but that's… moving the goalposts a little too far. How could an average Joe even understand the conjecture enough to write the initial prompt? Or do you mean that experts would give him the prompt to copy-paste, and hope that the proverbial monkey can come up with a Henry V? At the very least posit someone like a grad student in particle physics as the human user.

    • I would interpret it as implying that the result was due to a lot more hand-holding than what is let on.

      Was the initial conjecture based on leading info from the other authors or was it simply the authors presenting all information and asking for a conjecture?

      Did the authors know that there was a simpler means of expressing the conjecture and lead GPT to its conclusion, or did it spontaneously do so on its own after seeing the hand-written expressions?

      These aren't my personal views, but there is some handwaving about the process in such a way that reads as if this was all spontaneous involvement on GPT's end.

      But regardless, a result is a result so I'm content with it.

      8 replies →

    • That's kinda the whole point.

      SpaceX can use an optimization algorithm to hoverslam a rocket booster, but the optimization algorithm didn't really figure it out on its own.

      The optimization algorithm was used by human experts to solve the problem.

      1 reply →

  • "Grad Student did this". Co-authored by <Famous advisor 1>, <Famous advisor 2>, <Famous advisor 3>.

    Is this so different?

  • The Average Joe reads at an 8th grade level. 21% are illiterate in the US.

    LLMs surpassed the average human a long time ago IMO. When LLMs fail to measure up to humans, it's that they fail to measure up against human experts in a given field, not the Average Joe.

    We are surrounded by NPCs.

  • The paper has all those prominent institutions acknowledging the contribution, so realistically, why would you be skeptical?

    • they probably also acknowledge pytorch, numpy, R ... but we don't attribute those tools as the agent who did the work.

      I know we've been primed by sci-fi movies and comic books, but like pytorch, gpt-5.2 is just a piece of software running on a computer instrumented by humans.

      6 replies →

    • Their point is, would you be able to prompt your way to this result? No. Already trained physicists working at world-leading institutions could. So what progress have we really made here?

      7 replies →

> In my experience LLMs can make new things when they are some linear combination of existing things, but I haven't been able to get them to do something totally out of distribution from first principles yet.

What's the distinction between "first principles" and "existing things"?

I'm sympathetic to the idea that LLMs can't produce path-breaking results, but I think that's true only for a strict definition of path-breaking (that is quite rare for humans too).

Hmm, feels a bit trivializing; we don't know exactly how difficult it was to come up with the general set of equations mentioned, starting from the human work.

I can claim some knowledge of physics from my degree, typically the easy part is coming up with complex dirty equations that work under special conditions, the hard part is the simplification into something elegant, 'natural' and general.

Also "LLM’s can make new things when they are some linear combination of existing things"

Doesn't really mean much, what is a linear combination of things you first have to define precisely what a thing is?

Very, very few human individuals are capable of making new things that are not a linear combination of existing things. Even something like special relativity was an application of two previous ideas. All of special relativity is derivable from the principle of relative motion (known since antiquity) and the constancy of the speed of light (which was known to Einstein). From there it is a straightforward application of the Pythagorean theorem to realize there is a contradiction, and the Lorentz factor falls out naturally via basic algebra.
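
The usual quick illustration is the light-clock argument (just a sketch, not the historical route): in the clock's rest frame a light pulse crosses the clock in time t', while in a frame where the clock moves at speed v the pulse traverses the hypotenuse of a right triangle, so

    (c\,t)^2 = (v\,t)^2 + (c\,t')^2 \quad\Longrightarrow\quad t = \frac{t'}{\sqrt{1 - v^2/c^2}} = \gamma\, t'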

Serious question: I often hear about this "let the LLM cook for hours", but how do you do that in practice, and how does it manage its own context? How does it not get lost after so many tokens?

  • I’m guessing here and would love someone with first-hand knowledge to comment. But my guess is it’s some combination of trying many different approaches in parallel (each in a fresh context), then picking the one that works, and splitting up the task into sequential steps, where the output of one step is condensed and used as an input to the next step (possibly with human steering between steps).

  • From what I've seen, it's a process of compacting the session once it reaches some limit, which basically means summarizing all the previous work and feeding it in as the initial prompt for the next session.

  • the annoying part is that with tool calls, a lot of those hours is time spent on network round trips.

    over long periods of time, checklists are the biggest thing, so the LLM can track what's already done and what's left. after a compaction, it can pull the relevant stuff back up and make progress.

    having some level of hierarchy is also useful - requirements, high level designs, low level designs, etc.

> I haven't been able to get them to do something totally out of distribution from first principles yet.

Agree with this. I’ve been trying to make LLMs come up with creative and unique word games like Wordle and Uncrossy (uncrossy.com), but so far GPT-5.2 has been disappointing. Comparatively, Opus 4.5 has been doing better on this.

But it’s good to know that it’s breaking new ground in Theoretical Physics!

What does a 12-hour solution cost an OpenAI customer?

  • $200/month would cover many such sessions every month.

    The real question is, what does it cost OpenAI? I'm pretty sure both their plans are well below cost, at least for users who max them out (and if you pay $200 for something then you'll probably do that!). How long before the money runs out? Can they get it cheap enough to be profitable at this price level, or is this going to be a "get them addicted then jack it up" kind of strategy?

    • No because open source models are close behind

      Compute costs will fall drastically for existing models

      But it's likely that frontier models of the future won't be released to the public at all, because they'll be too good

      1 reply →

Surely higher level math is just linear combinations of the syntax and implications of lower level math. LLMs are taught the syntax of basically all existing math notation, I assume. Much of math is, after all, just linguistic manipulation and detection of contradiction in said language with a more formal, a priori language.

  • LLMs can write theorems, but can they come up with meaningful definitions?

    • I intended to imply this with "detection of contradiction". Coherence seems to me to be the only a priori meaning. Most of the meaning of "meaning" seems to me to be a posteriori. After all, what is the point of an a priori floating signifier?

      1 reply →

> In my experience LLMs can make new things when they are some linear combination of existing things

It seems to me that all “new ideas” are basically linear combinations of existing things, with exceedingly rare exceptions…

Maybe Godel’s Incompleteness?

Darwinian evolution?

General Relativity?

Buddhist non-duality?

My physics professor once claimed that imagination is just mental manipulation of past experiences. I never thought it was true for human beings but for LLMs it makes perfect sense.

I must be a Luddite: how do you have a model working for 12 hours on a problem? Mine is ready with an answer and always interrupts to ask for confirmation or show the answer.

  • That's on the harness - the device actually sending the prompt to the model. You can write a different harness that feeds the problem back in for however long you want. Ask Claude Code or Codex to build it for you in as minimal a fashion as possible and you'll see that a naïve version is not particularly more complex than `while true; do prompt $file >> file; done` (though it's not that precisely, obviously).
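
    A minimal sketch of such a loop, assuming hypothetical call_model() and is_solved() helpers standing in for your model API and whatever verifier you have (tests, a symbolic check, etc.):

      # keep feeding the transcript back in until the verifier passes
      # (call_model / is_solved are hypothetical placeholders)
      MAX_STEPS = 200
      CONTEXT_BUDGET = 100_000  # characters, not tokens, for simplicity

      problem = open("problem.md").read()
      transcript = problem
      for _ in range(MAX_STEPS):
          reply = call_model(transcript)            # one model call per pass
          transcript += "\n\n" + reply              # feed the output back in
          if is_solved(reply):                      # stop once the verifier passes
              break
          if len(transcript) > CONTEXT_BUDGET:      # "compaction": summarize, restart the context
              summary = call_model("Summarize the progress so far, including open items:\n" + transcript)
              transcript = problem + "\n\n" + summary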

> LLMs can make new things when they are some linear combination of existing things

Aren't most new things linear combinations of existing things (up to a point)?

> It took 12 hours for GPT Pro to do this

Thanks for the summary, but this is a huge hand-wave. Was GPT Pro just spinning for 12 hours and then returned 42?!

AI (cough, LLMs) don't discover things; they simply surface information that already existed.

  • You're assuming there aren't "new things" latent inside currently existing information. That's definitely false, particularly for math/physics.

    But it's worth thinking more about this. What gives humans the ability to discover "new things"? I would say it's due to our interaction with the universe via our senses, and not due to some special powers intrinsic to our brains that LLMs lack. And the thing is, we can feed novel measurements to LLMs (or, eventually, hook them up to camera feeds to "give them senses").

    • No, it isn't false. If it is new it is novel, novel because it is known to some degree and two other abstracted known things prove the third. Just pattern matching, connecting dots.

      1 reply →

Is every new thing not just a combination of existing things? What does out of distribution even mean? What advancement has ever been made that didn't have a lead-up of prior work to it? Is there some fundamental thing that prevents AI from recombining ideas and testing theories?

  • For example, ever since the first GPT-4 I’ve tried to get LLMs to build me a specific type of heart simulation that to my knowledge does not exist anywhere on the public internet (otherwise I wouldn’t try to build it myself), and even up to GPT-5.3 it still cannot do it.

    But I’ve successfully made it build me a great Poker training app, a specific form that also didn’t exist, but the ingredients are well represented on the internet.

    And I’m not trying to imply AI is inherently incapable, it’s just an empirical (and anecdotal) observation for me. Maybe tomorrow it’ll figure it out. I have no dogmatic ideology on the matter.

  • > Is every new thing not just combinations of existing things?

    If all ideas are recombinations of old ideas, where did the first ideas come from? And wouldn't the complexity of ideas be thus limited to the combined complexity of the "seed" ideas?

    I think it's fairer to say that recombining ideas is an efficient way to quickly explore a very complex, hyperdimensional space. In some cases that's enough to land on new, useful ideas, but not always. A) the new, useful idea might be _near_ the area you land on, but not exactly at it. B) there are whole classes of new, useful ideas that cannot be reached by any combination of existing "idea vectors".

    Therefore there is still the necessity to explore the space manually, even if you're using these idea vectors to give you starting points to explore from.

    All this to say: Every new thing is a combination of existing things + sweat and tears.

    The question everyone has is, are current LLMs capable of the latter component. Historically the answer is _no_, because they had no real capacity to iterate. Without iteration you cannot explore. But now that they can reliably iterate, and to some extent plan their iterations, we are starting to see their first meaningful, fledgling attempts at the "sweat and tears" part of building new ideas.

    • Well, what exactly an “idea” is might be a little unclear, but I don’t think it’s clear that the complexity of ideas that result from combining previously obtained ideas would be bounded by the complexity of the ideas they are combinations of.

      Any countable group is a quotient of a subgroup of the free group on two elements, iirc.

      There’s also the concept of “semantic primes”. Here is a not-quite correct oversimplification of the idea: Suppose you go through the dictionary and, one word at a time, pick a word whose definition includes only other words that are still in the dictionary, and remove it. You can also rephrase definitions before doing this, as long as it keeps the same meaning. Suppose you do this with the goal of leaving as few words in it as you can. In the end, you should have a small cluster of a bit over 100 words, in terms of which all the other words you removed can be indirectly defined. (The idea of semantic primes also says that there is such a minimal set which translates essentially directly* between different natural languages.)

      I don’t think that says that words for complicated ideas aren’t like, more complicated?

    • >If all ideas are recombinations of old ideas, where did the first ideas come from?

      Ideas seem to just be our abstractions of neural impulses from deep in evolution.

  • > What does out of distribution even mean?

    There are in fact ways to directly quantify this, if you are training e.g. a self-supervised anomaly-detection model.

    Even with modern models not trained in that manner, looking at e.g. cosine distances of embeddings of "novel" outputs could conceivably provide objective evidence for "out-of-distribution" results. Generally, the embeddings of out-of-distribution outputs will have a large cosine (or even Euclidean) distance from the typical embedding(s). Just, most "out-of-distribution" outputs will be nonsense / junk, so, searching for weird outputs isn't really helpful, in general, if your goal is useful creativity.
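
    A toy sketch of that idea, assuming a hypothetical embed() for whatever embedding model you use and a reference corpus of in-distribution text:

      import numpy as np

      # embed(), corpus_samples and candidate_output are hypothetical stand-ins
      def cosine_distance(a, b):
          return 1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

      # embeddings of known, in-distribution text
      reference = np.stack([embed(x) for x in corpus_samples])
      centroid = reference.mean(axis=0)
      # call anything beyond the 99th percentile of reference distances "out of distribution"
      threshold = np.quantile([cosine_distance(r, centroid) for r in reference], 0.99)

      novelty = cosine_distance(embed(candidate_output), centroid)
      is_out_of_distribution = novelty > threshold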

Just wait until LLMs are fast and cheap enough to be run in a breadth first search kind of way, with "fuzzy" pruning.
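
Roughly something like this, presumably (a sketch; propose() and score() are hypothetical stand-ins for an LLM proposer and a cheap critic, and the pruning makes it closer to beam search than strict BFS):

    # explore partial solutions breadth-first, keeping only the most promising ones
    # (propose / score are hypothetical placeholders)
    BEAM_WIDTH = 16
    MAX_DEPTH = 8

    frontier = [""]                               # start from an empty partial solution
    for _ in range(MAX_DEPTH):
        candidates = []
        for partial in frontier:
            candidates.extend(propose(partial))   # LLM suggests several continuations
        candidates.sort(key=score, reverse=True)  # "fuzzy" ranking, e.g. by a critic model
        frontier = candidates[:BEAM_WIDTH]        # prune everything else

    best = max(frontier, key=score)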

My issue with any of these claims is the lack of proof. Just share the chat and how it got to the discovery. I'll believe it when I can see it for myself at this point. It's too easy to make all sorts of claims without proof these days. Elon Musk makes them all the time.