Comment by randomtoast
6 days ago
> but I haven’t been able to get them to do something totally out of distribution yet from first principles
Can humans actually do that? Sometimes it appears as if we have made a completely new discovery. However, if you look more closely, you will find that many events and developments led up to this breakthrough, and that it is actually an improvement on something that already existed. We are always building on the shoulders of giants.
> Can humans actually do that?
From my reading yes, but I think I am likely reading the statement differently than you are.
> from first principles
Doing things from first principles is a known strategy, so is guess and check, brute force search, and so on.
For an LLM to follow a first-principles strategy, I would expect it to take in a body of research, come up with some first principles or guess at them, then iteratively construct a tower of reasoning, findings, and experiments.
Constructing a solid tower is where things are currently improving for existing models in my mind, but when I try the OpenAI or Anthropic chat interfaces, neither does a good job for long, not independently at least.
Humans also often have a hard time with this. In general it is not a skill that everyone has, and I think you can be a successful scientist without ever heavily developing first-principles problem solving.
"Constructing a solid tower" from first principles is already super-human level. Sure, you can theorize a tower (sans the "solid") from first principles; there's a software architect at my job that does it every day. But the "solid" bit is where things get tricky, because "solid" implies "firm" and "well anchored", and that implies experimental grounds, experimental verification all the way, and final measurable impact. And I'm not even talking particle physics or software engineering; even folding a piece of paper can give you surprising mismatches between theory and results.
Even the realm of pure mathematics and elegant physics theories, where you are supposed to take a set of axioms ("first principles") and build something with it, has cautionary tales such as Russell's paradox or the lack of a well-defined measure for Feynman path integrals, and let's not talk about string theory.
Yes. That's how all advancement in human knowledge happened. Small and incremental forays out of our training distribution.
These have been identified as various things. Eureka moments, strokes of genius, out of the box thinking, lateral thinking.
LLMs have not been shown to be capable of this. They might be in the future, but they haven't yet.
Relativity comes to mind.
You could nitpick a rebuttal, but no matter how many people you give credit, general relativity was a completely novel idea when it was proposed. I'd argue for special relativity as well.
I am not a scientific historian, or even a physicist, but IMO relativity has a weak case for being a completely novel discovery. Critique of the absolute time and space of Newtonian physics was already well underway, and much of the methodology for exploring this relativity (by way of gyroscopes, inertial reference frames, and synchronized mechanical clocks) was already in parlance. Many of the phenomena that relativity would later explain under a consistent framework already had independent quasi-explanations hinting at the more universal theory. Poincaré probably came the closest to unifying everything before Einstein:
> In 1902, Henri Poincaré published a collection of essays titled Science and Hypothesis, which included: detailed philosophical discussions on the relativity of space and time; the conventionality of distant simultaneity; the conjecture that a violation of the relativity principle can never be detected; the possible non-existence of the aether, together with some arguments supporting the aether; and many remarks on non-Euclidean vs. Euclidean geometry.
https://en.wikipedia.org/wiki/History_of_special_relativity
Now, if I had to pick a major idea that seemed to drop fully-formed from the mind of a genius with little precedent to have guided him, I might personally point to Galois theory (https://en.wikipedia.org/wiki/Galois_theory). (Ironically, though, I'm not as familiar with the mathematical history of that time and I may be totally wrong!)
Right on with special relativity—Lorentz also was developing the theory and was a bit sour that Einstein got so much credit. Einstein basically said “what if special relativity were true for all of physics”, not just electromagnetism, and out dropped e=mc^2. It was a bold step but not unexplainable.
As for general relativity, he spent several years working to learn differential geometry (which was well developed mathematics at the time, but looked like abstract nonsense to most physicists). I’m not sure how he was turned on to this theory being applicable to gravity, but my guess is that it was motivated by some symmetry ideas. (It always comes down to symmetry.)
This only means Einstein was not alone, it does not mean the results were in distribution.
And this comes about because people are looking at edge cases and trying to solve things. Sometimes people come up with wild and crazy solutions. Sometimes those solutions look obvious after they're known (though not prior to being known, otherwise it would have already been known...) and others don't.
Your argument really makes the claim that since there are others pursuing similar directions that this means it is in distribution. I'll use a classic statistics style framing. Suppose we have a bag with n red balls and p blue balls. Someone walks over and says "look, I have a green ball" and someone else walks over and says "I have a purple one" and someone else comes over and says "I have a pink one!". None of those balls were from the bag we have. There are still n+p balls in our bag, they are still all red or blue despite there being n+p+3 balls that we know of.
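The bag framing above can be sketched in a few lines of Python. This is only an illustration: the counts `n` and `p`, the colour names, and the helper `in_support` are made up here, not taken from the comment.

```python
import random
from collections import Counter

# Hypothetical counts: n red balls and p blue balls in our bag.
n, p = 5, 3
bag = Counter(red=n, blue=p)

# Balls other people walked over with; none of them came from our bag.
outside = ["green", "purple", "pink"]


def in_support(colour, bag):
    """True if drawing from the bag could ever produce this colour."""
    return bag[colour] > 0


# Sampling from the bag (with replacement) only ever yields red or blue.
rng = random.Random(0)
draws = rng.choices(list(bag.keys()), weights=list(bag.values()), k=1000)
drawn_colours = set(draws)

# We now know of n + p + 3 balls, but the bag itself is unchanged.
known_balls = sum(bag.values()) + len(outside)

print(sorted(drawn_colours))
print([in_support(c, bag) for c in outside])
print(sum(bag.values()), known_balls)
```

However many times you draw, the green, purple, and pink balls never show up, which is the sense in which they are outside the bag's distribution even though they exist.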
I think this is probably why you don't have the resolution to see the distinctions. Without a formal study of physics it is really hard to differentiate these kinds of propositions. It can be very hard even with that education. So be careful to not overly abstract and simplify concepts. It'll only deprive you of a lot of beauty and innovation.
From that article:
> The quintic was almost proven to have no general solutions by radicals by Paolo Ruffini in 1799, whose key insight was to use permutation groups, not just a single permutation.
Thing is, I am usually the kind of person who defends the idea of a lone genius. But I also believe there is a continuous spectrum, no gaps, from the village idiot to Einstein and beyond.
Let me introduce, just for fun, not for the sake of any argument, another idea from math which I think really came out of the blue, to the degree that it's still considered an open problem to write an exposition about it, since you cannot smoothly link it to anything else: forcing.
At least Einstein didn't just suddenly turn around and say:
```ai-slop
But wait, this equation is too simple, I need to add more terms or it won't model the universe. Let me think about this again. I have 5 equations and I combined them and derived e=mc^2 but this is too simple. The universe is more complicated. Let's try a different derivation. I'll delete the wrong outputs first and then start from the input equations.
<Deletes files with groundbreaking discovery>
Let me think. I need to re-read the original equations and derive a more complex formula that describes the universe.
<Re-reads equation files>
Great, now I have the complete picture of what I need to do. Let me plan my approach. I'm ready. I have a detailed plan. Let me check some things first.
I need to read some extra files to understand what the variables are.
<Reads the lunch menu for the next day>
Perfect. Now I understand the problem fully, let me revise my plan.
<Writes plan file>
Okay I have written the plan. Do you accept?
<Yes>
Let's go. I'll start by creating a To Do list:
- [ ] Derive new equation from first principles making sure it's complex enough to describe reality.
- [ ] Go for lunch. When the server offers tuna, reject it because the notes say I don't like fish.
```
(You know what's really sad? I wrote that slop without using AI and without referring to anything...)
You need to differentiate between special and general relativity when making these statements.
It is absolutely true that someone else would have come up with special relativity very soon after Einstein. All that would be necessary is someone else to have the wherewithal to say "perhaps the aether does not need to exist" for the equations already known at the time by others before Einstein to lead to the general theory.
General relativity is different. Witten contends that it is entirely possible that without Einstein, we may have had to wait for the early string theorists of the 1960s to discover GR as a classical limit of the first string theories in their quest to understand the strong nuclear force.
As opposed to SR, GR is one of the most singular innovative intellectual achievements in human history. It's definitely "out of distribution" in some sense.
In my view, another example would be Gautama Buddha, with Dependent Origination. It’s basically a super early realisation of Process Philosophy.
https://en.wikipedia.org/wiki/Prat%C4%ABtyasamutp%C4%81da https://iep.utm.edu/processp/
Edit: but even it likely relied on his prior experience with nondualistic Hinduisms, of course.
Newton himself wrote that we usually deal with relative space and time, but we can imagine absolute time and space.
Agreed.
General relativity was a completely novel idea. Einstein took a purely mathematical object (now known as the Einstein tensor), and realized that since its covariant derivative was zero, it could be equated (apart from a constant factor) to a conserved physical object, the energy-momentum tensor. It didn't just fall out of Riemannian geometry and what was known about physics at the time.
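For anyone who wants the symbols, here is that step written out in standard modern notation (not Einstein's original, and leaving out the cosmological constant). I'm writing the constant factor as $\kappa$ and Newton's constant as $G_N$ to keep it apart from the tensor $G^{\mu\nu}$.

```latex
% Einstein tensor, built purely from the geometry (Ricci tensor and scalar):
G^{\mu\nu} = R^{\mu\nu} - \tfrac{1}{2}\, g^{\mu\nu} R

% Its covariant divergence vanishes identically (contracted Bianchi identity):
\nabla_\mu G^{\mu\nu} = 0

% Energy-momentum is locally conserved:
\nabla_\mu T^{\mu\nu} = 0

% So the two can be consistently equated, up to a constant factor:
G^{\mu\nu} = \kappa\, T^{\mu\nu}, \qquad \kappa = \frac{8\pi G_N}{c^4}
```

The matching divergences make the identification consistent, but they don't force it, which is the point being made about it not simply falling out of the geometry.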
Special relativity was the work of several scientists as well as Einstein, but it was also a completely novel idea - just not the idea of one person working alone.
I don't know why anyone disputes that people can sometimes come up with completely novel ideas out of the blue. This is how science moves forward. It's very easy to look back on a breakthrough and think it looks obvious (because you know the trick that was used), but it's important to remember that the discoverer didn't have the benefit of hindsight that you have.
Even if I grant you that, surely we’ve moved the goal posts a bit if we’re saying the only thing we can think of that AI can’t do is the life’s work of a man whose last name is literally synonymous with genius.
That's not exactly true. Lorentz contraction is a clear antecedent to special relativity.
It isn't an antecedent, it's part of special relativity, discovered by Lorentz. It's well known that special relativity is the work of several people as well as Einstein.
Not really. Pretty sure I read recently that Newton appreciated that his theory was non-local and didn't like what Einstein later called "spooky action at a distance". The Lorentz transform was also known from 1887. Time dilation was understood from 1900. Poincaré figured out in 1905 that it was a mathematical group. Einstein put a bow on it all by figuring out that you could derive it from the principle of relativity and keeping the speed of light constant in all inertial reference frames.
I'm not sure about GR, but I know that it is built on the foundations of differential geometry, which Einstein definitely didn't invent (I think that's the source of his "I assure you whatever your difficulties in mathematics are, that mine are much greater" quote because he was struggling to understand Hilbert's math).
And really Cauchy, Hilbert, and those kinds of mathematicians I'd put above Einstein in building entirely new worlds of mathematics...
Agree with you everywhere. Although I prefer the quote:
"Since the mathematicians have invaded the theory of relativity, I do not understand it myself anymore."
:)
Are you saying Newton was aware of quantum entanglement? Because that's what the "spooky action at a distance" quote refers to.
Depends on what you think is valid.
The process you’re describing is humans extending our collective distribution through a series of smaller steps. That’s what the “shoulders of giants” means. The result is we are able to do things further and further outside the initial distribution.
So it depends on if you’re comparing individual steps or just the starting/ending distributions.
Go enough shoulders down, and someone had to have been the first giant.
Probably not Homo sapiens. Other hominids older than us developed a lot of technology.
A discovery by a giant is in some sense a new basis vector in the space of discoveries. The interesting question is whether a statistical machine can only perform a linear combination in the space of discoveries, or whether it can discover a new basis vector in that space, whatever that is.
For sure we know modern LLMs and AIs are not constrained by anything particularly close to simple linear combinations, by virtue of their depth and non-linear activation functions.
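A toy sketch of the non-linearity point, assuming nothing about any real model: the function names and weights below are hypothetical, and a two-unit ReLU "network" stands in for a deep one. A linear map always satisfies f(a + b) = f(a) + f(b); add one ReLU layer and that already fails.

```python
def relu(x):
    """Standard ReLU activation."""
    return max(0.0, x)


def linear(x, w=2.0):
    """A purely linear map: scaling by a fixed (made-up) weight."""
    return w * x


def tiny_net(x):
    """Two hidden ReLU units with input weights +1 and -1, summed.

    This computes |x|, which no linear map can represent.
    """
    return relu(x) + relu(-x)


a, b = 3.0, -5.0

# Additivity holds for the linear map...
linear_additive = linear(a + b) == linear(a) + linear(b)

# ...but not once a non-linear activation is involved.
net_additive = tiny_net(a + b) == tiny_net(a) + tiny_net(b)

print(linear_additive, net_additive)
```

This only shows that the function computed is not a linear combination of its inputs. It says nothing either way about whether such a model can extrapolate to a genuinely new "basis vector".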
But yes, it is not yet clear to what degree there can be (non-linear) extrapolation in the learned semantic spaces here.
Pythagoras is the turtle.
Pythagoras learned from Egyptians that have been largely erased by euro/western narratives of superiority.
Arguably it's precisely a paradigm shift. Continuing whatever worked until now is within the paradigm: our current theories and tools work, and we find a few problems that don't fit, but that's fine, the rest is still progress. Then we keep hitting more problems, or those few pesky unsolved problems turn out to be important. We then go back to the theory and its foundations and finally challenge them. We break from the old paradigm and come up with new theories and tools because the first principles are now better understood, and we iterate.
So that's actually 2 different regimes on how to proceed. Both are useful but arguably breaking off of the current paradigm is much harder and thus rare.
The tricky part is that LLMs aren't just spewing outputs from the distribution (or "near" learned manifolds), but also extrapolating / interpolating (depending on how much you care about the semantics of these terms https://arxiv.org/abs/2110.09485).
There are genuine creative insights that come from connecting two known semantic spaces in a way that wasn't obvious before (e.g., a novel isomorphism). It is very conceivable that LLMs could make this kind of connection, but we haven't really seen a dramatic form of this yet. This kind of connection can lead to deep, non-trivial insights, but whether or not it is "out-of-distribution" is harder to answer in this case.
I mean, there’s just no way you can take the set of publicly known ideas from all human civilizations, say, 5,000 years ago, and say that all the ideas we have now were “in the distribution” then. New ideas actually have to be created.
Yes
Seriously, think about it for a second...
If that were true then science should have accelerated a lot faster. Science would have happened differently, and researchers would have optimized for ingesting as many papers as they can.
Dig deep into things and you'll find that there are often leaps of faith that need to be made. Guesses, hunches, and outright conjectures. Remember, there are paradigm shifts that happen. There are plenty of things in physics (including classical) that cannot be determined from observation alone. Or more accurately, cannot be differentiated from alternative hypotheses through observation alone.
I think the problem is that when teaching science we generally teach it very linearly, as if things easily follow. But in reality there are generally constant iterative improvements that look more like a plateau, and then there are these leaps. They happen for a variety of reasons, but no paradigm shift would be contentious if it were obvious and clearly in distribution. It would always be met with the same response that typical iterative improvements are met with: "well that's obvious, is this even novel enough to be published? Everybody already knew this" (hell, look at the response to the top comment and my reply... that's classic "Reviewer #2" behavior). If it were always in distribution, progress would be nearly frictionless. Again, in how we teach the history of science, we make an error in teaching things like Galileo as if The Church was the only opposition. There were many scientists that objected, and on reasonable grounds. It is also a mistake we continually make in how we view the world. If you're sticking with "it works" you'll end up with a geocentric model rather than a heliocentric model. It is true that the geocentric model had limits, but so did the original heliocentric model, and that's the reason it took time to be adopted.
By viewing things at too high of a level we often fool ourselves. While I'm criticizing how we teach I'll also admit it is a tough thing to balance. It is difficult to get nuanced and in teaching we must be time effective and cover a lot of material. But I think it is important to teach the history of science so that people better understand how it actually evolves and how discoveries were actually made. Without that it is hard to learn how to actually do those things yourself, and this is a frequent problem faced by many who enter PhD programs (and beyond).
And it still is. You can still lean on others while presenting things that are highly novel. These are not in disagreement.
It's probably worth reading The Unreasonable Effectiveness of Mathematics in the Natural Sciences. It might seem obvious now but read carefully. If you truly think it is obvious that you can sit in a room armed with only pen and paper and make accurate predictions about the world, you have fooled yourself. You have not questioned why this is true. You have not questioned when this actually became true. You have not questioned how this could be true.
https://www.hep.upenn.edu/~johnda/Papers/wignerUnreasonableE...