Comment by eqmvii

3 days ago

Could this be an experiment to show how likely LLMs are to lead to AGI, or at least intelligence well beyond our current level?

If you could only give it texts and info and concepts up to Year X, well before Discovery Y, could we then see if it could prompt its way to that discovery?

> Could this be an experiment to show how likely LLMs are to lead to AGI, or at least intelligence well beyond our current level?

You'd have to be specific about what you mean by AGI: all three letters mean different things to different people, and sometimes the term as a whole is used to mean something not present in the letters at all.

> If you could only give it texts and info and concepts up to Year X, well before Discovery Y, could we then see if it could prompt its way to that discovery?

To a limited degree.

Some developments can come from combining existing ideas and seeing what they imply.

Other things, like everything to do with relativity and quantum mechanics, would have required experiments. I don't think any of the relevant experiments had been done prior to this cut-off date, but I'm not absolutely sure of that.

You might be able to get such an LLM to develop all the maths and geometry for general relativity, and yet find the AI still tells you that the perihelion shift of Mercury is a sign of the planet Vulcan rather than of a curved spacetime: https://en.wikipedia.org/wiki/Vulcan_(hypothetical_planet)

  • > You'd have to be specific what you mean by AGI

    Well, they obviously can't. AGI is not science, it's religion. It has all the trappings of religion: prophets, sacred texts, an origin myth, an end-of-days myth and, most importantly, a means to escape death. Science? Well, the only measure of "general intelligence" would be comparison with the only one we know of, the human kind, and we have absolutely no means by which to describe it. We do not know where to start. This is why, when you scratch the surface of any AGI definition, you only find circular definitions.

    And no, the "brain is a computer" is not a scientific description, it's a metaphor.

    • > And no, the "brain is a computer" is not a scientific description, it's a metaphor.

      Disagree. A brain is Turing-complete, no? Isn't that the definition of a computer? Sure, it may be reductive to say "the brain is just a computer".

      8 replies →

    • > And no, the "brain is a computer" is not a scientific description, it's a metaphor.

      I have trouble comprehending this. What is "computer" to you?

    • Cargo cults are a religion, the things they worship they do not understand, but the planes and the cargo themselves are real.

      There's certainly plenty of cargo-culting right now on AI.

      Sacred texts, I don't recognise. Yudkowsky's writings? He suggests wearing clown shoes to avoid getting a cult of personality disconnected from the quality of the arguments; if anyone finds his works sacred, they've fundamentally misunderstood him:

        I have sometimes thought that all professional lectures on rationality should be delivered while wearing a clown suit, to prevent the audience from confusing seriousness with solemnity.
      

      - https://en.wikiquote.org/wiki/Eliezer_Yudkowsky

      Prophets forecasting the end of days, yes, but we get this too from climate science, from everyone who was preparing for a pandemic before COVID and is still trying to prepare for the next one because the wet markets are still around, from economists trying to forecast growth or collapse and what would turn any given prediction of the latter into the former, and from the military forces of the world saying which weapon systems they want to buy. That does not make a religion.

      A means to escape death, you can have. But it's on a continuum with life extension and anti-aging medicine, which itself is on a continuum with all other medical interventions. To quote myself:

        Taking a living human's heart out without killing them, and replacing it with one you got out a corpse, that isn't the magic of necromancy, neither is it a prayer or ritual to Sekhmet, it's just transplant surgery.
      
        …
      
        Immunity to smallpox isn't a prayer to the Hindu goddess Shitala (of many things but most directly linked with smallpox), and it isn't magic herbs or crystals, it's just vaccines.
      

      - https://benwheatley.github.io/blog/2025/06/22-13.21.36.html

It'd be difficult to prove that you hadn't leaked information to the model. The big gotcha of LLMs is that you train them on BIG corpuses of data, which means it's hard to say "X isn't in this corpus", or "this corpus only contains Y". You could TRY to assemble a set of training data that only contains text from before a certain date, but it'd be tricky as heck to be SURE about it.

Ways data might leak to the model that come to mind: misfiled/mislabelled documents, footnotes, annotations, document metadata.
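
To make that concrete, here is a minimal sketch of the kind of date gate and leak check such a corpus build might use. The field names, the cutoff year, and the regex heuristic are all invented for illustration; real archives rarely carry metadata this clean, which is exactly the problem.

  # Illustrative sketch only: assumes each document is a dict with
  # hypothetical "year" and "text" fields supplied by the archive.
  import re

  CUTOFF_YEAR = 1875

  def is_pre_cutoff(doc):
      """Keep a document only if its catalogued year is before the cutoff."""
      year = doc.get("year")
      return year is not None and year < CUTOFF_YEAR

  def looks_leaky(doc):
      """Cheap heuristic: flag any mention of a year at or after the cutoff,
      e.g. a later footnote or annotation pasted into an older document."""
      for match in re.finditer(r"\b(1[5-9]\d{2}|20\d{2})\b", doc["text"]):
          if int(match.group()) >= CUTOFF_YEAR:
              return True
      return False

  def build_corpus(docs):
      kept, flagged = [], []
      for doc in docs:
          if not is_pre_cutoff(doc):
              continue
          (flagged if looks_leaky(doc) else kept).append(doc)
      return kept, flagged  # flagged docs need manual review, not automatic trust

Even a filter like this only catches explicitly written years; the harder leaks (later terminology, ideas quoted from later works, modern OCR corrections) would sail straight through it, which is the point above.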

  • There are also severe selection effects: which documents have been preserved, printed, and scanned because they turned out to be on the right track towards relativity?

    • This.

      Especially for London there is a huge chunk of recorded parliamentary debates.

      For dialogue, training on recorded correspondence in the form of letters seems more interesting anyway.

      And that corpus script just looks odd, to say the least: does it just oversample by some factor X? (See the sketch below.)
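
      If "oversample by X" really just means repeating each document X times in the training stream, a hedged sketch of that (the function name and factor are invented for illustration) would be something like:

        import random

        def oversample(docs, factor=8, seed=0):
            """Repeat every document `factor` times and shuffle the result.
            This inflates token counts without adding any new information."""
            repeated = [doc for doc in docs for _ in range(factor)]
            random.Random(seed).shuffle(repeated)
            return repeated

      Which would explain why the corpus looks bigger than the surviving source material actually is.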

I think not, if only because the quantity of old data isn't enough to train anywhere near a SoTA model, at least until we change some fundamentals of LLM architecture.

  • Are you saying it wouldn't be able to converse using the English of the time?

    • Machine learning today requires an obscene quantity of examples to learn anything.

      SOTA LLMs show quite a lot of skill, but they only do so after reading a significant fraction of all published writing (and perhaps images and videos, I'm not sure) across all languages, in a world whose population is 5 times what it was at the link's cut-off date, and where global literacy has gone from 20% to about 90% since then.

      Computers can only make up for this by being really really fast: what would take a human a million or so years to read, a server room can pump through a model's training stage in a matter of months.

      When the data isn't there, reading what it does have really quickly isn't enough.

    • That's not what they are saying. SOTA models include much more than just language, and the scale of the training data is related to their "intelligence". Restricting the corpus in time => less training data => less intelligence => less ability to "discover" new concepts not in the training data.

      2 replies →

  • I mean, humans didn't need to read billions of books back then to think of quantum mechanics.

    • Which is why I said it's not impossible, but current LLM architecture is just not good enough to achieve this.

I think this would be an awesome experiment. However, you would effectively need to train something of a GPT-5.2 equivalent. So you need a lot of text, a much larger parameterization (compared to nanoGPT and Phi-1.5), and the 1800s equivalents of supervised finetuning and reinforcement learning with human feedback.

This would be a true test of whether LLMs can innovate or just regurgitate. I think part of people's amazement at LLMs is that they don't realize how much they themselves don't know, so thinking and recalling look the same to the end user.

That is one of the reasons I want it done. We can't tell if AIs are parroting training data without having the whole training data. Making it old means specific things won't be in it (or will be), and we can do more meaningful experiments.

This is fascinating, but the experiment seems to fall short of being a fair comparison of how much knowledge we can capture in data from that time versus now.

As a thought experiment I find it thrilling.

OF COURSE!

The fact that tech leaders extol the brilliance of LLMs and don't use this specific test method is infuriating to me. It is deeply unfortunate that there is so little transparency or standardization in the datasets available for training and fine-tuning.

Having this advertised will make for more interesting and informative benchmarks. OEM models that are always "breaking" the benchmarks are doing so with improved datasets as well as improved methods. Without holding the datasets fixed, progress on benchmarks is very suspect IMO.

I fail to see how the two concepts equate.

LLMs have neither intelligence nor problem-solving ability (and I won't be relaxing the definition of either so that some AI bro can pretend a glorified chatbot is sentient).

You would, at best, be demonstrating that the sharing of knowledge across multiple disciplines and nations (which is a relatively new concept - at least at the scale of something like the internet) leads to novel ideas.

  • I've seen many futurists claim that human innovation is dead and all future discoveries will be the result of AI. If this is true, we should be able to see an AI trained on the past figure its way to various things we have today. If it can't do this, I'd like said futurists to quiet down, as they are discouraging an entire generation of kids who may go on to discover some great things.

    • > I've seen many futurists claim that human innovation is dead and all future discoveries will be the results of AI.

      I think there's a big difference between discoveries through AI-human synergy and discoveries through AI working in isolation.

      It probably will be true soon (if it isn't already) that most innovation features some degree of AI input, but still with a human to steer the AI in the right direction.

      I think an AI being able to discover something genuinely new all by itself, without any human steering, is a lot further off.

      If AIs start producing significant quantities of genuine and useful innovation with minimal human input, maybe the singularitarians are about to be proven right.

    • I'm struggling to get a handle on this idea. Is the idea that today's data will be the data of the past, in the future?

      So if it can work with what's now the past, it will be able to work with the past in the future?

      1 reply →