
Comment by minimaxir

2 months ago

An important note not mentioned in this announcement is that Claude 4's training cutoff date is March 2025, which is the latest of any recent model. (Gemini 2.5 has a cutoff of January 2025)

https://docs.anthropic.com/en/docs/about-claude/models/overv...

With web search being available in all major user-facing LLM products now (and I believe in some APIs as well, sometimes unintentionally), I feel like the exact month of cutoff is becoming less and less relevant, at least in my personal experience.

The models I'm regularly using are usually smart enough to figure out that they should be pulling in new information for a given topic.

  • It still matters for software packages, particularly Python packages that have to do with programming with AI!

    They are evolving quickly, with deprecations and updated documentation. Having to correct for this in system prompts is a pain (rough sketch below).

    It would be great if the models updated some portions of their knowledge more frequently than others.

    The Tailwind example in the parent-sibling comment should absolutely be as up to date as possible, whereas the history of the US civil war can probably be updated less frequently.
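
    A rough sketch of what that system-prompt correction can look like (in Python; the pinned versions and the call_llm helper are made-up placeholders, not any particular vendor's API):

      # Pin the library versions the model should assume, so it doesn't drift
      # back to the older APIs it saw most often in training.
      PINNED = {
          "pydantic": "2.7",    # illustrative versions only
          "langchain": "0.2",
      }

      def build_system_prompt(pins):
          rules = "\n".join(
              f"- Use {name} {version}+ APIs; do not use pre-{version} idioms."
              for name, version in pins.items()
          )
          return (
              "You write Python against these library versions:\n"
              f"{rules}\n"
              "If you are unsure whether an API still exists in these versions, say so."
          )

      # reply = call_llm(system=build_system_prompt(PINNED), user="...")  # hypothetical client call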

    • > the history of the US civil war can probably be updated less frequently.

      It's already missed out on two issues of Civil War History: https://muse.jhu.edu/journal/42

      Contrary to the prevailing belief in tech circles, there's a lot in history/social science that we don't know and are still figuring out. It's not IEEE Transactions on Pattern Analysis and Machine Intelligence (four issues since March), but it's not nothing.

      18 replies →

    • Given that I am still coding against Java 17, C# 7, C++17 and the like on most work projects, and more recent versions are still the exception, it is quite reasonable.

      Few are on jobs where v-latest is always an option.

      2 replies →

    • It matters even with recent cutoffs; these models have no idea whether or not to use a package (if it's no longer maintained, etc.).

      You can fix this by first figuring out what packages to use, or by providing your package list, though.

      1 reply →

    • Cursor has a nice "docs" feature for this, which has saved me from battles with the constant version-reverting actions of our dear LLM overlords.

    • > whereas the history of the US civil war can probably be updated less frequently.

      Depends on which one you're talking about.

  • Valid. I suppose the most annoying thing related to the cutoffs is the model's knowledge of library APIs, especially when there are breaking changes. Even when they have some knowledge of the most recent version, they tend to default to whatever they have seen the most in training, which is typically older code. I suspect the frontier labs have all been working to mitigate this. I'm just super stoked; been waiting for this one to drop.

  • In my experience it really depends on the situation. For stable APIs that have been around for years, sure, it doesn't really matter that much. But if you try to use a library that had significant changes after the cutoff, the models tend to do things the old way, even if you provide a link to examples with new code.

  • For recent resources it might matter: unless the training data are curated meticulously, they may be "spoiled" by the output of other LLMs, or even by a previous version of the one being trained. That's generally considered dangerous, because it could produce an unintentional echo chamber or even a somewhat "incestuously degenerated" new model.

  • > The models I'm regularly using are usually smart enough to figure out that they should be pulling in new information for a given topic.

    Fair enough, but information encoded in the model is returned in milliseconds, while information that needs to be scraped is returned in tens of seconds.

  • Web search isn't desirable or even an option in a lot of use cases that involve GenAI.

    It seems people have turned GenAI into coding assistants only and forget that they can actually be used for other projects too.

    • That's because, between the two approaches "explain this thing to me" and "write code to demonstrate this thing", the LLMs are much more useful on the second path. I can ask it to calculate some third derivatives, or I can ask it to write a Mathematica notebook to calculate the same derivatives (sketch below), and the latter is generally correct and extremely useful as is; the former requires me to scrutinize each line of logic and calculation very carefully.

      It's like https://www.youtube.com/watch?v=zZr54G7ec7A where Prof. Tao uses Claude to generate Lean 4 proofs (which are then verifiable by machine). Great progress, very useful. Meanwhile, the LLM-only approaches are still lacking utility for the top minds: https://mathstodon.xyz/@tao/113132502735585408
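
      As a tiny stand-in for the same idea (Python/SymPy here instead of Mathematica, purely for illustration), this is the kind of code you ask the model to write so the result can be run and checked rather than trusted line by line:

        # Machine-checked instead of model-asserted: compute a third derivative
        # symbolically, then any hand-stated answer can be verified against it.
        import sympy as sp

        x = sp.symbols("x")
        expr = sp.sin(x) * sp.exp(x**2)   # example function; pick your own

        third = sp.diff(expr, x, 3)       # d^3/dx^3
        print(sp.simplify(third))

        # A claimed result can be checked mechanically:
        # assert sp.simplify(third - claimed_expr) == 0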

      2 replies →

  • I was thinking that too. Grok can comment on things that broke only hours earlier; cutoff dates don't seem to matter much.

    • Yeah, it seems pretty up to date with Elon's latest White Genocide and Holocaust Denial conspiracy theories, but it's so heavy-handed about bringing them up out of the blue and pushing them into the middle of discussions about Zod 4 and Svelte 5 and Tailwind 4 that I think those topics are coming from its prompts, not its training.

      2 replies →

  • It's relevant from an engineering perspective. They have a way to develop a new model in months now.

  • Web search is an immediate, limited operation; training is a petabyte-scale, long-term operation.

Nice - it might know about Svelte 5 finally...

  • It has known about Svelte 5 for some time, but it particularly likes to mix it with Svelte 4 in very weird and broken ways.

    • I have experienced this for various libraries. I think it helps to paste a package.json into the prompt (sketch at the end of this comment).

      All the models seem to struggle with React Three Fiber like this, mixing and matching versions in ways that don't make sense. I can see this being a tough problem given the nature of these models and the training data.

      I am also going to try giving it a better skeleton to start with and sticking to the particular imports when faced with this issue.

      My very first prompt with Claude 4 was for R3F, and it imported a deprecated component as usual.

      We can't expect the model to read our minds.
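
      Something along these lines is what I mean by handing it the package list up front (a sketch; the prompt wording and the ask_model call are placeholders):

        # Read the project's actual dependency pins and put them at the top of
        # the prompt, so the model can't silently assume a different major version.
        import json

        with open("package.json") as f:
            deps = json.load(f).get("dependencies", {})

        dep_block = "\n".join(f"{name}: {version}" for name, version in deps.items())
        prompt = (
            "These are the exact dependency versions in this project:\n"
            f"{dep_block}\n\n"
            "Write a React Three Fiber scene component that works with these versions, "
            "and do not import anything deprecated or removed in them."
        )
        # answer = ask_model(prompt)   # hypothetical client call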

I asked it about Tailwind CSS (since I had problems with Claude not being aware of Tailwind 4):

> Which version of tailwind css do you know?

> I have knowledge of Tailwind CSS up to version 3.4, which was the latest stable version as of my knowledge cutoff in January 2025.

  • > Which version of tailwind css do you know?

    LLMs cannot reliably tell whether they know or don't know something. If they could, we would not have to deal with hallucinations.

    • They can if they've been post-trained on what they know and don't know. The LLM can first be given questions to test its knowledge, and if it returns a wrong answer, it can be given a new training example with an "I don't know" response.
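
      Roughly, building that kind of data could look like the sketch below; the model_answer and is_correct functions are assumed helpers, and this is not any lab's actual recipe:

        # Build "I don't know" fine-tuning examples: ask the base model factual
        # questions, and wherever it answers incorrectly, pair the question with
        # an abstention instead of the wrong answer.
        IDK = "I don't know."

        def build_abstention_dataset(qa_pairs, model_answer, is_correct):
            examples = []
            for question, reference in qa_pairs:
                predicted = model_answer(question)   # base model's attempt
                target = reference if is_correct(predicted, reference) else IDK
                examples.append({"prompt": question, "completion": target})
            return examples

        # The resulting examples then go into supervised fine-tuning, so the
        # model learns to abstain on questions it tends to get wrong.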

      1 reply →

  • I did the same recently with Copilot, and it of course lied and said it knew about v4. Hard to trust any of them.

Why can't it be trained "continuously"?

  • Catastrophic forgetting

    https://en.wikipedia.org/wiki/Catastrophic_interference

    • Fascinating, thanks for that link! I was reading the sub-sections of the Proposed Solutions / Rehearsal section, thinking it seemed a lot like dreaming, then got to the Spontaneous replay sub-section:

      >Spontaneous replay

      >The insights into the mechanisms of memory consolidation during the sleep processes in human and animal brain led to other biologically inspired approaches. While declarative memories are in the classical picture consolidated by hippocampo-neocortical dialog during NREM phase of sleep (see above), some types of procedural memories were suggested not to rely on the hippocampus and involve REM phase of the sleep (e.g.,[22] but see[23] for the complexity of the topic). This inspired models where internal representations (memories) created by previous learning are spontaneously replayed during sleep-like periods in the network itself[24][25] (i.e. without help of secondary network performed by generative replay approaches mentioned above).

      The Electric Prunes - I Had Too Much To Dream (Last Night):

      https://www.youtube.com/watch?v=amQtlkdQSfQ

  • It's not really necessary, with retrieval-augmented generation: the model can be trained to just check what the latest version is.
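
    The retrieval step for "check what the latest version is" can be very small. A sketch (PyPI's JSON endpoint is real; the surrounding prompt plumbing is illustrative):

      # Retrieval step for a RAG-style setup: fetch the current release of a
      # package at query time and hand it to the model as context.
      import json
      import urllib.request

      def latest_pypi_version(package: str) -> str:
          url = f"https://pypi.org/pypi/{package}/json"
          with urllib.request.urlopen(url) as resp:
              return json.load(resp)["info"]["version"]

      pkg = "django"   # any package name; illustrative
      context = f"As of today, the latest release of {pkg} on PyPI is {latest_pypi_version(pkg)}."
      # `context` gets prepended to the user's question, so the answer reflects
      # today's version rather than whatever the training cutoff happened to capture.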

Even then, we don't know what got updated and what didn't. Can we assume everything that can be updated is updated?

  • > Can we assume everything that can be updated is updated?

    What does that even mean? Of course an LLM doesn't know everything, so we wouldn't be able to assume everything got updated either. At best, if they shared the datasets they used (which they won't, because most likely they were acquired illegally), you could make some guesses about what they tried to update.

    • > What does that even mean?

      I think it is clear what he meant and it is a legitimate question.

      If you took a six-year-old, told him about the things that happened in the last year, and sent him off to work, did he integrate the last year's knowledge? Did he even believe it or find it true? If that information conflicted with what he knew before, how do we know he will take the most recent thing he is told as the new information? Will he continue parroting what he knew before this last upload? These are legitimate questions we have about our black box of statistics.

      2 replies →

  • You might be able to ask it what it knows.

    • So something's odd there. Super Bowl LIX was played in February, but when I asked "Who won Super Bowl LIX and what was the winning score?", the model replied "I don't have information about Super Bowl LIX (59) because it hasn't been played yet. Super Bowl LIX is scheduled to take place in February 2025."

      6 replies →

    • Why would you trust it to accurately say what it knows? It's all statistical processes. There's no "but actually for this question give me only a correct answer" toggle.

Can we assume it has had some FastHTML training, given that March 2025 cutoff date? I'd hope so, but I guess it's more likely that it still hasn't been trained on FastHTML?

  • Claude 4 actually knows FastHTML pretty well! :D It managed to one-shot most basic tasks I sent its way, although it makes a lot of standard minor n00b mistakes that make its code a bit longer and more complex than needed.

    I've nearly finished writing a short guide which, when added to a prompt, gives quite idiomatic FastHTML code.

One thing I'm 100% sure of is that a cutoff date doesn't exist for any large model, or rather there is no single date, since it's practically impossible to achieve that.

  • But I think the general meaning of a cutoff date, D, is:

    The model includes nothing AFTER date D

    and not

    The model includes everything ON OR BEFORE date D

    Right? Definitionally, the model can't include anything that happened after training stopped.

    • That's correct. However, it is almost meaningless in practice, as it might as well mean that, say, 99.99% of the content is two years old or older, and only 0.01% was trained just before that date. So if you need functionality that depends on new information, you have to test it for each particular component you need.

      Unfortunately, I work with new APIs all the time, and the cutoff date is not of much use.

  • Indeed. It's not possible to stop the world and snapshot the entire internet in a single day.

    Or is it?

    • You can trivially put an upper bound on it, though. If the training finished today, then today is a cutoff date.

    • That's... not what a cutoff date means. Cutoff date is an upper bound, not a promise that the model is trained on every piece of information set in a fixed form before that date.

  • It's not a definitive "date" when information gets cut off, but more a question of how "recent" the material you can feed in is; training takes time.

    If you keep waiting for newer information, of course you are never going to train.

When I asked the model, it told me January (for Sonnet 4). Doesn't it normally get that in its system prompt?

Although I believe it, I wish there was some observability into what data is included here.

Both Sonnet and Opus 4 say Joe Biden is president and claim their knowledge cutoff is "April 2024".