
Comment by minimaxir

2 months ago

An important note not mentioned in this announcement is that Claude 4's training cutoff date is March 2025, which is the latest of any recent model. (Gemini 2.5 has a cutoff of January 2025)

https://docs.anthropic.com/en/docs/about-claude/models/overv...

With web search being available in all major user-facing LLM products now (and I believe in some APIs as well, sometimes unintentionally), I feel like the exact month of cutoff is becoming less and less relevant, at least in my personal experience.

The models I'm regularly using are usually smart enough to figure out that they should be pulling in new information for a given topic.

  • It still matters for software packages, particularly Python packages that have to do with programming with AI!

    They are evolving quickly, with deprecations and updated documentation. Having to correct for this in system prompts is a pain (rough sketch below).

    It would be great if the models updated some portions of their knowledge more frequently than others.

    The Tailwind example in the parent-sibling comment should absolutely be as up to date as possible, whereas the history of the US civil war can probably be updated less frequently.
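
    A rough sketch of what that system-prompt correction can look like (in Python; the pinned versions and the call_llm helper are made-up placeholders, not any particular vendor's API):

      # Pin the library versions the model should assume, so it doesn't drift
      # back to the older APIs it saw most often in training.
      PINNED = {
          "pydantic": "2.7",    # illustrative versions only
          "langchain": "0.2",
      }

      def build_system_prompt(pins):
          rules = "\n".join(
              f"- Use {name} {version}+ APIs; do not use pre-{version} idioms."
              for name, version in pins.items()
          )
          return (
              "You write Python against these library versions:\n"
              f"{rules}\n"
              "If you are unsure whether an API still exists in these versions, say so."
          )

      # reply = call_llm(system=build_system_prompt(PINNED), user="...")  # hypothetical client call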

    • > the history of the US civil war can probably be updated less frequently.

      It's already missed out on two issues of Civil War History: https://muse.jhu.edu/journal/42

      Contrary to the prevailing belief in tech circles, there's a lot in history/social science that we don't know and are still figuring out. It's not IEEE Transactions on Pattern Analysis and Machine Intelligence (four issues since March), but it's not nothing.

      18 replies →

    • Given that I am still coding against Java 17, C# 7, C++17 and the like on most work projects, and more recent versions are still the exception, it is quite reasonable.

      Few are on jobs where v-latest is always an option.

      2 replies →

    • It matters even with recent cutoffs; these models have no idea whether or not to use a package (if it's no longer maintained, etc.).

      You can fix this by first figuring out what packages to use, or by providing your package list, though.

      1 reply →

    • Cursor has a nice "docs" feature for this, which has saved me from battles with the constant version-reverting actions of our dear LLM overlords.

    • > whereas the history of the US civil war can probably be updated less frequently.

      Depends on which one you're talking about.

  • Valid. I suppose the most annoying thing related to the cutoffs is the model's knowledge of library APIs, especially when there are breaking changes. Even when they have some knowledge of the most recent version, they tend to default to whatever they have seen the most in training, which is typically older code. I suspect the frontier labs have all been working to mitigate this. I'm just super stoked; been waiting for this one to drop.

  • In my experience it really depends on the situation. For stable APIs that have been around for years, sure, it doesn't really matter that much. But if you try to use a library that had significant changes after the cutoff, the models tend to do things the old way, even if you provide a link to examples with new code.

  • For recent resources it might matter: unless the training data are curated meticulously, they may be "spoiled" by the output of other LLMs, or even by a previous version of the one being trained. That's generally considered dangerous, because it could produce an unintentional echo chamber or even a somewhat "incestuously degenerated" new model.

  • > The models I'm regularly using are usually smart enough to figure out that they should be pulling in new information for a given topic.

    Fair enough, but information encoded in the model is returned in milliseconds, while information that needs to be scraped is returned in tens of seconds.

  • Web search isn't desirable or even an option in a lot of use cases that involve GenAI.

    It seems people have turned GenAI into coding assistants only and forget that they can actually be used for other projects too.

    • That's because, between the two approaches "explain this thing to me" and "write code to demonstrate this thing", the LLMs are much more useful on the second path. I can ask it to calculate some third derivatives, or I can ask it to write a Mathematica notebook to calculate the same derivatives (sketch below), and the latter is generally correct and extremely useful as is; the former requires me to scrutinize each line of logic and calculation very carefully.

      It's like https://www.youtube.com/watch?v=zZr54G7ec7A where Prof. Tao uses Claude to generate Lean 4 proofs (which are then verifiable by machine). Great progress, very useful. Meanwhile, the LLM-only approaches are still lacking utility for the top minds: https://mathstodon.xyz/@tao/113132502735585408
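
      As a tiny stand-in for the same idea (Python/SymPy here instead of Mathematica, purely for illustration), this is the kind of code you ask the model to write so the result can be run and checked rather than trusted line by line:

        # Machine-checked instead of model-asserted: compute a third derivative
        # symbolically, then any hand-stated answer can be verified against it.
        import sympy as sp

        x = sp.symbols("x")
        expr = sp.sin(x) * sp.exp(x**2)   # example function; pick your own

        third = sp.diff(expr, x, 3)       # d^3/dx^3
        print(sp.simplify(third))

        # A claimed result can be checked mechanically:
        # assert sp.simplify(third - claimed_expr) == 0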

      2 replies →

  • I was thinking that too. Grok can comment on things that broke only hours earlier; cutoff dates don't seem to matter much.

    • Yeah, it seems pretty up to date with Elon's latest White Genocide and Holocaust Denial conspiracy theories, but it's so heavy-handed about bringing them up out of the blue and pushing them into the middle of discussions about Zod 4 and Svelte 5 and Tailwind 4 that I think those topics are coming from its prompts, not its training.

      2 replies →

  • It's relevant from an engineering perspective. They have a way to develop a new model in months now.

  • Web search is an immediate, limited operation; training is a petabyte-scale, long-term operation.

Nice - it might know about Svelte 5 finally...

  • It has known about Svelte 5 for some time, but it particularly likes to mix it with Svelte 4 in very weird and broken ways.

    • I have experienced this for various libraries. I think it helps to paste a package.json into the prompt (sketch at the end of this comment).

      All the models seem to struggle with React Three Fiber like this, mixing and matching versions in ways that don't make sense. I can see this being a tough problem given the nature of these models and the training data.

      I am also going to try giving it a better skeleton to start with and sticking to the particular imports when faced with this issue.

      My very first prompt with Claude 4 was for R3F, and it imported a deprecated component as usual.

      We can't expect the model to read our minds.
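
      Something along these lines is what I mean by handing it the package list up front (a sketch; the prompt wording and the ask_model call are placeholders):

        # Read the project's actual dependency pins and put them at the top of
        # the prompt, so the model can't silently assume a different major version.
        import json

        with open("package.json") as f:
            deps = json.load(f).get("dependencies", {})

        dep_block = "\n".join(f"{name}: {version}" for name, version in deps.items())
        prompt = (
            "These are the exact dependency versions in this project:\n"
            f"{dep_block}\n\n"
            "Write a React Three Fiber scene component that works with these versions, "
            "and do not import anything deprecated or removed in them."
        )
        # answer = ask_model(prompt)   # hypothetical client call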

I asked it about Tailwind CSS (since I had problems with Claude not being aware of Tailwind 4):

> Which version of tailwind css do you know?

> I have knowledge of Tailwind CSS up to version 3.4, which was the latest stable version as of my knowledge cutoff in January 2025.

  • > Which version of tailwind css do you know?

    LLMs cannot reliably tell whether they know or don't know something. If they could, we would not have to deal with hallucinations.

    • They can if they've been post-trained on what they know and don't know. The LLM can first be given questions to test its knowledge, and if it returns a wrong answer, it can be given a new training example with an "I don't know" response.
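
      Roughly, building that kind of data could look like the sketch below; the model_answer and is_correct functions are assumed helpers, and this is not any lab's actual recipe:

        # Build "I don't know" fine-tuning examples: ask the base model factual
        # questions, and wherever it answers incorrectly, pair the question with
        # an abstention instead of the wrong answer.
        IDK = "I don't know."

        def build_abstention_dataset(qa_pairs, model_answer, is_correct):
            examples = []
            for question, reference in qa_pairs:
                predicted = model_answer(question)   # base model's attempt
                target = reference if is_correct(predicted, reference) else IDK
                examples.append({"prompt": question, "completion": target})
            return examples

        # The resulting examples then go into supervised fine-tuning, so the
        # model learns to abstain on questions it tends to get wrong.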

      1 reply →

  • I did the same recently with Copilot, and it of course lied and said it knew about v4. Hard to trust any of them.

Why can't it be trained "continuously"?

  • Catastrophic forgetting

    https://en.wikipedia.org/wiki/Catastrophic_interference

    • Fascinating, thanks for that link! I was reading the sub-sections of the Proposed Solutions / Rehearsal section, thinking it seemed a lot like dreaming, then got to the Spontaneous replay sub-section:

      >Spontaneous replay

      >The insights into the mechanisms of memory consolidation during the sleep processes in human and animal brain led to other biologically inspired approaches. While declarative memories are in the classical picture consolidated by hippocampo-neocortical dialog during NREM phase of sleep (see above), some types of procedural memories were suggested not to rely on the hippocampus and involve REM phase of the sleep (e.g.,[22] but see[23] for the complexity of the topic). This inspired models where internal representations (memories) created by previous learning are spontaneously replayed during sleep-like periods in the network itself[24][25] (i.e. without help of secondary network performed by generative replay approaches mentioned above).

      The Electric Prunes - I Had Too Much To Dream (Last Night):

      https://www.youtube.com/watch?v=amQtlkdQSfQ

  • It's not really necessary, with retrieval-augmented generation: the model can be trained to just check what the latest version is.
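
    The retrieval step for "check what the latest version is" can be very small. A sketch (PyPI's JSON endpoint is real; the surrounding prompt plumbing is illustrative):

      # Retrieval step for a RAG-style setup: fetch the current release of a
      # package at query time and hand it to the model as context.
      import json
      import urllib.request

      def latest_pypi_version(package: str) -> str:
          url = f"https://pypi.org/pypi/{package}/json"
          with urllib.request.urlopen(url) as resp:
              return json.load(resp)["info"]["version"]

      pkg = "django"   # any package name; illustrative
      context = f"As of today, the latest release of {pkg} on PyPI is {latest_pypi_version(pkg)}."
      # `context` gets prepended to the user's question, so the answer reflects
      # today's version rather than whatever the training cutoff happened to capture.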

Even then, we don't know what got updated and what didn't. Can we assume everything that can be updated is updated?

  • > Can we assume everything that can be updated is updated?

    What does that even mean? Of course an LLM doesn't know everything, so we wouldn't be able to assume everything got updated either. At best, if they shared the datasets they used (which they won't, because most likely they were acquired illegally), you could make some guesses about what they tried to update.

    • > What does that even mean?

      I think it is clear what he meant and it is a legitimate question.

      If you took a six-year-old, told him about the things that happened in the last year, and sent him off to work, did he integrate the last year's knowledge? Did he even believe it or find it true? If that information conflicted with what he knew before, how do we know he will take the most recent thing he is told as the new information? Will he continue parroting what he knew before this last upload? These are legitimate questions we have about our black box of statistics.

      2 replies →

  • You might be able to ask it what it knows.

    • So something's odd there. Super Bowl LIX was played in February, but when I asked "Who won Super Bowl LIX and what was the winning score?", the model replied "I don't have information about Super Bowl LIX (59) because it hasn't been played yet. Super Bowl LIX is scheduled to take place in February 2025."

      6 replies →

    • Why would you trust it to accurately say what it knows? It's all statistical processes. There's no "but actually for this question give me only a correct answer" toggle.

Can we assume it has had some FastHTML training, given that March 2025 cutoff date? I'd hope so, but I guess it's more likely that it still hasn't been trained on FastHTML?

  • Claude 4 actually knows FastHTML pretty well! :D It managed to one-shot most basic tasks I sent its way, although it makes a lot of standard minor n00b mistakes that make its code a bit longer and more complex than needed.

    I've nearly finished writing a short guide which, when added to a prompt, gives quite idiomatic FastHTML code.

One thing I'm 100% sure of is that a cutoff date doesn't exist for any large model, or rather there is no single date, since it's practically impossible to achieve that.

  • But I think the general meaning of a cutoff date, D, is:

    The model includes nothing AFTER date D

    and not

    The model includes everything ON OR BEFORE date D

    Right? Definitionally, the model can't include anything that happened after training stopped.

    • That's correct. However, it is almost meaningless in practice, as it might as well mean that, say, 99.99% of the content is two years old or older, and only 0.01% was trained just before that date. So if you need functionality that depends on new information, you have to test it for each particular component you need.

      Unfortunately, I work with new APIs all the time, and the cutoff date is not of much use.

  • Indeed. It's not possible to stop the world and snapshot the entire internet in a single day.

    Or is it?

    • You can trivially put an upper bound on it, though. If the training finished today, then today is a cutoff date.

    • That's... not what a cutoff date means. Cutoff date is an upper bound, not a promise that the model is trained on every piece of information set in a fixed form before that date.

  • It's not a definitive "date" when information gets cut off, but more a question of how "recent" the material you can feed in is; training takes time.

    If you keep waiting for newer information, of course you are never going to train.

When I asked the model, it told me January (for Sonnet 4). Doesn't it normally get that in its system prompt?

Although I believe it, I wish there was some observability into what data is included here.

Both Sonnet and Opus 4 say Joe Biden is president and claim their knowledge cutoff is "April 2024".