Comment by bilsbie

3 days ago

I just thought of a good test. Anyone have feedback?

We completely remove a couple of simple, obvious inventions from the training data and then see if the AI can come up with them. Perhaps a toothbrush, for example, or a comb? There could be better examples that would also have minimal effect on the final AI.

Training is expensive, so we wouldn’t want to leave out anything important, like the wheel.

It’s very, very hard to remove things from the training data and be sure there is zero leakage.

Another idea would be to use, for example, a 2024 state of the art model to try to predict discoveries or events from 2025.

LLM companies try to optimize their benchmark results, not to test the capabilities of their systems. This is why all the benchmarks are so utterly useless.

Ok, you do it. Here’s the internet: https://internet. Make sure you don’t miss any references while you’re combing through, though.

  • I see your point, but off the top of my head: run a simple regex over each document for a list of dental-related words, then earmark any matches for a small LLM to determine whether the document includes a toothbrush concept.
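
A rough sketch of that two-stage filter, just to make the idea concrete. Everything here is an assumption for illustration: the keyword list `DENTAL_TERMS`, the helper names, and especially `mentions_toothbrush`, which stands in for whatever small LLM would do the second pass (here it is only a stub).

```python
import re
from typing import Callable, Iterable, Iterator

# Hypothetical keyword list; a real one would need to be much broader.
DENTAL_TERMS = ["tooth", "teeth", "dental", "dentist", "bristle", "plaque"]

# Match any term at the start of a word, e.g. "tooth" also hits "toothbrush".
DENTAL_RE = re.compile(
    r"\b(?:" + "|".join(map(re.escape, DENTAL_TERMS)) + r")\w*",
    re.IGNORECASE,
)


def needs_review(doc: str) -> bool:
    """Stage 1: cheap regex pass that earmarks documents for the classifier."""
    return DENTAL_RE.search(doc) is not None


def filter_corpus(
    docs: Iterable[str],
    mentions_toothbrush: Callable[[str], bool],
) -> Iterator[str]:
    """Yield only documents that are safe to keep in the training set.

    Stage 2 (`mentions_toothbrush`) runs only on documents earmarked by stage 1.
    """
    for doc in docs:
        if needs_review(doc) and mentions_toothbrush(doc):
            continue  # drop: the classifier says it contains the concept
        yield doc


# Stand-in for the small LLM (assumption): a trivial substring check.
def stub_classifier(doc: str) -> bool:
    return "brush" in doc.lower()


docs = [
    "The wheel changed transport.",
    "Use a toothbrush twice a day.",
    "The dentist's office was closed on Monday.",
]
kept = list(filter_corpus(docs, stub_classifier))
```

With the stub above, only the second document is dropped. Note that anything describing the concept without a listed keyword (say, "a small brush for cleaning your mouth") never reaches the second stage, which is exactly the leakage problem raised earlier in the thread.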