
Comment by kingstnap

14 hours ago

Related: https://arxiv.org/abs/2601.03220

This paper recently got popular-ish, and it argues the counter to your viewpoint.

> Paradox 1: Information cannot be increased by deterministic processes. For both Shannon entropy and Kolmogorov complexity, deterministic transformations cannot meaningfully increase the information content of an object. And yet, we use pseudorandom number generators to produce randomness, synthetic data improves model capabilities, mathematicians can derive new knowledge by reasoning from axioms without external information, dynamical systems produce emergent phenomena, and self-play loops like AlphaZero learn sophisticated strategies from games

In theory, yes: something like the rules of chess should be enough for the mythical perfect reasoners that show up in math riddles to deduce everything that *can* be known about the game. Similarly, a math textbook would be no more interesting than a book containing the words true and false and a bunch of true => true statements.

But I don't think this is the case in practice. Rolling things out and leveraging the results you see seems to yield useful information, even if the rollout is fully characterizable.
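A minimal sketch of that gap, under my own assumptions rather than anything taken from the paper: SHA-256 in counter mode stands in for a pseudorandom generator, and zlib stands in for a compute-bounded observer. The names `rollout`, `seed`, and `stream` are hypothetical. The whole stream is determined by a few bytes of seed plus this short program, so its Kolmogorov complexity is tiny, yet the bounded compressor cannot find that structure.

```python
import hashlib
import zlib


def rollout(seed: bytes, n_blocks: int) -> bytes:
    """Deterministically expand a short seed into n_blocks * 32 bytes.

    Illustrative stand-in for a PRNG: SHA-256 applied to seed || counter.
    """
    return b"".join(
        hashlib.sha256(seed + i.to_bytes(8, "big")).digest()
        for i in range(n_blocks)
    )


seed = b"chess rules"         # hypothetical short description of the "rules"
stream = rollout(seed, 4096)  # 4096 blocks * 32 bytes = 128 KiB of output

# zlib plays the role of an observer with limited compute.
compressed = zlib.compress(stream, 9)

print("seed bytes:      ", len(seed))
print("stream bytes:    ", len(stream))
print("zlib-compressed: ", len(compressed))
```

Rerunning `rollout` with the same seed reproduces the stream byte for byte, so nothing new was added in the Kolmogorov sense. The compressed size nonetheless stays close to the raw size, which is one way to see why a bounded learner can treat a fully determined rollout as if it carried fresh information.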

Interesting paper, thanks! But the authors escape the three paradoxes they present by introducing training limits (compute, factorization, distribution). Kind of a different problem here.

What I object to are the "scaling maximalists" who believe that, if enough training data were available, complicated concepts like a world model would just spontaneously emerge during training. Piling on synthetic data from a general-purpose generative model as a solution to the lack of training data is even more untenable.