
Comment by jatora

19 days ago

1. No, you don't get to fall back on the technical-claim approach. The bias in your phrasing was clear. Maybe that works for you, but I won't just ignore obvious subtext and let you weasel out of this. And that's for the benefit of other readers, not you.

2. A plateau in coding performance? If you make that claim, I don't think you even use these models for coding. It is very clear that models have continually improved. You can trust benchmarks to make that clear, or real-world use, or better yet: both. You seem to have data from neither.

3. No rigorous methods of filtering and curation that can separate AI slop from useful human output? Here you go:

a. Curation already works at scale. Modern training pipelines don’t rely on “AI vs human” detection. They filter by utility signals: correctness, novelty, coherence, task success, citation integrity, and cross-source consistency. These measurable properties do correlate with downstream model performance. Models trained on smaller, higher-quality corpora consistently outperform those trained on larger, noisier ones.

b. Human-generated “valuable” data is not shrinking. The claim assumes a fixed pool. In reality, high-value human data is expanding in areas that matter most: expert-labeled datasets, preference comparisons, multimodal demonstrations, tool-use traces, verified code with tests, and domain-expert feedback. These are explicitly created for training and are not polluted by passive AI spam.

c. Synthetic data is not a dead end when it is constrained. Empirically, filtered and goal-conditioned synthetic data (self-play, distillation, adversarial generation) improves reasoning, math, coding, and tool use. The failure mode is unfiltered synthetic recursion, not synthetic data per se. This distinction is already operationalized in production systems.

d. Training value ≠ raw text volume. Scaling laws shifted: performance now tracks effective compute × data quality, not sheer token count. A smaller dataset with higher signal density produces better generalization than a massive, contaminated corpus. This is observed repeatedly in ablation studies.
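To make points a and c concrete, here is a toy sketch of filtering by utility signals rather than by "AI vs human" provenance. It is not any lab's actual pipeline: the `Sample` fields, the scores, the equal weights, and the 0.6 threshold are all made-up placeholders, and in practice each signal would come from its own checker (test runner, dedup index, quality classifier).

```python
from dataclasses import dataclass


@dataclass
class Sample:
    text: str
    source: str          # "human" or "synthetic"; deliberately ignored by the filter
    passed_checks: bool  # hypothetical: unit tests passed / answer verified
    novelty: float       # hypothetical 0..1 score, e.g. from near-duplicate detection
    coherence: float     # hypothetical 0..1 score, e.g. from a quality classifier


def utility(sample: Sample) -> float:
    """Combine the signals into one score. The weights are arbitrary placeholders."""
    if not sample.passed_checks:
        return 0.0  # unverified content is dropped no matter how fluent it is
    return 0.5 * sample.novelty + 0.5 * sample.coherence


def curate(samples, threshold=0.6):
    """Keep samples whose utility clears the threshold, regardless of provenance."""
    return [s for s in samples if utility(s) >= threshold]


samples = [
    Sample("verified solution with tests", "synthetic", True, 0.8, 0.9),
    Sample("fluent but wrong answer", "synthetic", False, 0.9, 0.9),
    Sample("low-effort repost", "human", True, 0.1, 0.4),
    Sample("expert walkthrough", "human", True, 0.7, 0.8),
]

kept = curate(samples)
print([(s.source, s.text) for s in kept])
```

With these made-up numbers the filter keeps one synthetic and one human sample and drops one of each, which is the whole point: the decision never looks at `source`.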

----

Again, the above is not for you, as I believe you don't see beyond your cope (yet). It's for other readers who are intellectually curious.