Comment by culopatin

5 days ago

Do you think this will continue growing if we stop struggling and posting our findings on forums?

Yeah, I think that's a legitimate concern. Even with sufficient training data today, it's hard to know how far these systems can actually generalize their problem-solving abilities once they become data-starved in the future, whether because of scarcity or because any potential new training data is contaminated by LLM radiation.

Too bad we don't have a portal gun to access an infinite number of parallel universes where large language models were never invented, as sources of unlimited fresh training data and unlimited Palpatine power.

  • I'm more optimistic about LLMs tracking down and fixing issues in software, even without SO/forum posts, at least for OSS. I've seen enough unique insights from agents on tricky problems to know they weren't just extrapolating from a helpful comment somewhere.

    It hit me that as it's deciphering some verbose log file, it has also read all the source code that wrote that log, and likely all of the discussions and commits that went into building that (broken) feature.

I don't think so, because Anthropic now has your question, the steps it tried, and the solution that finally worked, all in text form, already on their servers thanks to your Claude session. Claude usage is itself a goldmine of training data.

  • Ish. If I have it generate code that doesn't work, and I neither tell it why it's garbage nor share my cleaned-up results on GitHub afterward, it doesn't know how or why the code it output was bad, or even that it was.