Comment by devmor
2 months ago
Why would this be surprising? That’s exactly how much of the code they were trained on is presented in PRs, forums, etc.
Is that true? That depends on how their web scraping works, like whether it runs client-side highlighting, strips out HTML tags, etc.
The highlighting isn't what matters, it's the preceding text. E.g., an LLM seeing "```python" before a code block is going to better recall Python code blocks written by people who prefixed them that way.
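To make the scraping point concrete, here is a rough sketch of what a naive tag-stripping scraper would keep. It assumes a hypothetical highlighted snippet (the `language-python` class and the span classes are made up for illustration) and is not a claim about how any real training pipeline works. The code text survives the stripping, but the language label only survives if the scraper sees the raw Markdown source.

```python
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Collects text nodes and discards every tag (a naive tag stripper)."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_tags(html: str) -> str:
    parser = TextOnly()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)


# Hypothetical highlighted snippet, as a rendered page might serve it.
highlighted = (
    '<pre><code class="language-python">'
    '<span class="k">def</span> <span class="nf">f</span>():\n'
    '    <span class="k">return</span> <span class="mi">1</span>'
    "</code></pre>"
)

# The same snippet as raw Markdown source would present it.
fence = "`" * 3
markdown_source = f"{fence}python\ndef f():\n    return 1\n{fence}"

stripped = strip_tags(highlighted)
print(stripped)                     # the code itself, without any markup
print("python" in stripped)         # language label lost with the tags
print("python" in markdown_source)  # label kept as plain text before the code
```

So whether the model ever sees the language prefix depends on whether the scraped text came from the Markdown source or from rendered HTML with the tags dropped.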