Comment by jart

8 hours ago

I value software that reveals knowledge. The frontier LLMs were trained on all the code that institutions had been keeping to themselves. So they're revealing programming know-how on a scale that just wasn't possible with open source. LLMs are the ultimate Prometheus. Information is more accessible and useful now than it's ever been.

> The frontier LLMs were trained on all the code that institutions had been keeping to themselves.

Lolz! I haven’t encountered “code that institutions had been keeping to themselves” that got even remotely close to OSS in quality.

I promise you, "the code that institutions had been keeping to themselves" is not nearly as special or good as you are implying here.

  • True.

    I have worked for several decades at many companies, in many countries, on a few continents, from startups to some of the biggest companies in their fields. I have therefore seen many proprietary programs.

    On average, proprietary programs are not better than open-source programs, but usually worse, because they are reviewed by fewer people and because frequently the programmers who write them may be stressed by having to meet unrealistic timelines for the projects.

    Proprietary programs win on quantity, not quality: they are written by a greater number of programmers working on them full-time, while much of the work on open-source projects is done in spare time by people occupied with something else.

    Many proprietary programs can do things which cannot be done by open-source programs, but only because of access to documentation that is kept secret in the hope of preventing competition.

    Lawyers, and other people who do not understand how research and development is really done, put a lot of weight on the so-called "intellectual property" of a company, which they believe to be embodied in things like the source code of proprietary programs or the design files for some hardware. In reality, I have never seen anything of substantial value in this so-called IP. Everywhere, what was really valuable in the know-how of the company was not the final implementation that could be read in some source code, but the knowledge of the many other solutions that had been tried before and had worked worse or not at all. Too often, this knowledge was not written down in any documentation. Knowing which paths are dead ends is a great productivity boost for an experienced team, because any recent graduate could list many alternative ways of solving a problem, but most of them would not be the right choice in certain specific circumstances.

    • > On average, proprietary programs are not better than open-source programs, but usually worse, because they are reviewed by fewer people and because frequently the programmers who write them may be stressed by having to meet unrealistic timelines for the projects.

      There's also the fact that when you write open-source code, you're writing for a friendly audience. I've often found myself writing the code, letting it rest for a few hours, then rewriting it so that it is easier to read. Sometimes, the code gets substantially rewritten before I push.

      There's no cooling period when you write code during your 9-5 job: it works, it has the required test coverage, ship it and move on to the next task.

  • The claim is also just categorically untrue. The largest source of training data by far is publicly available code on sites like GitHub, so it mostly just gives you a way to recycle already-available code, without crediting the author, while allowing you to pretend you own it.

    • So you're both saying all the alpha in Claude comes from open source devs like me? Even when I'm wrong I'm right.