
Comment by qsera

6 hours ago

>I’m just curious, what would need to happen for you to change your opinion about this?

Imagine a machine that can calculate using logic circuits and one that uses a lookup table.

LLMs right now are the latter (please don't take this literally; it's just an illustration). You can argue that the lookup table is so huge that it works most of the time.

But I (and probably the parent commenter) need it to be the former. And that answers your question.

So it does not matter how huge the lookup table grows in the future, or how much more often it works as a result: it is still a lookup table.
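For what it's worth, the contrast can be sketched in a few lines (a toy illustration of the analogy, not a claim about how LLMs actually work internally):

```python
# Two ways to "know" addition: compute it, or look it up.

def add_compute(a, b):
    # the "logic circuit": generalizes to any inputs
    return a + b

# the "lookup table": memorized from a finite set of seen examples
ADD_TABLE = {(a, b): a + b for a in range(10) for b in range(10)}

def add_lookup(a, b):
    # works only where an entry was memorized
    return ADD_TABLE[(a, b)]

print(add_compute(123, 456))  # 579 -- works everywhere
print(add_lookup(3, 4))       # 7   -- works inside the table
# add_lookup(123, 456) raises KeyError: outside the memorized domain
```

Growing the table's ranges makes the second function fail less often, but never changes what it is.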

So people are divided into two groups right now: one that goes by appearances, and one that goes by what the thing fundamentally is, despite the appearances.

I think you will get a better response to a slightly different analogy. In genetic programming (and in machine learning more broadly), we have a concept of "overfitting". Overfitting can be understood as a program memorizing too much of its training data (i.e. it acts more like an oracle than a computation). Intuitively, this becomes less of a problem the larger the training dataset becomes, but the problem never goes away. Noticing it is like noticing the invisible wall at the edge of the game-world.
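To make the overfitting point concrete, here's a classic toy (my own example, not from the thread): the unique degree-4 polynomial threaded exactly through five noisy samples of a roughly linear trend. "Training" error is zero, yet off-sample the memorizer goes wild:

```python
# Noisy samples of an underlying trend that is roughly y = x
XS = [0, 1, 2, 3, 4]
YS = [0.0, 1.3, 1.8, 3.2, 3.9]

def overfit(x):
    # Lagrange interpolation: the unique degree-4 polynomial
    # through all five points -- zero error on the "training set"
    total = 0.0
    for i, (xi, yi) in enumerate(zip(XS, YS)):
        term = yi
        for j, xj in enumerate(XS):
            if i != j:
                term *= (x - xj) / (xi - xj)
        total += term
    return total

def honest(x):
    # the simple model of the same data
    return float(x)

# perfect on the data it memorized...
assert all(abs(overfit(x) - y) < 1e-9 for x, y in zip(XS, YS))
# ...but at x = 6 it predicts about -19.7 where the trend says ~6
print(round(overfit(6), 2), honest(6))
```

More data points push the wild behavior further out, but the invisible wall is always somewhere.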

The most insightful thing about LLMs is just how _useful_ overfitting can be in practice, when applied to the entire internet. In some sense, stack-overflow-driven development (widespread throughout the industry since at least 2012) was an indication that much of a programmer's job was finding specific solutions to recurring problems that never seem to get permanently fixed (mostly for reasons of culture, conformity, and churn in the ranks).

The more I see the LLM-ification of software unfold (essentially an attempted controlled demolition of our industry and our culture), the more I think about Arthur Whitney (inventor of the K language and others). In this interview[1], he said two interesting things: (1) he likened programming to poetry, and (2) he said that he designed his languages to not have libraries, and everybody builds from the 50 basic operators that come with the language, resulting in very short programs (in terms of both source code size and compiled/runtime code size).

I wonder if our tendency to depend on libraries of functions counterintuitively results in more source code (and more compiled/runtime code) in the long run -- similarly to how using LLMs for coding tends to be very verbose as well. In principle, libraries are collections of composable domain-verbs that should let a programmer solve domain-problems, and yet it rarely feels that way. I have ripped out general libraries and replaced them with custom subroutines more times than I can count, because I usually need only a subset of the functionality, and I need it to be correct (many libraries are complex and buggy in their edge cases [for example, I once used an AVL library that would sometimes walk the tree in reverse instead of from least to greatest -- unfortunately, the ordering mattered, and I wrote a simpler bespoke implementation]).
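The AVL anecdote generalizes: when the subset you actually need is small, a bespoke version can be short enough to verify by reading. A hypothetical sketch of such a replacement (the names and shape here are mine, not any real library's):

```python
import bisect

# When all you need is "insert, then walk in ascending order",
# a sorted list is enough -- and its ordering is easy to verify.
class SortedBag:
    def __init__(self):
        self._items = []

    def insert(self, x):
        # O(n) insert, but obviously correct
        bisect.insort(self._items, x)

    def __iter__(self):
        # always least-to-greatest, by construction
        return iter(self._items)

bag = SortedBag()
for x in [5, 1, 4, 2, 3]:
    bag.insert(x)
print(list(bag))  # [1, 2, 3, 4, 5]
```

Twenty lines you can read end to end, versus a black box whose traversal order you have to take on faith.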

Arguably, a buggy program, library, or function is just an overfit program, library, or function (overfit to the mental model of the problem-space in its writer's mind). These overfit libraries, often used as black boxes by someone rushing to meet a deadline, often produce programs that are themselves overfit to the buggy library, creating _less_ modularity instead of more. _Creating_ an abstraction is practically free, but maintaining it and (most disappointingly) _using_ it has real, often permanent, long-term costs. I have rarely been able to get two computers that were meant to share data over NFS to do so reliably unless they were running the exact same OS (because each OS's NFS client and server are bug-compatible -- overfit to each other).

In fact, the rise of VMware, the big cloud companies, and containerization and virtualization technologies is, conceivably, caused by this very tendency to write software that is overfit to other software (the operating system, the standard library [on some OSes emacs has to be forced to link against glibc, because using any other memory allocator causes it to SEGFAULT; and don't get me started on how no two browser canvases return the same output in different browsers, _nor_ in the same browser on a different OS]). (Maybe, just as debt keeps the economy from collapsing, technical debt is the only thing that keeps Silicon Valley from collapsing.)

In some ways, coding-LLMs exaggerate this tendency towards overfitting in comical ways, like fun-house mirrors. And now a single individual, with nothing but a dream, can create technical debt at the same rate as a thousand-employee software company could a decade ago. What a time to be alive.

[1]: https://queue.acm.org/detail.cfm?id=1531242