Comment by hnlmorg

20 days ago

I’ve worked in the AI space and I understand how LLMs work in principle. But we don’t know the magic contained within a model after it’s been trained. We understand how to design a model, and how models work at a theoretical level. But we cannot know how well it will perform at inference until we test it. So much of AI research is just trial and error, with different dials repeatedly tweaked until we get something desirable. So no, we don’t understand these models in the same way we might understand how a hashing algorithm works. Or a compression routine. Or an encryption cipher. Or any other hand-programmed algorithm.

I also run Linux. But that doesn’t change how the two major platforms behave and that, as software developers, we have to support those platforms.

Open source hardware is great but it’s not in the same league of price and performance as proprietary hardware.

Agentic AI doesn’t make me feel hopeless either. I’m just describing what I’d personally define as a “golden age of computing”.

but isn't this like a lot of other CS-related "gradient descent"?

when someone invents a new scheduling algorithm or a new concurrent data structure, it's usually based on hunches and empirical results (benchmarks) too. nobody sits down and mathematically proves their new linux scheduler is optimal before shipping it. they test it against representative workloads and see if there is uplift.
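that empirical workflow can be sketched in a few lines of Python. this is a toy illustration, not a real scheduler: the two policies (shortest-job-first vs FIFO) and the synthetic exponential workload are hypothetical stand-ins for "candidate algorithm" and "representative workload":

```python
import random

def shortest_job_first(jobs):
    # candidate A: run the shortest jobs first (toy policy)
    return sorted(jobs)

def fifo(jobs):
    # candidate B: run jobs in arrival order
    return list(jobs)

def mean_waiting_time(schedule):
    # average time each job waits before it starts running
    waited, elapsed = 0.0, 0.0
    for duration in schedule:
        waited += elapsed
        elapsed += duration
    return waited / len(schedule)

# "representative workload": job durations from a skewed distribution
random.seed(0)
workload = [random.expovariate(1.0) for _ in range(1000)]

# no optimality proof -- just measure both candidates and compare
sjf = mean_waiting_time(shortest_job_first(workload))
fif = mean_waiting_time(fifo(workload))
print(f"SJF mean wait: {sjf:.1f}, FIFO mean wait: {fif:.1f}")
```

the verdict comes from the measurement, not from a proof: whichever policy shows uplift on the benchmark workload wins.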

we understand transformer architectures at the same theoretical level we understand most complex systems. we know the principles, we have solid intuitions about why certain things work, but the emergent behavior of any sufficiently complex system isn't fully predictable from first principles.

that's true of operating systems, distributed databases, and most software above a certain complexity threshold.

  • No. Algorithm analysis is much more sophisticated and well defined than that. Most algorithms are deterministic, and it is relatively straightforward to identify their asymptotic complexity, O(·). Even for nondeterministic algorithms we can evaluate asymptotic performance under different categories of input. We know a lot about how an algorithm will perform under a wide variety of input distributions regardless of determinism. In the case of schedulers, and other critical concurrency algorithms, performance is well known before release. There is a whole subfield of computer science dedicated to it. You don't have to "prove optimality" to know a lot about how an algorithm will perform. What's missing in neural networks is the why and how of inputs propagating through the network during inference. It is a black box: under a great deal of study, but still very poorly understood.
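The kind of prediction this comment describes can be demonstrated concretely: instrument two search routines with a comparison counter and the counts land exactly where the O(n) and O(log n) analyses say they should, before any benchmark is run. (A minimal sketch; the instrumentation is mine, the algorithms are the standard ones.)

```python
import math

def linear_search(xs, target):
    """Return (index, comparisons). Worst case: n comparisons -- O(n)."""
    comps = 0
    for i, x in enumerate(xs):
        comps += 1
        if x == target:
            return i, comps
    return -1, comps

def binary_search(xs, target):
    """Return (index, comparisons) on a sorted list -- O(log n)."""
    comps, lo, hi = 0, 0, len(xs) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        comps += 1
        if xs[mid] == target:
            return mid, comps
        elif xs[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1, comps

n = 1 << 20                          # ~1M elements
xs = list(range(n))
_, lin = linear_search(xs, n - 1)    # worst case for the linear scan
_, log = binary_search(xs, n - 1)
print(lin, log)                      # n comparisons vs at most log2(n)+1
```

No comparable a-priori bound exists for "how many attention heads will this prompt activate" — that is the asymmetry being pointed out.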

    • i agree w/ the complexity analysis point, but in both subfields that theoretical understanding only loosely translates to real world deployment decisions. knowing an algorithm's O() tells you surprisingly little about whether it'll actually outperform alternatives on real hardware with real cache hierarchies, branch predictors, and memory access patterns. same thing with ML (just with the very different nature of GPU hardware): both subfields have massive graveyards of "improvements" that looked great on paper (or in controlled environments) but never made it into production systems. arxiv is full of architecture tweaks showing SOTA on some benchmark, and the same w/ novel data structures/algorithms that nobody ever uses at scale.
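the "same O(), different real-world behavior" point has a classic illustration: summing an N×N grid row-by-row vs column-by-column. both traversals do identical O(N²) work and produce the same answer, but the strided access pattern of the column order is much slower on cache-hierarchy hardware (the effect is muted in pure Python, where elements are pointers anyway, and dramatic in C or NumPy). a minimal sketch, with timings left for the reader rather than asserted:

```python
import time

N = 1000
grid = [[1] * N for _ in range(N)]  # N x N matrix of ones

def sum_row_major(g):
    # visits elements in allocation order -- cache friendly in C/NumPy
    total = 0
    for row in g:
        for v in row:
            total += v
    return total

def sum_col_major(g):
    # identical O(N^2) work, but a strided access pattern
    total = 0
    for j in range(N):
        for i in range(N):
            total += g[i][j]
    return total

# same answer either way; only the memory access pattern differs
assert sum_row_major(grid) == sum_col_major(grid) == N * N

for fn in (sum_row_major, sum_col_major):
    t0 = time.perf_counter()
    fn(grid)
    print(fn.__name__, f"{time.perf_counter() - t0:.3f}s")
```

nothing in the O() analysis distinguishes the two loops — that gap is exactly what makes empirical benchmarking unavoidable in both subfields.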
