Comment by sumitkumar

19 hours ago

The weights start with a random manifold. Training takes data and shapes the manifold, weight by weight, over many cycles. Once training is done, the manifold is fixed.

When a new inference has to be done, the query (q) is projected into the manifold space. This projection is dropped on the manifold, and the gravity of the manifold gives an answer of length q+1. That result (q+i) is dropped again, n times in all, to output a final response of length n.

The gravity is created by repeated multiplication (of the weights/input) to find out how the projected embeddings should fall according to the manifold in the GPU.
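
The loop described above (project the query, push it through fixed weights by repeated multiplication, pick the next token, feed the longer sequence back in) can be sketched as a toy autoregressive generator. This is a minimal illustration under loud assumptions: the vocabulary, sizes and random weights are made up and untrained, token embeddings are simply averaged instead of using attention, and decoding is greedy. It shows only the shape of the computation, not how any real model is built.

```python
import math
import random

# Hypothetical toy vocabulary; real models use tens of thousands of tokens.
VOCAB = ["<q>", "the", "disc", "falls", "left", "right", "down", "."]
DIM = 4       # size of the embedding ("manifold") space, chosen arbitrarily
N_LAYERS = 2  # how many weight matrices the vector is multiplied through


def random_matrix(rng, rows, cols):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def make_weights(seed=0):
    """Random starting weights. Training would reshape these; here they stay random."""
    rng = random.Random(seed)
    embed = random_matrix(rng, len(VOCAB), DIM)                       # token -> vector
    layers = [random_matrix(rng, DIM, DIM) for _ in range(N_LAYERS)]  # the "gravity"
    out = random_matrix(rng, len(VOCAB), DIM)                         # vector -> token scores
    return embed, layers, out


def matvec(matrix, vec):
    return [sum(a * b for a, b in zip(row, vec)) for row in matrix]


def next_token(tokens, weights):
    embed, layers, out = weights
    # Project the sequence into the embedding space. Averaging is a crude
    # stand-in for attention and ignores word order.
    vecs = [embed[t] for t in tokens]
    h = [sum(col) / len(vecs) for col in zip(*vecs)]
    # Repeated multiplication through the fixed weights, with a nonlinearity.
    for w in layers:
        h = [math.tanh(x) for x in matvec(w, h)]
    logits = matvec(out, h)
    # Greedy decoding: take the single highest-scoring next token.
    return max(range(len(logits)), key=logits.__getitem__)


def generate(prompt, n, weights):
    """Autoregressive loop: each new token is appended and the whole sequence is fed back in."""
    tokens = list(prompt)
    for _ in range(n):
        tokens.append(next_token(tokens, weights))
    return tokens[len(prompt):]


weights = make_weights(seed=0)
prompt = [VOCAB.index("the"), VOCAB.index("disc")]
response = generate(prompt, n=5, weights=weights)
print([VOCAB[t] for t in response])
```

Because the weights are untrained, the printed words are gibberish; training is what would shape the weights so that particular prompts "fall" toward useful continuations.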

It's like a giant plinko board where the shape of the original disc guides how it falls through the apparatus, and the apparatus has been tuned so that different discs end up in the exits we want them to.

That's a very concise and illuminating way to think about what's happening, IF (and only if) you already know how these models work. Thanks for that.

  • Yes, this is more like compression for remembering, not for learning/understanding.

    • Compression is the reason why these models are able to learn and understand.

      My brain is doing the exact same thing.

      I learned enough to compress concepts like a bike: what a bike does and what I can use a bike for.

      Ask an LLM and it will answer you similarly to a human.

      Blind people learn concepts of bikes too, and in a similar way: by description.

      LLMs just have so much text data available, and are able to ingest all of it, that the LLM compression algorithm doesn't have to be as good/fine-tuned as ours.

      But I would assume that Yann LeCun's JEPA or other breakthroughs in the next few years will get us there.

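
The link between prediction and compression that this subthread argues about can be made concrete. With an ideal entropy coder (arithmetic coding gets close), a symbol predicted with probability p costs about -log2(p) bits, so a model that predicts better compresses better. The toy below, a minimal sketch with a made-up example text, compares a uniform model with a character bigram model fit on the same text it encodes (i.e. a model that has "memorized" the text). It does not settle whether compression amounts to understanding; it only shows that better prediction means fewer bits.

```python
import math
from collections import Counter, defaultdict


def uniform_bits(text):
    """Ideal code length if every symbol of the text's alphabet is equally likely."""
    return len(text) * math.log2(len(set(text)))


def bigram_bits(text):
    """Ideal code length when each character is predicted from the one before it.

    The bigram counts are fit on the same text they encode, so this is a
    memorizing model. The cost of storing the model itself is not counted.
    """
    counts = defaultdict(Counter)
    for prev, nxt in zip(text, text[1:]):
        counts[prev][nxt] += 1
    bits = math.log2(len(set(text)))  # the first character has no context
    for prev, nxt in zip(text, text[1:]):
        p = counts[prev][nxt] / sum(counts[prev].values())
        bits -= math.log2(p)  # a confident, correct prediction (p near 1) costs almost nothing
    return bits


text = "the cat sat on the mat. " * 20  # hypothetical example text
print(f"uniform: {uniform_bits(text):.0f} bits, bigram: {bigram_bits(text):.0f} bits")
```

Leaving out the model's own size is what makes this "compression to remember": a real comparison would also charge for the bits needed to store the counts (or the weights).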

In what way is that different from any other model of reality that you'd use to winnow a dataset into an answer to a question? The only major difference I see is that beyond a certain number of transformations, people are willing to treat it as some sort of miracle, and too tired to figure out why it came up with the answer it came up with. It's almost like people desperately want to give up their agency and creativity to black boxes, whether those weights produce answers that are right or wrong. Factor in that psychology and it looks a lot less like we have invented something useful, and a lot more like we as a species are choosing to quit life en masse.

  • > The only major difference I see is that beyond a certain number of transformations, people are willing to treat it as some sort of miracle, and too tired to figure out why it came up with the answer it came up with.

    It’s funny, because I thought you were talking about humans here when you wrote this. We know some things about how our bodies encode information that is sent to the brain, and we know some things about how neurons receive information and act on it, but after that we get too tired and give up on how the brain works and treat it like a miracle.

    It’s like we desperately want to believe our consciousness is not just electrical impulses in our brain, and we want to ascribe agency and uniqueness to the physical processes going on in our head.

    • There's definitely a sizable contingent of people who desperately want to believe consciousness is just electrical impulses in our brain. Because "what else could it be"? The fact is that we just don't know, and "abiding in the not-knowing" is for many the most uncomfortable thing ever. Especially for the curious- and rational-minded people this forum tends to attract. I'm one of them, too.


    • > but after that we get too tired and give up on how the brain works and treat it like a miracle.

      I disagree. We know very well how neurons work, and we have a pretty good idea of how neural activity translates to behavior. In other words, we have a pretty good idea of how the brain works. We stop at consciousness because, as of yet, it is in the realm of philosophy, not science. We don't know what consciousness is, or even whether it is useful for science, and we are simply waiting for the philosophers to guide us out of that situation.

      Note that both cognitive psychology and behavioral psychology have done fine without tackling consciousness. When neuropsychology emerged in the 1980s, it complemented both these fields perfectly. The situation is the opposite with the philosophy of mind, which grew significantly around the same time.

      There have been some attempts to describe consciousness as an emergent phenomenon of neural activity, but so far all of these attempts have failed, or at least failed to turn consciousness into a useful term in psychology (the way gravity is a useful term in physics). I think it is just as likely that these attempts failed because consciousness is simply not a useful term in psychology as it is that we simply don't understand it well enough.


  • > beyond a certain number of transformations, people are willing to treat it as some sort of miracle, and too tired to figure out why it came up with the answer it came up with

    It’s less about being too tired and more about being realistic about the limits of understanding.

    Consider mass and energy flows in planet-scale systems: At some point we call these “weather” and change the tools with which we study them, but we never stopped trying to understand the phenomenon.

    • When we attempt to recreate those complex, planetary atmospheric phenomena in a box, we're doing so in order to measure and study them.

      Making random turbulence in a box until it resembles the outside world, and calling it weather and extrapolating some predictive meaning from the result, is the total antithesis of what you're describing about why we come up with simplified models for impossibly complex systems. The purpose of [mathematical] models that are built thoughtfully is to explain why complex systems are the way they are, with data and algorithms, however imperfectly. [Whereas] The purpose of LLM models is to give the illusion of answering questions while never answering why the answer was given. The difference is the difference between a scientist and a tarot card reader, an equation and an oracle.

      People have a well-known tendency to gravitate toward the shamanistic, oracular, and superstitious. Listen, I ran a casino for 6 years, I know. The impossibility of knowing how 80 layers of matrix multiplication led to a particular answer is in itself a psychological factor in choosing whether to accept the answer or to question it. People tend to err on the side of the over, in sports betting terms... or on the lazy side in general... and they will make whatever excuses they need to after the fact to justify their decisions. So now we have a machine that can act like an oracle and which you can also blame, but the blame goes into a void because this machine is stateless and is only a reflection of information, not an intentional refinery of data.

      Sit next to a bank of slot machines for an hour and listen to the absolutely ridiculous shit most people will come up with to explain how they "know" if a machine is going to pay out soon, and then tell me if you think it's a good idea to give them an LLM in their pocket to answer their questions in whatever way they frame them.


    • If you're going to make something smarter than a person, you have to accept that you're only going to be able to understand it at the level of a single training step, and then inductively trust that the rest of it works. We do empirical testing with evals, of course, but there's something of an art to figuring out what is theoretically going to improve eval performance. Trying to fit the meaning of all those weights in your little human brain and working back from there isn't going to work for more than a little slice of the dataset at a time, because that's all we can fit in our understanding.

  • Agency?

    What are you talking about?

    I want freedom.

    I want freedom to do what I want, not to sit in front of a computer and code for some company.

    Please, AI, let's burn down knowledge work and labor work. Let's put so much stress on our society that we start rethinking what work means.

    Let's redefine work into discovering the world again. Let people do old handcraft jobs, let them do more sports, let them read more, let them write and make more. Let them enjoy nature.

    • Work has never been about "discovering the world". There have been a handful of privileged folks who had the time to "discover the world". Work has traditionally been "let's find enough food for my family". If you want to think of a future of abundance then perhaps we can discover the world.


    • > Let's redefine work into discovering the world again. Let people do old handcraft jobs, let them do more sports, let them read more, let them write and make more. Let them enjoy nature.

      Why leave something so important up to what AI does or doesn't do?


    • "I want freedom to do what I want, not to sit in front of a computer and code for some company."

      "Please, AI, let's burn down knowledge work and labor work"

      "Let people do old handcraft jobs."

      So many presuppositions about what people want to do.

      As a child I spent a lot of time programming and doing "knowledge work" because it's fun - I don't enjoy "old hand-crafted jobs". Sure, let's definitely destroy capitalism in its current state, I suppose. But I find people like you, who hate knowledge work/coding and think everyone else must feel the same and only does it for the money, a bit out of touch.


    • This seems to be a little naive about how humans consume the benefits we create in society.

      "Let people do old handcraft jobs, let them do more sports, let them read more, let them write and make more. Let them enjoy nature."

      Very nice thoughts. You know we all could do this today without "burning it down"? Get in your pod, eat your slop, and watch your screen is where this is headed.

      "I want freedom to do what I want, not to sit in front of a computer and code for some company."

      You get that it's you creating the misery here? Then stop? Don't do it. Go start a farm or whatever you think will solve your problems. At some point this all boils down to "chop wood and fetch water" so if the modern way of doing that is so terrible then stop. Go fetch water the old fashioned way and be free.


    • The solution we've come up with is move all the unpleasant work stuff to China where people don't complain about doing it because they already have communism, and therefore everything is of course effortlessly perfect there.

  • >It's almost like people desperately want to give up their agency and creativity

    Don't make me think!

    Also don't make me take responsibility. (This seems to be the actual function of every organization.)

The weights are code, the prompt is code, the output is code.

Is the meat code?

  • The data is the code. The training algorithm is the compiler. The weights are the bytecode produced to run on the inference VM.

    • The data is the code is the data. Reality has no distinction between "data" and "code". These terms are categories we impose on systems we design, to make it easier for us to build and reason about them, but they're nothing but mere opinions, and depend less on the system structure, and more on the perspective of the person asking which is which.

      This is related to, and possibly equivalent with, the core point of both this story and the original one: computation is independent from substrate.

      You can build a computer out of anything, whether it's semiconductors or lasers or meat or magnetic fields or water flowing downhill or abstract thought, and that computer will happily perform the same computation as every other equivalent construct from whatever substrate. That's because computers are ultimately made of math, and we design "real ones" by finding ways to approximate the mathematical constraints with physical systems. But the choice of how to map the math to physical systems is completely arbitrary, and any such mappings are equivalent from POV of information processing ability.

      (Of course substrate is not arbitrary from economic POV, which is why we build most of our computers out of silicon and plastic, and make it work with electric current and lasers.)


  • Yes. Is it data? Yes.

    Is the distinction between "code" and "data" just someone's opinion? Yes. There is no such distinction in reality.

    • That's why encountering something like LISP for the first time (by writing a LISP interpreter, for example) creates a big bang event in the form of an immediate intellectual catharsis. People who have encountered it even once will never be able to see the world through the old "meaty" lenses again.

    • This is a good model. If you take an old ROM dump from a video game, it's just a pile of bits. You don't know what bits represent code, what represent an image, what represent text, etc. You have to analyze them contextually to actually figure out what is code and what is "data" in context, because without context they are truly one and the same.

  • Is matter code? There is some sort of computation happening in space over time.

    • By Fermat's principle, a ray of light has to know where it will ultimately end up before it can choose the direction to begin moving in.

      So either something is computing it, or some exploration is happening at the quantum level and we just see the final result.

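
The LISP point about code and data being the same thing can be shown in a few lines of Python: a nested list is ordinary data that can be inspected or rewritten, and the very same list becomes a program once it is handed to an evaluator. This is a minimal sketch supporting only `+`, `-` and `*` on numbers; the function names are hypothetical and it is not a real Lisp.

```python
import operator

# A tiny Lisp-style evaluator: programs are plain nested Python lists.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def evaluate(expr):
    """Treat a nested list as code: ["+", 1, ["*", 2, 3]] means 1 + 2*3."""
    if isinstance(expr, (int, float)):
        return expr
    op, *args = expr
    vals = [evaluate(a) for a in args]
    result = vals[0]
    for v in vals[1:]:
        result = OPS[op](result, v)
    return result


def count_numbers(expr):
    """Treat the very same list as data: count its numeric leaves."""
    if isinstance(expr, (int, float)):
        return 1
    return sum(count_numbers(a) for a in expr[1:])


def swap_ops(expr):
    """A program that rewrites another program: swap + and * everywhere."""
    if isinstance(expr, (int, float)):
        return expr
    op, *args = expr
    new_op = {"+": "*", "*": "+"}.get(op, op)
    return [new_op] + [swap_ops(a) for a in args]


program = ["+", 1, ["*", 2, 3]]
print(evaluate(program))            # 7
print(count_numbers(program))       # 3
print(evaluate(swap_ops(program)))  # ["*", 1, ["+", 2, 3]] evaluates to 5
```

Nothing in the list itself marks it as "code" or "data"; only the function it is passed to decides, which is the same point the ROM-dump comment makes about raw bits.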
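
Fermat's principle can be made concrete for refraction: light travelling from a point A in one medium to a point B in another crosses the boundary at the point that makes the total travel time stationary (for refraction, a minimum). The sketch below "computes" that point by brute search over crossing positions, then checks that the result satisfies Snell's law, sin θ1 / v1 = sin θ2 / v2, which is a purely local condition at the boundary. The geometry and the speeds (in arbitrary units, with a 1.5 ratio roughly like air to glass) are illustrative assumptions.

```python
import math


def travel_time(x, h1, h2, d, v1, v2):
    """Time from A=(0, h1) to the boundary point (x, 0), then on to B=(d, -h2)."""
    return math.hypot(x, h1) / v1 + math.hypot(d - x, h2) / v2


def least_time_crossing(h1, h2, d, v1, v2, iters=200):
    """Ternary search for the crossing point with the smallest travel time.

    travel_time is convex in x (a sum of convex distance terms), so ternary
    search on [0, d] converges to its minimum.
    """
    lo, hi = 0.0, d
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if travel_time(m1, h1, h2, d, v1, v2) < travel_time(m2, h1, h2, d, v1, v2):
            hi = m2
        else:
            lo = m1
    return (lo + hi) / 2


# Hypothetical geometry: A is 1 unit above the boundary, B is 1 unit below
# and 2 units across; light is 1.5 times slower in the second medium.
h1, h2, d, v1, v2 = 1.0, 1.0, 2.0, 1.0, 1 / 1.5
x = least_time_crossing(h1, h2, d, v1, v2)
sin1 = x / math.hypot(x, h1)            # sine of the angle of incidence
sin2 = (d - x) / math.hypot(d - x, h2)  # sine of the angle of refraction
print(sin1 / v1, sin2 / v2)  # the two ratios agree: Snell's law
```

The global "search for the fastest path" and the local rule at the boundary give the same answer here, so the example shows what computing the path would look like without deciding which description is more fundamental.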

It would be interesting (if unlikely) if we found that our reality worked that way as well: mostly just super-granular gravity nodes made up purely of clumpy probability distributions.

Also, I describe this more simply to others as a tree walk: the manifold is mapped along a walk through it (even if it's not a linear one), where you choose the next jump based on the most likely next node relative to the nodes already created and the ones from the prompt.

This helps some people get a rudimentary understanding of attention layers, even though it leaves out that multiple layers sort of prune the overall manifold in successive passes.
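
The "choose the next jump relative to the nodes already created and the ones from the prompt" step corresponds roughly to attention. Below is a minimal sketch of scaled dot-product attention, softmax(q·Kᵀ/√d)·V, for a single query over earlier positions. Real transformers add learned query/key/value projections, many heads and many stacked layers, none of which are shown, and the vectors here are made-up values.

```python
import math


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Scaled dot-product attention for one query vector.

    keys and values hold one vector per earlier position: the prompt tokens
    plus the tokens generated so far.
    """
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)  # how strongly the current step "looks at" each earlier node
    dim_v = len(values[0])
    out = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim_v)]
    return out, weights


# Three earlier positions with hypothetical 2-d keys and values.
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
query = [2.0, 0.0]
out, weights = attend(query, keys, values)
print([round(w, 3) for w in weights], [round(x, 3) for x in out])
```

Unlike a strict tree walk, attention does not commit to one earlier node: it takes a weighted blend of all of them, with most of the weight on the keys that best match the current query.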

Yes, yes, but what fertile fallacies and common misunderstandings can politicians use to acquire more power by exploiting the gap between the common person's flawed understanding (due to cargo culting, cognitive biases, and/or outdated or inappropriate analogies) and actual reality? Is there any way we can get the AI to say that giving all political power to the narrator is the solution to all problems, and use the common person's mistaken worship of AI as a spiritual, all-knowing, conscious being with unusual sensitivity and caring about everyone to cement that power? Certainly one of you eggheads can tweak that for me? What? It's against your ethics? We're trying to save the world here. Here, let me call up Bernie Sanders to propose nationalizing half your companies so we can do that.