
Comment by og_kalu

5 days ago

We do not write the code that makes it do what it does. We write the code that trains it to figure out how to do what it does. There's a big difference.

The code that builds the models and performs inference from them is code we have written. The data in the model is obviously the big trick. But what I'm saying is that if you run inference, that alone does not give it super-powers over your computer. You can write some agentic framework where it WOULD have power over your computer, but that's not what I'm referring to.

It's not a living thing inside the computer; it's just inference building text token by token, using probabilities from the pre-computed model.
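
To make that concrete, here is a minimal sketch of what a bare inference loop looks like (using the Hugging Face transformers library; the "gpt2" checkpoint is just an illustrative stand-in). All the loop does is turn the model's output scores into a probability distribution and sample the next token; nothing here touches the rest of your machine unless you deliberately wire it into an agent framework.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative checkpoint; any causal language model works the same way.
tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

ids = tok("The cat sat on the", return_tensors="pt").input_ids
for _ in range(20):
    with torch.no_grad():
        logits = model(input_ids=ids).logits[:, -1, :]  # scores for the next token only
    probs = torch.softmax(logits, dim=-1)                # scores -> probability distribution
    next_id = torch.multinomial(probs, num_samples=1)    # sample one token from it
    ids = torch.cat([ids, next_id], dim=-1)              # append and repeat

print(tok.decode(ids[0]))
```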

  • > It's not a living thing inside the computer; it's just inference building text token by token, using probabilities from the pre-computed model.

    Sure, and humans are just biochemical reactions moving muscles as their interface with the physical world.

    I think the mode of operation is not a good basis for criticism, but please see my reply to the root comment in this thread, where I detail my thoughts a bit.

  • You cannot say 'we know it's not thinking because we wrote the code' when the inference 'code' we wrote amounts to 'Hey, just do whatever you figured out during training, okay?'.

    As for 'power over your computer', all that is orthogonal to the point. A human brain without a functioning body would still be thinking.

    • Well, a model by itself with data that emits a bunch of human-written words is literally no different from what JIRA does when it reads a database table and shits it out to a screen, except maybe for a lot more GPU usage.

      I grant you that, yes, the data in the model is a LOT cooler, but some team could, by hand, given billions of years (well, probably at least an octillion years), reproduce that model and save it to a disk. Again, no different from data stored in JIRA at that point.

      So basically, if you have that stance, you'd have to agree that when we FIRST invented computers, we created intelligence that is "thinking".

      5 replies →

  • This is a bad take. We didn't write the model; we wrote an algorithm that searches the space of models conforming to the high-level constraints specified by the stacked-transformer architecture. But stacked transformers are a very general computational paradigm. Training converges the parameters to a specific model that reproduces the training data well. But the computational circuits the model picks out are discovered, not programmed. The emergent structures realize new computational dynamics that we are mostly blind to. We are not the programmers of these models; rather, we are their incubators.

    As far as sentience is concerned, we can't say they aren't sentient, because we don't know the computational structures these models realize, nor do we know the computational structures required for sentience.
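
    To make the first point concrete, here is a toy sketch in plain PyTorch (everything in it is illustrative). The architecture and the training loop are the code we actually write; the input-output rule the parameters end up encoding is found by the optimization, not spelled out by us.

    ```python
    import torch
    import torch.nn as nn

    # Toy stand-in for "the space of models": a tiny next-symbol predictor over a
    # 10-symbol vocabulary. The architecture fixes the search space; training picks
    # one point in it. (Everything here is illustrative.)
    vocab, dim = 10, 32
    model = nn.Sequential(nn.Embedding(vocab, dim), nn.Linear(dim, vocab))
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    loss_fn = nn.CrossEntropyLoss()

    # Made-up "training data": each symbol is followed by the next one, mod 10.
    x = torch.arange(vocab)
    y = (x + 1) % vocab

    # This loop is the code we actually wrote: nudge the parameters until the
    # model reproduces the data.
    for _ in range(500):
        opt.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        opt.step()

    # Whatever rule the parameters now encode was found by the search, not written by us.
    print(model(x).argmax(dim=-1))  # should approximate y
    ```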

    • However, there is another big problem: this would require a blob of data in a file to be labelled as "alive" even if it's on a disk in a garbage dump with no CPU or GPU anywhere near it.

      The inference software that would normally read from that file is also not alive, as it's literally very concise code that we wrote to traverse that file.

      So if the disk isn't alive, the file on it isn't alive, and the inference software isn't alive, then what are you saying is alive and thinking?

      4 replies →

I think the discrepancy is this:

1. We trained it on a fraction of the world's information (e.g. text and media that is explicitly online)

2. It carries all of the biases we humans have and, worse, the biases present in the information we chose to explicitly share online (which may or may not differ from the experiences humans have in everyday life)

  • > It carries all of the biases we humans have and, worse, the biases present in the information we chose to explicitly share online

    This is going to be a huge problem. Most people assume computers are unbiased and rational, and increasing use of AI will lead to more and larger decisions being made by AI.

  • I see this a lot in what LLMs know and promote in terms of software architecture.

    All seem biased toward recent buzzwords and approaches. Discussions will include the same hand-waving about DDD, event-sourcing and hexagonal services, i.e. the current fashion. Nothing of worth apparently preceded them.

    I fear that we are condemned to a future where there is no novel progress, just a regurgitation of the current fashions and biases.

and then the code to give it context. AFAIU, there is a lot of post-training "setup" in the context and variables to get the trained model to "behave as we instruct it to".
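
As a rough sketch of what I mean by that setup (the prompt and tags below are made up; real systems use model-specific chat templates, but the principle is the same): the instructed behaviour largely comes from text prepended to the model's input at inference time, not from new code inside the model.

```python
# Hypothetical system prompt and tags, for illustration only.
SYSTEM_PROMPT = (
    "You are a helpful assistant. Refuse harmful requests. "
    "Answer concisely and admit uncertainty."
)

def build_context(history: list[tuple[str, str]], user_message: str) -> str:
    """Flatten the system instructions plus prior turns into the single blob of
    text the model actually sees; it then predicts tokens that continue it."""
    parts = [f"<system>\n{SYSTEM_PROMPT}\n</system>"]
    for role, text in history:
        parts.append(f"<{role}>\n{text}\n</{role}>")
    parts.append(f"<user>\n{user_message}\n</user>")
    parts.append("<assistant>\n")  # the model continues from here, token by token
    return "\n".join(parts)

print(build_context([("user", "Hi"), ("assistant", "Hello!")], "What is a transformer?"))
```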

Am I wrong about this?