Comment by kesor

3 months ago

Binary bits are also a language. A structured language that transistor-based computers execute into some result we humans find valuable. Why wouldn't a model be able to write these binary instructions directly? Why do we need all these layers in between? We don't.

4 comments

kesor

didibus 3 months ago

Because the learned function to generate binary code is likely more complex than that for Python.

I admit I can't say for sure until we try it. If someone were to train a model at the same scale on the same amount of raw binary code as we do these models on raw language and code, would it perform better at generating working programs. Thing is, it would now fail to understand human language prompts.

From what I know and understand though, it seems like it would be more complex to achieve.

My meta point is, you shouldn't think of it as what would a computer most likely understand, because we're not talking about a CPU/GPU. You have to think, what would a transformer architecture deep neural net better learn and infer? Python or binary code? And I think from that lens it seems more likely it's Python.

kesor 3 months ago
Why would you think its more complex? There are less permutations of generating transistor on/off states than there are all the different programming languages in use that result in the exact same bits.
Who said that creating bits efficiently from English to be computed by CPUs or GPUs must be done with transformer architecture? Maybe it can be, maybe there are other ways of doing it that are better. The AI model architecture is not the focus of the discussion. It is the possibilities of how it can look like if we ask for some computation, and that computation appears without all the middle-men layers we have right now, English->Model->Computation, not English->Model->DSL->Compiler->Linker->Computation.
- didibus 3 months ago
  
  > Why would you think its more complex?
  Binary code takes more space, and both training and inference is highly capped by memory and context sizes.
  Models tokenize to a limited set of tokens, and then learn relations between those. I can't say for sure, but I feel it be more challenging to find tokenization schemes for binary code and learn their relationships.
  The model needs to first learn human language really well, because it has to understand the prompt and map it accurately to the binary code. That means the corpus will need to include a lot of human languages that it learns and also binary code, I wonder if the fact they differ so much would conflict the learning.
  I think coming up with a corpus of mapped human language to binary code will be really challenging. Unless we can include the original code's comments at appropriate places around the binary code and so on.
  Binary code is machine dependent, so it would result in programs that aren't portable between architecture and operating system and so on. The model would need to learn more than one binary code and be able to accurately generate the same program for different target platforms and OS.
  > Who said that creating bits efficiently from English to be computed by CPUs or GPUs must be done with transformer architecture?
  We've never had any other method ever do as well and by a magnitude. We may invent a whole new way in the future, but as of now, it's the absolute best method we've ever figured out.
  > The AI model architecture is not the focus of the discussion. It is the possibilities of how it can look like if we ask for some computation, and that computation appears without all the middle-men layers we have right now, English->Model->Computation, not English->Model->DSL->Compiler->Linker->Computation.
  Each layer simplifies the task of the layer above. These aren't like business layer that take a cut of the value out at each level, software layers remove complexity from the layers above.
  I don't know why we wouldn't be talking about AI models? Isn't the topic that it may be more optimal for an AI model to be trained on binary code directly and to generate binary code directly? At least it's what I was talking about.
  So if I stick to AI models. With LLMs and image/video diffusion and such, we've already observed that inference through smaller steps and chains of inference work way better. Based on that, I feel it's likely going from human language to binary code in a single hop to also work worse.
  
  1 reply →