Comment by noosphr

20 hours ago

I'm kind of stunned that someone is using my work to tell me I'm wrong. I wrote the code for the DishBrain Pong experiment, and encoding information was a huge part of what that experiment was about.

So when I say that the grok paper and the pong paper fundamentally agree, I have some idea of what I'm talking about.


anon84873628 18 hours ago

If you're going to claim the tokenizer is a dictionary then it doesn't really matter what paper you wrote code for.

benlivengood 19 hours ago

I might have misunderstood the point you are making. I read the original article as "weights are like meat", and so I'm confused by what you consider fractally wrong.

noosphr 19 hours ago
The point is that when the rules the model learns are simple enough, they stop being spread out over all the layers and become as easily interpretable as any expert system.
It's just that the rules we feed into the model are extremely poorly defined, and we end up with a soup of disjoint rules smeared all across the weights.
This isn't a feature of the models. It's a feature of the training set.
Being shocked that you can store rules in floating point numbers is the same as being shocked that you can store rules in integers. It's been nearly a century since Gödel numbering was invented; we should be used to it by now.
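
To make the Gödel numbering point concrete, here is a minimal sketch of storing a sequence in a single integer using prime exponents. The names `primes`, `encode`, and `decode` are illustrative and not taken from any paper mentioned in this thread, and the scheme assumes every symbol is a positive integer.

```python
def primes():
    """Yield 2, 3, 5, ... by trial division (fine for short sequences)."""
    found = []
    n = 2
    while True:
        if all(n % p for p in found):
            found.append(n)
            yield n
        n += 1


def encode(symbols):
    """Pack positive integers a, b, c, ... into one integer 2^a * 3^b * 5^c * ..."""
    number = 1
    for p, s in zip(primes(), symbols):
        if s < 1:
            raise ValueError("symbols must be positive integers")
        number *= p ** s
    return number


def decode(number):
    """Recover the sequence by reading off each prime's exponent in turn."""
    if number < 1:
        raise ValueError("number must be a positive integer")
    symbols = []
    for p in primes():
        if number == 1:
            break
        exponent = 0
        while number % p == 0:
            number //= p
            exponent += 1
        symbols.append(exponent)
    return symbols
```

For instance, `encode([3, 1, 2])` gives 600 (that is, 2^3 * 3^1 * 5^2), and `decode(600)` returns `[3, 1, 2]`. The rules live entirely in one number, and nothing about the number itself tells you that until you know the decoding scheme.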
- simonh 19 hours ago
  
  Right, but all of that is still in the weights. The point of the article/joke isn’t literally that there is no grammar, it’s that there is no grammar separate from the weights. It’s all in the weights. And yes, it’s absurd. It’s a joke, but a thought-provoking one.
- throwaway173738 17 hours ago
  
  So basically there are rules; we just can’t articulate them, and so we can’t decode them from the weights. The Gödel numbering metaphor is pretty appealing to me. You can represent any finite series of real numbers with a series of computations performed on some other finite series of real numbers. We just happen to be using matrices because the math is easy to parallelize. The trick is to realize that when you know the sequence you have and the sequence you want, you can work out the computations that map one to the other. If you constrain those computations to only matrix multiplication, then you arrive at the scheme we have.
  

js2 19 hours ago

https://news.ycombinator.com/item?id=35079

ufocia 19 hours ago

Hubris much? I don't see a necessary contradiction in using someone's work to disprove another aspect of that same person's work.