Comment by axg11

4 years ago

Compression is a component of general intelligence. A few years ago I was very sceptical of machine learning ever leading to general intelligence; I've since changed my mind. There are a lot of parallels between this work and the concept of "embeddings" in machine learning.

Intelligence requires the ability to generalize. A prerequisite for generalization is the ability to take something high-dimensional and reduce it to a lower-dimensional representation to allow comparison and grouping of concepts.

We're doing this all the time. Take a pen, for example: we're able to combine information from sight, touch, and sound. Through some mechanism, our brains reduce the multi-sensory information and create a consistent representation that is able to evoke past memories and knowledge about pens.

Our brains encode the embeddings in a very different way to deep learning neural networks, but the commonality is that both are able to compress data into a _useful_ representation. Note that as a result of this, the quality of the compression is important. Some forms of compression might be very efficient but they also tangle concepts together, resulting in loss of composability. The ideal compression (from an intelligence point of view) is both information efficient and maximally composable.

A nice definition of intelligence I've heard is exactly the ability to form models of the world with predictive power. And a model is essentially a compression of real-world data. Physical laws are a great example of this.
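The "physical laws as compression" point can be made concrete: a handful of parameters can stand in for arbitrarily many measurements. A minimal sketch with synthetic free-fall data (values invented for illustration):

```python
import numpy as np

# Synthetic "measurements": 1000 noisy positions of an object in free fall.
t = np.linspace(0, 10, 1000)
rng = np.random.default_rng(0)
y = 0.5 * 9.81 * t**2 + rng.normal(scale=0.1, size=t.size)  # d = (1/2) g t^2 + noise

# The "law" compresses 1000 data points into three polynomial coefficients.
coeffs = np.polyfit(t, y, deg=2)
g_estimate = 2 * coeffs[0]

print(f"recovered g: {g_estimate:.2f} m/s^2")  # close to 9.81
print(f"compression: {y.size} samples -> {coeffs.size} parameters")
```

The model is a lossy compression of the data, and its predictive power comes exactly from what it throws away (the noise).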

  • Creating models with predictive power is also a precise definition of science.

    • Can you recommend any philosophy of science (or life) treatises about this?

      I long considered myself a Popperian. A few years ago, I decided that I'm a "Predictionist" (a made-up placeholder word until I learn a better one). I'm struggling to figure out what that even means.

      I still agree with Popper. Emphatically.

      I'm just tired of arguing. I forfeit. I give up. I no longer believe that discourse is helpful, that people are persuadable, that we can share Truth.

      Instead, I just want to know the predictive strength of someone's Truth.

      For example:

      The Earth is flat? Oh? Cool. Please, tell me, how does that Truth help me?

      1 reply →

    • Slight tweak to this imo: science is models that can predict which new reframings/samples of current scientific-community-consensus SOTAs/benchmarks/datasets will disprove contemporary consensus :)

  • I like to define intelligence as knowing data, but knowing data only creates idiot savants. What is lacking in AI today is artificial comprehension. What we're calling "artificial intelligence" lacks comprehension. Until the concepts handled by AI are composable, forming virtual operating logical mechanisms, and an AI comprehends by trial combinations of concepts (virtual operating logical mechanisms) we are only creating idiot savants incapable of comprehending what they do.

  • How do you tell whether something you're evaluating for intelligence has formed a model?

    • If it efficiently ingests data with a non-trivial signal-to-noise ratio and returns actions/reactions that contain more signal and less noise.

In the incredible story "Funes the Memorious" the eponymous Funes has an absolutely perfect memory, but is functionally mentally handicapped.

He can't even abstract to the existence of "trees" because he can recall and diff all of the details of every tree he's ever seen.

He can't even identify that he's seen a particular tree before, because he can diff how different it looked in a particular configuration of leaves and shadows, under different wind and cloud cover.

I co-authored a paper exploring this topic a few years ago, while I was pretty excited about the possibility of using embeddings for generalization.

"Towards conceptual generalization in the embedding space" https://arxiv.org/abs/1906.01873

I still think the approach outlined in the paper (using embeddings to map the physical world) is sound, especially for the field of self-driving, which is in dire need of generalization, but I've since changed my mind and currently do not believe we can achieve AGI (ever).

While embeddings are a great tool for compressing information, they do not provide inherent mechanisms for manipulating the information stored in order to generalize and infer outcomes in new, unseen situations.

And even if we started producing embeddings in a way that gave them some basic understanding of the physical world, we could never achieve the level of detail necessary - because the physical world is not a discrete function. Otherwise we would be creating a perfect simulation (within a simulation?). And the last time I played God was in "Populous".

  • > I've since changed my mind and currently do not believe we can achieve AGI (ever).

    Considering we (as in humans) developed general intelligence, isn't that already in contradiction with your statement? If it happened for us and is "easily" replicated through our DNA, it certainly can be developed again in an artificial medium. But the solution might not have anything to do with what we call machine learning today and sure we might go extinct before (but I didn't have the feeling that's what you were implying).

    • It is not a contradiction as I meant "achieving" in the context of creating it (through software).

      The fact it happened to us is undeniable (from our perspective), but the how/why of it is still one of the many mysteries of the universe - one we will likely never solve.

      18 replies →

    • It's semantics at this point but we did not create ourselves, it was a complex process that took billions of years to create each one of us. Something being conceivable isn't the same as it being practically possible. I can imagine what you propose, but the same goes for traveling to distant stars or a time machine for going to the future. All perfectly possible in theory.

      3 replies →

  • Thanks for your perspective. We’re still in disagreement but I wouldn’t bet on either side of the AGI debate with any significant conviction.

    Embeddings are very good at a few things: combining concepts (addition), untangling commonalities (subtraction) and determining similarity between concepts (distance).

    > While embeddings are a great tool for compressing information, they do not provide inherent mechanisms for manipulating the information stored

    What are the manipulations you’re referring to? I would love to learn more. From my understanding, embeddings actually provide great generalisation. If you have a well conditioned embedding space then you can interpolate into previously unseen parts of that space and still get sensible results. That is generalisation to me. Many current ML methods do _not_ result in a fully meaningful embedding space but my hunch is that we will get there with future insights and advances.
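Those operations (addition, subtraction, distance) can be sketched with hand-made toy vectors; the values below are invented for illustration, not taken from any trained model:

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity: the usual 'distance' for comparing embeddings."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy 4-d "embeddings"; dimensions loosely read as (royalty, male, female, object-ness).
vocab = {
    "king":  np.array([0.9, 0.8, 0.1, 0.0]),
    "queen": np.array([0.9, 0.1, 0.8, 0.0]),
    "man":   np.array([0.1, 0.9, 0.1, 0.0]),
    "woman": np.array([0.1, 0.1, 0.9, 0.0]),
}

# Subtraction untangles a commonality, addition recombines concepts.
composed = vocab["king"] - vocab["man"] + vocab["woman"]

# Distance finds the nearest known concept to the composed vector.
nearest = max(vocab, key=lambda w: cosine(vocab[w], composed))
print(nearest)  # "queen" in this toy setup
```

In a well-conditioned embedding space the same arithmetic works on vectors the model has never produced before, which is the interpolation-as-generalisation argument above.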

    • > We’re still in disagreement but I wouldn’t bet on either side of the AGI debate with any significant conviction.

      That is probably a superior position to hold. I am agnostic by nature, and interestingly this is one of the rare topics I've taken a hard position on. It could be a result of the years spent in the field but also some kind of bias.

      > What are the manipulations you’re referring to?

      Need to take a step back and mention that in the field of AI there is a great debate between symbolic and non-symbolic approaches. (After decades of AI being dominated by symbolic approaches, we are now in the golden age of non-symbolic AI, with symbolic starting to make a comeback. This podcast can be a good starting point to learn more https://lexfridman.com/gary-marcus/ - although I disagree with GM on many things - and this tweet covers symbolic making a comeback https://twitter.com/hardmaru/status/1470847417193209856)

      Basically embeddings are "non-symbolic AI" (which is great and this is where their huge potential stems from), but the very way they are generated and then later utilized is completely "symbolic". Which means the limits of embeddings are defined by the limits of the (in this case human-written) symbols used to define them. Hope that makes sense.

  • > currently do not believe we can achieve AGI (ever).

    Do you mean with embeddings as the approach, or in general?

    • I think AGI will remain out of reach. Even a simpler thing like level 5 self-driving, which is only like 0.3 AGI or something, will remain forever out of reach no matter how much compute we throw at it (though I also think that if we ever reach 0.3 AGI we will also reach 100%).

      The reason is that the mundane world keeps surprising us everywhere we look and constantly keeps creating more questions than answers. Just look at the questions the field of quantum mechanics is trying to tackle, but also every other field of research science - astronomy, genetics, biology, anthropology, even mathematics... Now imagine trying to keep up with all that - by writing code.

      Also, mastering these things would make us 'God'.

This also ties in to the cybernetic concept of the law of requisite variety: adaptable entities need to be able to compress their sense-data about their environment into an internal model that corresponds in complexity to their need to act. This necessarily involves compression, as the totality of reality is effectively infinite and can't fit between your ears.

There's also the Hutter Prize, which ties data compression directly to intelligence through Kolmogorov complexity.

Information and cybernetic theories cut pretty close to a general theory of intelligence in my opinion!
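One way to get a feel for the compression-intelligence link: an off-the-shelf compressor squeezes structured data (data with a short "model" behind it) far better than noise. A rough sketch using `zlib` as a crude stand-in for Kolmogorov complexity:

```python
import random
import zlib

random.seed(0)

structured = ("the cat sat on the mat. " * 400).encode()              # highly regular
noise = bytes(random.randrange(256) for _ in range(len(structured)))  # patternless

def ratio(data: bytes) -> float:
    """Compressed size over original size: an upper bound on 'model' size per byte."""
    return len(zlib.compress(data, 9)) / len(data)

print(f"structured: {ratio(structured):.3f}")  # tiny: a short model explains the data
print(f"noise:      {ratio(noise):.3f}")       # ~1.0: no model shorter than the data itself
```

The Hutter Prize pushes this idea to its limit: whoever compresses Wikipedia best has, in effect, the best model of the knowledge in it.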

I find it funny how I can "see" a map of the world in front of me when I imagine it, but I totally cannot draw it.

Clearly, much less information is stored than the whole image... yet my mind DALL-E style fills in the gaps and "sees" a map.

  • Already plugged this book elsewhere in the thread, you might be interested in "The Mind is Flat". One chapter of the book explores the concept you're describing. Our brain creates the illusion of a "full picture" when often our imagination and internal representation is quite sparse. I think that's one of the key impressive qualities of our brains and general intelligence. We only do the minimum necessary imagination and computation. As we explore a particular concept or scene, our brains augment the scene with more details. In other words, our mind is making it up as we go along.

  • Keep drawing until what is on paper equals what is in your imagination. Seriously, try it.

    • Can you expand on this? Can you give an example of a kind of image it might work well with? I’ve always assumed the apparent detail of mental images was a kind of illusion, a bit like the illusion of detail outside the centre of the visual field.

      1 reply →

    • I discovered this separation myself: I can read Chinese perfectly fine but can't write it. When I'm writing, or trying to write it, I employ the technique you described. Though more often than not, I'd have tiny parts missing here and there.

The human brain also forgets, something that may be a feature instead of a bug. Also, beyond compression, brains are simulation machines: they imagine new scenarios. Curious to understand whether ML provides anything analogous to simulation that isn't rote interpolation.

  • I think the simulation aspects of consciousness and intelligence are fundamental. We don't simulate the world, we simulate what we might experience.

    • I don't think it's true. I can imagine a lot of aspects of systems around me I cannot possibly experience in any way, except maybe them leading to some outcome that I might experience as well. I sometimes do verify this experimentally, but that comes later.

      3 replies →

  • I am quite a novice in ML topics, but isn't this concept of simultaneously training a generator and a validator sort of this?

    I don't know the exact term, but I think of deep-fake generators with an accompanying deep-fake recognizer working in tandem, bettering each other constantly?
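The term is GAN (generative adversarial network). A bare-bones sketch of the adversarial loop, with a linear generator and a logistic-regression discriminator on 1-D data (an illustrative toy, not a practical setup):

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

# Real data the generator must learn to imitate: samples from N(4, 1).
real_batch = lambda n: rng.normal(4.0, 1.0, n)

w, b = 1.0, 0.0   # generator: g(z) = w*z + b, with z ~ N(0, 1)
a, c = 0.0, 0.0   # discriminator: D(x) = sigmoid(a*x + c)
lr, n = 0.05, 64

for step in range(3000):
    # Discriminator step: raise D on real samples, lower it on fakes.
    z = rng.normal(0.0, 1.0, n)
    real, fake = real_batch(n), w * z + b
    sr, sf = sigmoid(a * real + c), sigmoid(a * fake + c)
    a += lr * np.mean((1 - sr) * real - sf * fake)
    c += lr * np.mean((1 - sr) - sf)

    # Generator step: move fakes toward where D says "real" (non-saturating loss).
    z = rng.normal(0.0, 1.0, n)
    sf = sigmoid(a * (w * z + b) + c)
    w += lr * np.mean((1 - sf) * a * z)
    b += lr * np.mean((1 - sf) * a)

fake_mean = float(np.mean(w * rng.normal(0.0, 1.0, 10_000) + b))
print(f"generated mean: {fake_mean:.2f} (real data mean: 4.0)")
```

Real GANs use deep networks and many stabilization tricks, but the structure of the loop is the same: the recognizer learns to tell real from generated samples, and the generator learns to fool it.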

  • People with hyperthymesia don't forget and don't necessarily seem to have any other potentially disabling neuroatypicality, like autism.

    Having it is a premium feature.

    https://en.m.wikipedia.org/wiki/Hyperthymesia

    • >In fact, she was not very good at memorizing anything at all, according to the study published in Neurocase.[1] Hyperthymestic individuals appear to have poorer than average memory for arbitrary information.

      So no, not premium. A trade off.

  • Absolutely. Generative methods are all the rage now. Those methods work on learning information-rich representation spaces. You could argue it's still "interpolation" but instead of interpolating in data-space per se you are interpolating in representation-space.

I saw a presentation once (it was not recorded, so I can't link it) that said the difference between real intelligence and artificial intelligence is the ability to learn quickly.

As an example, he said imagine if he invented a word. Poditon. And he told us that a poditon is any object that can fit on a podium. Instantly, you know whether or not any object can be classified as a poditon. A laptop is a poditon, but a car is not.

We are not at the stage where we can just tell a program "Any object that can fit on a podium is a poditon" and then ask "Is X a poditon?" and get a correct answer. And we probably won't be there for another couple decades.

  • So, I tried this out with GPT-3 in the OpenAI Playground. (The model called text-davinci-002.) My prompt looked like this:

        Definition: A blorple is anything that is red and more or less round.
        Question: Is a tennis ball a blorple?
        Answer: No, because although tennis balls are round they aren't red.
        Question: Is a cherry a blorple?
        Answer: Yes, because cherries are red and approximately round.
    
        Definition: A poditon is anything that can fit on top of a podium.
        Question: Is a laptop computer a poditon?
        Answer:
    

    GPT-3 says: "Yes, because laptop computers are small enough to fit on top of a podium."

    Is a normal-sized automobile a poditon? "No, an automobile is too large to fit on top of a podium."

    Is the sun a poditon? "No, because the sun is too large to fit on top of a podium."

    Is a human being a poditon? "Yes, because human beings are small enough to fit on top of a podium."

    Is a house a poditon? "No, because a house is too large to fit on top of a podium."

    While generating those answers it also spontaneously answered the question for tennis balls (yes) and books (yes).

    Decades sure do go by quickly, these days.

  • > We are not at the stage where we can just tell a program "Any object that can fit on a podium is a poditon" and then ask "Is X a poditon?" and get a correct answer. And we probably won't be there for another couple decades.

    If that presenter actually said that, they need to take a look at "Few shot learning in language models" (just Google the term and start reading the papers).

> A prerequisite for generalization is the ability to take something high-dimensional and reduce it to a lower-dimensional representation to allow comparison and grouping of concepts.

I've been thinking that it might actually be the other way around - intelligence is taking lower-dimensional data and being able to infer higher-level representations in terms of context, meanings, and other abstractions. I.e., understanding when a stop sign isn't a stop sign.

  • It's actually the compression that forces it to learn higher level concepts.

    In your stop sign example, say we are trying to teach a visual model the difference between toy stop signs and real stop signs.

    To train it you feed it a 3D model of the world and the actions a person takes in response (i.e., ignoring toy stop signs but stopping for real ones). Once the embedding is well trained (with lots of data), if you then run it through something like UMAP to reduce the number of dimensions in the embedding from hundreds to 2 or 3, you'll see it has "discovered" the concept of "scale": all the small toy stop signs will be clustered together and the real ones clustered elsewhere.

    That generalisation forced by compression is where the abstraction of "scale" comes from.

    (Of course in real life you'd use a more complex model than just an embedding for this, but in principle this is the idea).

How do theories such as "The 100th Monkey", as well as information transferred via DNA to offspring, translate to ML|AI at all?

For example, couldn't a sufficiently developed AI modify some code/libraries it utilizes/learns from/creates, to ensure any new spawns of said AI/ML/Bot have the learned previous behaviors?

I doubt 100th Monkey will ever hit AI.

So that's an interesting aspect of the limits to AI "evolving".

I’m bilingual from moving countries at a young age and now a lot of my early childhood memories are in the “wrong” language

I think compression is a bad word or description. Another definition of intelligence is the ability to differentiate essential from superficial information. Of course, that often aligns with the application of compression of information.

>The ideal compression (from an intelligence point of view) is both information efficient and maximally composable

This leads the compression to be overfitted to the learning environment. Not that our intelligence is entirely immune to that either.

>able to compress data into a _useful_ representation. Note that as a result of this, the quality of the compression is important. Some forms of compression might be very efficient but they also tangle concepts together, resulting in loss of composability

---

I wonder if various factors inform how/what compression is used on a memory...

For example, a memory of putting an object back where it belongs vs. the memory of a violent attack, which is seen through an emotional (trauma) lens - the two memories will be stored differently.

It's interesting in that I have been wanting to post an Ask HN on memory and dreams...

Now with this post, and your comment, I will post that.

---

The idea is that the surrounding meta-information of a memory is important.

Lenses of senses that colour a memory are many, and individualistic.

i.e.

A person who is a psychopath has an emotional block on the lens that they would see their actions through (remorse, guilt, empathy, etc.) - thus they may not recall or RE-MIND themselves of an action/situation.

A memory that is laid with a sensuous experience, such as sex with someone you love/lust deeply may last a lifetime.

Certain things that one does/says can also lead to a lifetime of regret; a cringe-worthy action/comment from decades ago can still haunt your thoughts.

---

I think the mystique between ML and biological memories is a really interesting space. An ML|AI-based system will never achieve the 100th monkey or DNA|biological transfer of information, but an approximation/facsimile based on evolved|updated libraries/files/code which are maintained exclusively by the AI entity will/does exist.

  • Speculating here: if the brain really uses embeddings similar (in concept) to neural network embeddings, the mechanism could explain a lot of the peculiarities of the brain. Embeddings are naturally entangled; so are memories. For example, a specific smell can evoke a previous memory.

“Compression is a component of general intelligence.”

if you say so

“A few years ago I was very sceptical of machine learning ever leading to general intelligence. I've since changed my mind.”

thanks for sharing

“…both [human brains and artificial neural networks] are able to compress data into a _useful_ representation.”

thanks for pointing this out

  • Could you please stop posting unsubstantive and/or flamebait comments to HN? You've been doing it a ton, unfortunately, and we have to ban that sort of account. It's not what this site is for, and it destroys what it is for.

    What we want here is thoughtful, curious conversation, not people bashing each other's comments or inflicting snark on each other or ideological talking points.

    If you wouldn't mind reviewing https://news.ycombinator.com/newsguidelines.html and taking the intended spirit of the site more to heart, we'd be grateful.