
Comment by HarHarVeryFunny

6 months ago

Assuming you want to define the goal, "AGI", as something functionally equivalent to part (or all) of the human brain, there are two broad approaches to implement that.

1) Try to build a neuron-level brain simulator - something that is a far-off possibility, not because of compute, but because we don't have a clear enough idea of how the brain is wired, how neurons work, or what level of fidelity is needed to capture the aspects of neuron dynamics that are functionally relevant rather than just incidental to the wetware realization.

OR

2) Analyze what the brain is doing, to the extent possible given our current incomplete knowledge, and/or reduce the definition of "AGI" to a functional level, then design a functional architecture/implementation, rather than a neuron-level one, to implement it.

The compute demands of these two approaches are massively different. It's like the difference between an electronic circuit simulator that works at the gate level vs. one that works at the functional level.
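As a toy illustration of that gap (my sketch, not from any real simulator; all names are made up): the same 4-bit addition simulated gate by gate vs. as a single functional operation. The gate-level version does roughly five gate evaluations per bit; the functional version is one add.

```python
def full_adder(a: int, b: int, cin: int) -> tuple[int, int]:
    """One-bit full adder built from explicit gate operations."""
    s1 = a ^ b          # XOR gate
    s = s1 ^ cin        # XOR gate
    c1 = a & b          # AND gate
    c2 = s1 & cin       # AND gate
    cout = c1 | c2      # OR gate
    return s, cout      # 5 gate evaluations per bit

def add_gate_level(x: int, y: int, bits: int = 4) -> int:
    """Simulate addition gate by gate: ~5 * bits gate evaluations."""
    carry, result = 0, 0
    for i in range(bits):
        s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= s << i
    return result

def add_functional(x: int, y: int, bits: int = 4) -> int:
    """Simulate the same circuit at functional level: one operation."""
    return (x + y) & ((1 << bits) - 1)

assert add_gate_level(9, 5) == add_functional(9, 5) == 14
```

Scale that per-operation ratio up to a brain's worth of neurons and the difference between the two approaches dominates everything else.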

For the time being we have no choice but to follow the functional approach, since we simply don't know enough to build an accurate brain simulator, even if that were for some reason seen as the preferred approach.

The power efficiency of a brain vs. a gigawatt systolic array is certainly dramatic, and it would be great for the planet to close that gap, but it seems we first need to build a working "AGI" or artificial brain (however you want to define the goal) before we optimize it. Research and iteration require a flexible platform like GPUs. Maybe when we figure it out we can use more of a dataflow, brain-like approach to reduce power usage.

OTOH, look at the difference between a single-user MoE LLM and one running in a datacenter simultaneously processing multiple inputs. In the single-user case we conceptualize the MoE as saving FLOPs/power by only having one "expert" active at a time, but in the multi-user case all experts are active all the time, handling tokens from different users. The potential of a dataflow approach to save power may be similar: with all parts of the model active at the same time under a datacenter load, a custom hardware realization may not be needed/relevant for power efficiency.
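A hedged sketch of that routing effect (my construction, not any production MoE; the dimensions, names, and the absence of gate-score weighting are all simplifications): per token only the top-k experts execute, but once a batch carries many users' tokens, essentially every expert ends up busy.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, n_experts, top_k = 16, 8, 2

experts = [rng.standard_normal((d_model, d_model)) for _ in range(n_experts)]
router = rng.standard_normal((d_model, n_experts))

def moe_layer(tokens: np.ndarray) -> tuple[np.ndarray, set[int]]:
    """Route each token to its top-k experts; return outputs and the set
    of experts that actually had to execute for this batch."""
    logits = tokens @ router                          # (batch, n_experts)
    chosen = np.argsort(logits, axis=-1)[:, -top_k:]  # top-k per token
    active = set(chosen.ravel().tolist())
    out = np.zeros_like(tokens)
    for e in active:                                  # only active experts run
        mask = (chosen == e).any(axis=-1)
        out[mask] += tokens[mask] @ experts[e]
    return out, active

_, single_user = moe_layer(rng.standard_normal((1, d_model)))
_, datacenter = moe_layer(rng.standard_normal((256, d_model)))
print(len(single_user), "of", n_experts, "experts active for one token")
print(len(datacenter), "of", n_experts, "experts active for a large batch")
```

With one token, only top_k experts run; with 256 tokens, the batch almost certainly touches all eight, so the per-token sparsity saves no hardware at datacenter load.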

Or

3) Pour enough computation into a sufficiently capable search process and have it find a solution for us

Which is what we're doing now.

The bitter lesson was proven right once again. LLMs prove that you can build incredibly advanced AIs without "understanding" how they work.

  • You could do an architecture search, and Google previously did that for CNNs with its NASNet (Neural Architecture Search) series of architectures, but the problem is that you first need to decide which architectural components you want your search process to operate over, so you are baking in a lot of assumptions from the start and massively reducing the search space (which is necessary for the search to be computationally viable). See the toy sketch at the end of this comment.

    A search or evolutionary process would also need an AGI evaluator to guide it, and this evaluator would then determine the characteristics of the solution found. That smacks of benchmark gaming rather than the preferred approach of designing for generic capabilities instead of specific evaluations.

    I wouldn't say we don't know how LLMs "work" - clearly we know how the transformer itself works, and it was designed intentionally with a certain approach in mind - we just don't know all the details of the representations it has learnt from the data. I also wouldn't say LLMs/transformers represent a bitter-lesson approach, since the architecture is so specific - there are a lot of assumptions baked into it.
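    A toy sketch of the two points above (my illustration, not NASNet; the op set and the scoring function are invented): the search can only ever emit architectures over its fixed component vocabulary, and whatever the evaluator happens to reward is what it converges to.

    ```python
    import random

    OPS = ["conv3x3", "conv5x5", "maxpool", "skip"]   # baked-in assumptions

    def evaluate(arch: list[str]) -> float:
        """Stand-in for training + benchmarking a candidate. Whatever this
        rewards is what the search will converge to (benchmark gaming risk)."""
        return arch.count("conv3x3") * 1.0 + arch.count("skip") * 0.5

    def random_search(depth: int = 6, trials: int = 200) -> list[str]:
        best, best_score = None, float("-inf")
        for _ in range(trials):
            arch = [random.choice(OPS) for _ in range(depth)]
            score = evaluate(arch)
            if score > best_score:
                best, best_score = arch, score
        return best

    print(random_search())   # can only ever emit sequences over OPS
    ```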

The hard problem of consciousness seems way harder to solve than the easy one, which is a purely engineering problem. People have been thinking about why the brain thinks for a very long time and so far we have absolutely no idea.

  • > People have been thinking about why the brain thinks for a very long time and so far we have absolutely no idea

    I'm not sure what you mean by this.

    I think there is a pretty large consensus that our neocortex is a prediction machine (predicting future observations/outcomes from past experience), and the reason WHY it would have evolved to be this is that there is an obvious, massive survival benefit in successfully predicting ahead of time how predators and prey will react, what the outcome of your own actions will be, etc. Prediction unlocks you from being stuck in the present, having to react to things as they happen, and lets you plan ahead.

    Thinking = Reasoning/Planning is just multi-step prediction (see the sketch at the end of this comment).

    I don't think consciousness is the big deal most people think it is - it seems to be just the ability to self-observe (which helps to self-predict), but if we somehow built AGI that wasn't conscious, then who cares?
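    A minimal sketch of "planning = multi-step prediction" (my construction, not a brain model; predict_next is a hypothetical stand-in for a learned one-step world model): roll candidate action sequences forward with the predictor and act on the best-scoring one.

    ```python
    import itertools

    def predict_next(state: int, action: int) -> int:
        """Hypothetical learned one-step predictor of 'what happens next'."""
        return state + action

    def plan(state: int, actions=(-1, 0, 1), horizon: int = 3, goal: int = 5):
        """Multi-step prediction: simulate every action sequence forward
        with the one-step predictor, then pick the best-scoring one."""
        def rollout(seq):
            s = state
            for a in seq:
                s = predict_next(s, a)
            return -abs(s - goal)   # closer to the goal is better
        return max(itertools.product(actions, repeat=horizon), key=rollout)

    print(plan(0))   # (1, 1, 1): three predicted steps toward the goal
    ```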

    • Not why it evolved - why the systems in the brain lead to consciousness. Your option 2 requires understanding not just mechanically how consciousness appears, but fundamentally why. If we just understand the mechanics, all we can confidently do is work toward a more and more accurate reproduction of the brain. AGI without consciousness has been speculated about, but is hard for me to believe in.
