Comment by IgorPartola

6 months ago

It is ultimately a hardware problem. To simplify it greatly, an LLM neuron is a single input single output function. A human brain neuron takes in thousands of inputs and produces thousands of outputs, to the point that some inputs start being processed by structures on the outside of the cell before they even get inside it. An LLM neuron is an approximation of this. We cannot manufacture a human-level neuron small, fast, and energy-efficient enough with our manufacturing capabilities today. A human brain has something like 80 or 90 billion of them, and there are other types of cells (glia) that are, I think, roughly as numerous. The entire architecture is massively parallel and has a complex feedback network instead of the LLM’s rigid, mostly feed-forward processing. When I say massively parallel I don’t mean a billion tensor units. I mean a quintillion input superpositions.

And the final kicker: the human brain runs on like two dozen watts. An LLM takes a year of running on a few MW to train and several kW to run.
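
To put rough numbers on that gap (the 2 MW sustained draw and the 18-year "training" window below are assumptions for illustration, not measurements of any particular model):

```python
# Back-of-envelope energy comparison; all figures are assumed round numbers.
HOURS_PER_YEAR = 365.25 * 24

brain_watts = 20             # commonly quoted ballpark for the human brain
brain_training_years = 18    # treat "growing up" as the brain's training run
brain_kwh = brain_watts * brain_training_years * HOURS_PER_YEAR / 1000

cluster_megawatts = 2        # assumed sustained draw of a training cluster
cluster_years = 1
cluster_kwh = cluster_megawatts * 1e6 * cluster_years * HOURS_PER_YEAR / 1000

print(f"brain 'training':  ~{brain_kwh:,.0f} kWh")     # roughly 3,000 kWh
print(f"cluster training:  ~{cluster_kwh:,.0f} kWh")   # roughly 17,500,000 kWh
print(f"ratio:             ~{cluster_kwh / brain_kwh:,.0f}x")
```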

Given this I am not certain we will get to AGI by simulating it in a GPU or TPU. We would need a new hardware paradigm.

A bee is an autonomous walking, climbing, and flying drone that investigates its environment, collects resources, builds structures, and coordinates with other drones.

We're totally incapable of building an AI that can do anything resembling that. We're still at the phase where robots walking on rough terrain without falling over remains a bit impressive.

I doubt the limitation is that we can't produce enough raw compute to replace a single bee.

  • I agree with you that raw compute isn’t really the only missing piece, but I don’t agree that we have enough compute to fully simulate even simple insect brains

    • I feel like we are talking past one another here. I disagree that all the computer processors in the world combined don't have enough raw processing power to simulate a single bee brain. That to me is an absurd idea.

      Now if you meant we don't know enough about bees to actually model what a bee's brain does and build a processor to run an artificial bee brain, then yes, we don't have that ability yet.

      ----

      As an aside, when I watched the Netflix show Black Mirror (the grain episode, for example), I always got stuck on how they were powering this tiny device... What kind of battery technology works here? That was my question, even though this is science fiction.

      9 replies →

  • I think the problem here is the physical hardware that has to navigate and collect information from its environment. In this case, a biological robot makes more sense than a mechanical/electronic one. If you go down that route, though, the best AI will be a human brain. We have been training and selecting these for quite a while now.

On the other hand, a large part of the complexity of human hardware randomly evolved for survival and only recently started playing around in the higher-order intellect game. It could be that we don't need so many neurons just for playing intellectual games in an environment with no natural selection pressure.

Evolution is winning because it's operating at a much lower scale than we are and needs less energy to achieve anything. Coincidentally, our own progress has also been tied to the rate of shrinking of our toys.

  • Evolution has won so far because it had a four billion year head start. In two hundred years, technology has gone from "this multi-ton machine can do arithmetic operations on large numbers several times faster than a person" to "this box produces a convincing facsimile of human conversation, but it only emulates a trillion neurons and they're not nearly as sophisticated as real ones."

    I do think we probably need a new hardware approach to get to the human level, but it does seem like it will happen in a relative blink of an eye compared to how long the brain took.

    • > Evolution has won so far because it had a four billion year head start. In two hundred years, technology has gone from

      I dunno, whenever I leave the silicon technology alone with plenty of power and cooling, nothing changes. :p

      If the effect requires the involvement of swarms of ancient nanobots, then maybe that's the hardware and software that really deserves the credit.

    • But we don't even need a human brain. We already have those; they take months to grow, take forever to train, and are forever distracted. Our logic-based processes will keep getting smaller and less power hungry as we figure out how to implement them at even lower scales, and eventually we'll be able to solve problems with the same building blocks as evolution but in intelligent ways, of which LLMs will likely be only a minuscule part of the larger algorithms.

      8 replies →

To be fair to the raw capabilities of the semiconductor industry, a 100mm^2 die at 3nm can contain on the order of 1~10 trillion features. I don't know that we are actually that far off in terms of scale. How to arrange these features seems to be the difficult part.

The EDA [0] problem is immune to the bitter lesson. There are certainly specific arrangements of matter that can solve this problem better than a GPU/TPU/CPU can today.

[0] https://en.wikipedia.org/wiki/Electronic_design_automation

This is a great summary! I've joked with a coworker that while our capabilities can sometimes pale in comparison (such as dealing with massively high-dimensional data), at least we can run on just a few sandwiches per day.

  • One sandwich is roughly the energy equivalent of running two modern desktop CPUs flat out for about an hour.
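
    For anyone who wants to check that, a rough version of the arithmetic (the 500 kcal sandwich and 250 W per CPU are assumed ballpark figures, not measurements):

    ```python
    sandwich_kcal = 500                        # assumed: a reasonably hearty sandwich
    sandwich_joules = sandwich_kcal * 4184     # 1 kcal = 4184 J, so ~2.1 MJ

    cpu_watts = 250                            # assumed: one desktop CPU running flat out
    two_cpus_one_hour_joules = 2 * cpu_watts * 3600   # ~1.8 MJ

    print(sandwich_joules / two_cpus_one_hour_joules)  # ~1.2, i.e. about an hour per sandwich
    ```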

> To simplify it greatly, an LLM neuron is a single input single output function. A human brain neuron takes in thousands of inputs and produces thousands of outputs

This is simply a scaling problem, eg. thousands of single I/O functions can reproduce the behaviour of a function that takes thousands of inputs and produces thousands of outputs.
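
To make that concrete, here's a toy sketch (plain numpy, purely illustrative): a unit with a few hundred inputs and outputs is mathematically just a pile of single-input scalar operations (multiply by one weight each) stitched together with sums, which is exactly what a matrix-vector product is.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_out = 256, 256
W = rng.standard_normal((n_out, n_in))   # one weight per input/output pair
x = rng.standard_normal(n_in)

# The "multi-input, multi-output" view: one function from 256 inputs to 256 outputs.
y_vec = W @ x

# The same thing decomposed into 256*256 single-input scalar functions
# (multiply input j by weight W[i, j]), combined with sums per output.
y_scalar = np.zeros(n_out)
for i in range(n_out):
    for j in range(n_in):
        y_scalar[i] += W[i, j] * x[j]

print(np.allclose(y_vec, y_scalar))  # True
```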

Edit: As for the rest of your argument, it's not so clear cut. An LLM can produce a complete essay in a fraction of the time it would take a human. So yes, a human brain only consumes about 20W but it might take a week to produce the same essay that the LLM can produce in a few seconds.

Also, LLMs can process multiple prompts in parallel and share resources across those prompts, so again, the energy use is not directly comparable in the way you've portrayed.

  • > This is simply a scaling problem, eg. thousands of single I/O functions can reproduce the behaviour of a function that takes thousands of inputs and produces thousands of outputs.

    I think it's more than just scaling; you need to understand the functional details to reproduce those functions (assuming those functions are valuable for the end result, as opposed to just being the way it had to be done given the medium).

    An interesting example of this neuron complexity that was published recently:

    As rats/mice (can't remember which) are exposed to new stimuli, the axon terminals of a single neuron do not all transmit a signal when there is an action potential; they transmit in a changing pattern after each action potential and ultimately settle into a more consistent pattern of some transmitting and some not.

    IMHO: There is interesting mathematical modeling and transformation going on in the brain that is the secret sauce for our intelligence, and it is yet to be figured out. It's not just scaling of LLMs; it's finding the right functions.

    • Yes, there may be interesting math, but I didn't mean "scaling LLMs", necessarily. I was making a more general point that a collection of single-I/O functions can pretty trivially replicate a multi-I/O function, so the OP's point that "LLM neurons" are single-I/O and bio neurons are multi-I/O doesn't mean much. Estimates of brain complexity have already factored this in, which is why we know we're still a few orders of magnitude away from the number of parameters needed for a human brain in a raw compute sense.

      However, the human brain has extra parameters that a pure/distilled general intelligence may not actually need, eg. emotions, some types of perception, balance, and modulation of various biological processes. It's not clear how many of the parameters of the human brain these take up, so maybe we're not as far as we think.

      And there are alternative models such as spiking neural networks which more closely mimic biology, but it's not clear whether these are really that critical. I think general intelligence will likely have multiple models which achieve similar results, just like there are multiple ways to sort a set of numbers.
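
      For anyone curious what "more closely mimic biology" looks like at its simplest, here's a toy leaky integrate-and-fire neuron, about the most reductive spiking model there is (all parameters are illustrative, not fitted to any real cell):

      ```python
      dt, tau = 1e-3, 20e-3                             # 1 ms steps, 20 ms membrane time constant
      v_rest, v_thresh, v_reset = -65.0, -50.0, -65.0   # mV, textbook-ish values
      i_input = 20.0                                    # constant drive, already scaled to mV for simplicity

      v, spike_times = v_rest, []
      for step in range(1000):                          # simulate one second
          # Leaky integration: decay toward rest, pushed up by the input.
          v += (dt / tau) * (-(v - v_rest) + i_input)
          if v >= v_thresh:                             # threshold crossing: emit a spike, then reset
              spike_times.append(step * dt)
              v = v_reset

      print(len(spike_times), "spikes in one second")   # a few dozen with these numbers
      ```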

      1 reply →

  • I agree with both of you, but scaling isn't feasible with this paradigm. You could need continent-sized hardware to approximate general intelligence with the current paradigm.

    • > You could need continent-sized hardware to approximate general intelligence with the current paradigm.

      I doubt it, if by "current paradigm" you mean the hardware and general execution model, eg. matrix math. Model improvements from progress in algorithms have been outpacing performance improvements from hardware progress for decades. Even if hardware development stopped today, models will continue improving exponentially.

> We would need a new hardware paradigm.

It's not even that. The architecture(s) behind LLMs are nowhere near that of a brain. The brain has multiple entry points for different signals and uses different signaling across different parts. A rodent's brain is much more complex than LLMs are.

  • LLM 'neurons' are not single input/single output functions. Most 'neurons' are weighted sums, one row of a mat-vec computation, combining the products of dozens or hundreds of prior activations and weights.

    In our lane the only important question to ask is, "Of what value are the tokens these models output?" not "How closely can we emulate an organic brain?"

    Regarding the article, I disagree with the thesis that AGI research is a waste. AGI is the moonshot goal. It's what motivated the fairly expensive experiment that produced the GPT models, and we can look at all sorts of other harebrained goals that ended up making revolutionary changes.

    • > "Of what value are the tokens these models output?" not "How closely can we emulate an organic bran?"

      Then you build something that is static and does not learn. This is as far from AI as you can get. You're just building a goofy search engine.

"To simplify it greatly, an LLM neuron is a single input single output function". This is very wrong unless I'm mistaken. A synthetic neuron is multiple input single output.

  • Tens of thousands of extremely complex analog inputs, one output with several thousand targets that MIGHT receive the output with different timing and quality.

    One neuron is unfathomably complex. It's offensive to biology to call a cell in a mathematical matrix a neuron.

It's even worse than the number of inputs/outputs, the number of neurons, efficiency, or directional feedback.

The brain also has plasticity! The connections between neurons change dynamically - an extra level of meta.

  • Connections between LLM neurons also change during training.

    • a) "during training" is a huuuuge asterisk

      b) Do you have a citation for that? My understanding is that while some weights can go to zero and effectively be removed, no (actually used in prod) network architecture or training method allows arbitrary connections.
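
      To be concrete about the distinction I mean, a toy numpy sketch (not any particular production setup): training and pruning change the values in a weight matrix, including driving some to exactly zero, but nothing in the process adds an edge the architecture didn't already allow.

      ```python
      import numpy as np

      rng = np.random.default_rng(0)
      W = rng.standard_normal((4, 4))          # fixed architecture: every input-output pair has a slot

      W -= 0.1 * rng.standard_normal(W.shape)  # "training": values move around (fake update step)
      W[np.abs(W) < 0.5] = 0.0                 # magnitude pruning: small weights forced to zero

      # The connectivity template is unchanged: still a 4x4 matrix, no new edges possible.
      print(W.shape, int((W == 0).sum()), "weights pruned to zero")
      ```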

“And the final kicker: the human brain runs on like two dozen watts. An LLM takes a year of running on a few MW to train and several kW to run.”

I’ve always thought about how nature didn’t evolve to use electricity as its primary means of handling energy. Instead it uses chemistry. It’s quite curious, really.

Like a tiny insect is chemistry powered. It doesn’t need to recharge batteries, it needs to eat and breathe oxygen.

What if our computers started to use biology and chemistry as their primary energy source?

Or will it be the case that in the end using electricity as the primary energy source is more efficient for “human brain scale computation”, it’s just that nature didn’t evolve that way…

  • "Wetware" as it were , i remeber some research article some time back where they grew some kind of 'brainlets' and had them functional as far as memory was concerned[1]. Would be a interesting to see how that tech would progress while incororatd in the current silicon/photonic devices for input/output. Downside would be the durability of the organic matter and replacement.

    [1] https://sciencesensei.com/scientists-created-thinking-brain-...

    • > Downside would be the durability of the organic matter and replacement.

      Thinking about the consequences of consumer-grade "organic computing" gets weird really fast. How do you interface biological matter with peripherals? What about toxicity? What about pathogens? Not only as targets, but as vectors too. What about senescence? Would my computer catch a cold or get Alzheimer's? What about energy? Would I have to buy proteins/sugar for my PC? Would a "beefy PC master race" kind of machine be big enough to gain sentience? Would my PC need to literally sleep?!

      Funny to think about it

      1 reply →

Minor correction here. You are correct about hardware being an issue, but the magnitude is much greater. You have a lot more than "thousands" of inputs. In the hand alone you have ~40,000+ tactile corpuscles (sensing regions). And that's just one mode. The eye has ~7 million cones and 80 million rods. There is processing and quantization performed by each of those cells and each of the additional cells those signal, throughout the entire sensory-brain system. The amount of data the human brain processes is many orders of magnitude greater than what even our largest exascale computers handle. We are at least 3 decades from AGI if we need data processing equivalent to the human brain's, and that's optimistic.

Like you mention, each individual neuron or synapse includes fully parallel processing capability, with signals conveyed by dozens of different molecules. Each neuron (~86 billion) holds state information in addition to processing. The same is true for each synapse (~600 trillion: ~86 billion neurons times a few thousand synapses each). That is how many ~10 Hz "cores" the human computational system has.

The hubris of the AI community is laughable considering the biological complexity of the human body and brain. If we need anywhere close to the same processing capability, there is no doubt we are multiple massive hardware advances away from AGI.

  • I agree with this right up until the claim that we must be very far from AGI. I don't think we're close, but the scale of human inputs doesn't tell us anything about it. A useful AGI need not be capable of human-level cognition, and human-level cognition need not require the entire human biological or nervous system - we're a product of millions of years of undirected random evolution, optimized to run a fleshy body and survive African plains predators. This whole thing we do of thinking and science and engineering is a quirk that made us very adaptable, but how much of what we are is required to implement it isn't clear (e.g. a human minus a hand can still understand advanced mathematics, there are blind programmers, etc.)

    • I'm pretty sure human level cognition requires human level processing power. We are still multiple orders of magnitude away from that.

      A blind programmer still has human processing power. The "usually-sight" regions of the brain don't just shut down. They're still used.

      2 replies →

Med resident here: AFAIK the 80-90 billion neuron figure is misleading: more than 80% of them are in the cerebellum and mostly act as a low-pass filter for motor signals. People born with no cerebellum can be of normal intelligence. And we don't know how much of the neocortex is actually useful for consciousness, but apparently a minority of it.

I wrote a concrete expected‑value model for AGI that anchors rewards in the 15–30T USD Western white‑collar payroll, adds spillovers on 60T GDP, includes transition costs, and varies probability explicitly. Three scenarios (optimistic, mid, pessimistic) show when the bet is rational versus value‑destroying—no mysticism, just plug‑and‑play numbers. If you’re debating AGI’s payoff, benchmark it against actual payroll and GDP, not vibes.

Read: https://pythonic.ninja/blog/2025-11-15-ev-of-agi-for-western...

it is an architecture problem, too. LLMs simply aren't capable of AGI

  • Why not?

    A lot of people say that, but no one, not a single person has ever pointed out a fundamental limitation that would prevent an LLM from going all the way.

    If LLMs have limits, we are yet to find them.

    • We have already found limitations of the current LLM paradigm, even if we don't have a theorem saying transformers can never be AGI. Scaling laws show that performance keeps improving with more parameters, data, and compute, but only following a smooth power law with sharply diminishing returns. Each extra order of magnitude of compute buys a smaller gain than the last, and recent work suggests we're running into economic and physical constraints on continuing this trend indefinitely.

      OOD generalization is still an unsolved problem: they basically struggle under domain shifts, long-tail cases, or systematically novel combinations of concepts (especially on reasoning-heavy tasks). This is now a well-documented limitation of LLMs/multimodal LLMs.

      Work on CoT faithfulness shows that the step-by-step reasoning they print doesn't match their actual internal computation; they frequently generate plausible but misleading explanations of their own answers (see the Anthropic paper on this). That means they lack self-knowledge about how/why they got a result. I doubt you can get AGI without that.

      None of this proves that no LLM based architecture could ever reach AGI. But it directly contradicts the idea that we haven't found any limits. We've already found multiple major limitations of the current LLMs, and there's no evidence that blindly scaling this recipe is enough to cross from very capable assistant to AGI.
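
      To illustrate the diminishing-returns point with a toy power law (the exponent is only loosely in the range of published scaling-law fits; treat the numbers as illustrative):

      ```python
      alpha = 0.05   # assumed exponent; published compute scaling fits are in this rough ballpark

      prev = None
      for exp in range(20, 27):                 # "compute" from 1e20 to 1e26
          loss = (10.0 ** exp) ** (-alpha)      # toy scaling law: loss ~ compute^(-alpha)
          if prev is not None:
              print(f"1e{exp} FLOPs: loss {loss:.3f}, absolute gain from the last 10x: {prev - loss:.4f}")
          prev = loss
      ```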

      5 replies →

    • LLMs are bounded by the same limits computers are. They run on computers, so a prime example of a limitation is Rice's theorem. Any ‘AI’ that writes code is unable (just like humans) to determine, in general, whether the output is or is not error-free.

      This means a multi-agent workflow that writes code without a human in the loop may or may not be error-free.

      LLMs are also bounded by runtime complexity. Could an LLM find the shortest Hamiltonian path between two cities in polynomial time?

      LLMs are bounded by in-model context: could an LLM create and use a new language with no context for it in its model?

Assuming you want to define the goal, "AGI", as something functionally equivalent to part (or all) of the human brain, there are two broad approaches to implement that.

1) Try to build a neuron-level brain simulator - something that is a far distant possibility, not because of compute, but because we don't have a clear enough idea of how the brain is wired, how neurons work, and what level of fidelity is needed to capture all the aspects of neuron dynamics that are functionally relevant rather than just part of a wetware realization

OR

2) Analyze what the brain is doing, to the extent possible given our current incomplete knowledge, and/or reduce the definition of "AGI" to a functional level, then design a functional architecture/implementation, rather than a neuron-level one, to implement it

The compute demands of these two approaches are massively different. It's like the difference between an electronic circuit simulator that works at gate level vs one that works at functional level.

For the time being we have no choice other than following the functional approach, since we just don't know enough to build an accurate brain simulator even if that was for some reason to be seen as the preferred approach.

The power-efficiency gap between a brain and a gigawatt systolic array is certainly dramatic, and it would be great for the planet to close it, but it seems we first need to build a working "AGI" or artificial brain (however you want to define the goal) before we optimize it. Research and iteration require a flexible platform like GPUs. Maybe when we figure it out we can use more of a dataflow, brain-like approach to reduce power usage.

OTOH, look at the difference between a single-user MoE LLM and one running in a datacenter simultaneously processing multiple inputs. In the single-user case we conceptualize the MoE as saving FLOPs/power by only having one (or a few) "experts" active per token, but in the multi-user case all experts are active all the time, handling tokens from different users. The potential of a dataflow approach to save power may be similar, with all parts of the model active at the same time when handling a datacenter load, so a custom hardware realization may not be needed/relevant for power efficiency.
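
A toy sketch of that single-user vs. datacenter point (random routing stands in for a learned router, and 64 experts with top-2 routing are assumed numbers): a single token touches only a couple of experts, but a batch of tokens from many users ends up touching essentially all of them.

```python
import numpy as np

rng = np.random.default_rng(0)
n_experts, top_k = 64, 2                     # assumed MoE shape

def experts_touched(n_tokens):
    """Count distinct experts used when routing n_tokens tokens."""
    used = set()
    for _ in range(n_tokens):
        scores = rng.standard_normal(n_experts)             # stand-in for a learned router
        used.update(np.argsort(scores)[-top_k:].tolist())   # this token's top-k experts
    return len(used)

print("single user, 1 token:    ", experts_touched(1), "of", n_experts)
print("datacenter, 4096 tokens: ", experts_touched(4096), "of", n_experts)
```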

  • Or

    3) Pour enough computation into a sufficiently capable search process and have it find a solution for us

    Which is what we're doing now.

    The bitter lesson was proven right once again. LLMs prove that you can build incredibly advanced AIs without "understanding" how they work.

    • You could do an architecture search, and Google previously did that for CNNs with its NASNet (Neural Architecture Search) series of architectures, but the problem is you first need to decide what architectural components you want your search process to operate over, so you are baking in a lot of assumptions from the start and massively reducing the search space (because this is necessary to be computationally viable).

      A search or evolutionary process would also need an AGI evaluator to guide it, and this evaluator would then determine the characteristics of the solution found, so it rather smacks of benchmark gaming, as opposed to the preferred approach of designing for generic capabilities instead of specific evaluations.

      I wouldn't say we don't know how LLMs "work" - clearly we know how the transformer itself works, and it was designed intentionally with a certain approach in mind - we just don't know all the details of what representations it has learnt from the data. I also wouldn't say LLMs/transformers represent a bitter-lesson approach, since the architecture is so specific - there are a lot of assumptions baked into it.

  • The hard problem of consciousness seems way harder to solve than the easy one, which is a purely engineering problem. People have been thinking about why the brain thinks for a very long time and so far we have absolutely no idea.

    • > People have been thinking about why the brain thinks for a very long time and so far we have absolutely no idea

      I'm not sure what you mean by this.

      I think there is a pretty large consensus that our neocortex is a prediction machine (predicting future observations/outcomes from past experience), and the reason WHY it would have evolved to be this is because there is obvious massive survival benefit in successfully predicting how predators and prey will react ahead of time, what will be the outcome of your own actions, etc, etc. Prediction unlocks you from being stuck in the present having to react to things as they happen and lets you plan ahead.

      Thinking = Reasoning/Planning is just multi-step prediction.

      I don't think consciousness is the big deal most people think it is - it seems to be just the ability to self-observe (which helps to self-predict), but if we somehow built AGI that wasn't conscious, then who cares?

      12 replies →

Correct - the vast majority of people vastly underestimate the complexity of the human brain and the emergent properties that develop from this inherent complexity.

>It is ultimately a hardware problem.

I think it's more an algorithm problem. I've been reading how LLMs work and the brain does nothing like matrix multiplication over billions of entities. It seems a very inefficient way to do it in terms of compute use, although efficient in terms of not many lines of code. I think the example of the brain shows one could do far better.

exactly, the brain - what a concept! over here you have Broca's area, there, Wernicke's, then Bowman's crest, sector 19, and undiscovered country.

if you put the brain in the shape of a tube you'd have a really long err, well, let's say it's not a good idea to do that. the brain gives me goosepimples, my brain too

Humans grow over years with plenty of self guided study. It's far more than a hardware problem.

Quantum compute is my guess. Being able to switch entire models at atomic speeds will give the perception of intelligence at least. There is still a lot there that will need to be figured out between now and then.

That's my non-expert belief as well. We are trying to brute force an approximation of one aspect of how neurons work at great cost.

> And the final kicker: the human brain runs on like two dozen watts. An LLM takes a year of running on a few MW to train and several kW to run.

I mean, you could argue that if you take into consideration all the generations (starting from the first amoeba) that it took to get to a standard human brain today, then the total energy used to "train" that brain is far greater. But I get your point and I do agree with you that our current hardware paradigm is probably not what's going to give us "god in a box".

Try explaining to someone who's only ever seen dial-up modems that 4k HDR video streaming is a thing.

  • Dial-up modems can transfer a 4K HDR video file, or any other arbitrary data.

    It obviously wouldn't have the bandwidth to do so in a way that would make a real-time stream feasible, but it doesn't involve any leap of logic to conclude that a higher bandwidth link means being able to transfer more data within a given period of time, which would eventually enable use cases that weren't feasible before.

    In contrast, you could throw an essentially unlimited amount of hardware at LLMs, and that still wouldn't mean that they would be able to achieve AGI, because there's no clear mechanism for how they would do so.

    • From a modern perspective it's obvious that simply upping the bandwidth allows streaming high-quality video, but it's not strictly about "more bigger cable". Huge leaps in various technologies were needed for you to watch video in 4k:

      - 4k consumer-grade cameras

      - SSDs

      - video codecs

      - hardware-accelerated video encoding

      - large-scale internet infrastructure

      - OLED displays

      What I'm trying to say is that I clearly remember reading an old article about sharing mp3s on P2P networks, and the person writing it was confident that video sharing, let alone video streaming, let alone high-quality video streaming, wouldn't happen in the foreseeable future because there were just too many problems with it.

      If you went back in time just 10 years and told people about ChatGPT they simply wouldn't believe you. They imagined that an AI that can do things that current LLMs can do must be insanely complex, but once technology made that step, we realized "it's actually not that complicated". Sure, AGI won't surface from simply adding more GPUs into LLMs, just like LLMs didn't emerge from adding more GPUs to "cat vs dog" AI. But if technology took us from "AI can tell apart dog and cat 80% of the time" to "AI is literally wiping out entire industry sectors like translation or creative work while turning people into dopamine addicts en masse" within ten years, then I assume that I'll see AGI within my lifetime.

      1 reply →

Exactly why I cringe so hard when AI-bros make arguments equating AI neurons to biological neurons.

  • There are some tradeoffs in the other direction. Digital neurons can have advantages that biological neurons do not.

    For example, if biology had a "choice" I am fairly confident that it would have elected to not have leaky charge carriers or relatively high latency between elements. Roughly 20% of our brain exists simply to slow down and compensate for the other 80%.

    I don't know that eliminating these caveats is sufficient to overcome all the downsides, but I also don't think we've tried very hard to build experiments that directly target this kind of thinking. Most of our digital neurons today are of an extremely reductive variety. At a minimum, I think we need recurrence over a time domain. The current paradigm (GPU-bound) is highly allergic to a causal flow of events over time (i.e., branching control flows).