
Comment by martinald

2 days ago

Tbh it's been the same with Windows PCs since forever. Take MMX in the Pentium 1 days: it was marketed as basically essential for anything "multimedia" but provided somewhere between no and minimal speedup (very little software was compiled for it).

It's quite similar with Apple's Neural Engine, which afaik is used very little for LLMs, even through Core ML. I don't think I've ever seen it being used in asitop. And I'm sure whatever was using it (facial recognition?) could easily have run on the GPU with no real efficiency loss.

I have to disagree with you about MMX. A lot of software may not have targeted it explicitly, but on Windows MMX was very widely used because it was integrated into DirectX, ffmpeg, GDI, the early MP3 libraries (l3codeca, which was used by Winamp and other popular MP3 players), and the popular DivX video codec.

  • Similar to AI PCs right now, very few consumers cared in the late 90s. Most weren't power users creating or editing video, audio, or graphics. They were just consuming, and they never had a reason to seek out MMX for that; their main bottleneck was likely bandwidth. If they used MMX indirectly through Winamp or DirectX, they probably had no clue.

    Today, typical consumers aren't using AI nearly enough to think about buying specialized hardware for it. Maybe that changes, but that's the current state.

  • MMX had a chicken-and-egg problem; it took a while to "take off", so early adopters really didn't get much from it, but by the time it was commonplace it was doing real work.

  • ffmpeg didn't come out for 4 years after the MMX brand was introduced!

    Of course MMX was widely used later, but at the time it was pure marketing.

Apple's Neural Engine is used heavily by non-LLM ML tasks all over the system, like facial recognition in Photos. The point of it isn't to be some beefy AI co-processor but to be a low-power accelerator for background ML workloads.

The same workloads could run on the GPU, but it's more general-purpose and thus uses more power for the same task. It's the same reason macOS uses hardware acceleration for video codecs and even JPEG: the work could be done on the CPU, but it would cost more power. Hardware acceleration helps deliver the 10+ hour battery life.

  • Yes, of course, but imo it's basically a waste of silicon (which is very valuable): you save a handful of watts on very few tasks. I would be surprised if, over the lifetime of my MacBook, the NPU has been utilised more than 1% of the time the system is in use.

    You still need a GPU regardless, even if JPEG and H.264 decode can be done on the card, for games, animations, etc.

    • Do you use Apple's Photos app? Ever see those generated "memories," or search for photos by facial recognition? Where do you think that processing is being done?

      Your macbook's NPU is probably active every moment that your computer is on, and you just didn't know about it.


Using VisionOCR on macOS spins my M4's ANE up from 0 to 1 W, according to poweranalyzer.

The silicon sits idle in the case of most laptop NPUs. In my experience, embedded NPUs are very efficient, so in theory there are real gains to be made if the cores were actually used.

  • Yes but you could use the space on die for GPU cores.

    • At least on the embedded platforms I'm familiar with, dedicated NPU silicon is both faster and more power-efficient than offloading to GPU cores.

      If you're going to be doing ML at the edge, NPUs still seem like the most efficient use of die space to me.