Comment by disfictional

1 month ago

As someone who spent a year writing an SDK specifically for AI PCs, it always felt like a solution in search of a problem. It's like watching dancers in bunny suits sell CPUs: if the consumer doesn't know the pain point you're fixing, they won't buy your product.

17 comments

martinald 1 month ago

Tbh it's been the same in Windows PCs since forever. Like MMX in the Pentium 1 days - it was marketed as basically essential for anything "multimedia" but provided somewhere between no and minimal speedup (very little software was compiled for it).

It's quite similar with Apple's neural engine, which afaik is used very little for LLMs, even with Core ML. I don't think I ever saw it being used in asitop. And I'm sure whatever was using it (facial recognition?) could have easily run on the GPU with no real efficiency loss.

giantrobot 1 month ago
Apple's neural engine is used a lot by the non-LLM ML tasks all over the system like facial recognition in photos and the like. The point of it isn't to be some beefy AI co-processor but to be a low-power accelerator for background ML workloads.
The same workloads could use the GPU, but it's more general purpose and thus uses more power for the same task. It's the same reason macOS uses hardware acceleration for video codecs and even JPEG: the work could be done on the CPU but would cost more in terms of power. Using hardware acceleration helps with the 10+ hour battery life.
- martinald 1 month ago
  
  Yes of course, but it's basically a waste of silicon (which is very valuable) imo - you save a handful of watts on very few tasks. I would be surprised if, over the lifetime of my MacBook, the NPU has been utilised more than 1% of the time the system is being used.
  You still need a GPU regardless of whether you can do JPEG and H.264 decode on the card - for games, animations, etc.
  
Maxatar 1 month ago
I have to disagree with you about MMX. It's possible a lot of software didn't target it explicitly, but on Windows MMX was very widely used, as it was integrated into DirectX, ffmpeg, GDI, the initial MP3 libraries (l3codeca, which was used by Winamp and other popular MP3 players) and the popular DivX video codec.
- conductr 1 month ago
  
  Similar to AI PCs right now, very few consumers cared in the late 90s. The majority weren't power users creating/editing videos/audio/graphics. The majority of consumers were just consuming, and they never had a need to seek out MMX for that; their main consumption bottleneck was likely bandwidth. If they used MMX indirectly in Winamp or DirectX, they probably had no clue.
  Today, typical consumers aren't using enough AI to even make them think about buying specialized hardware for it. Maybe that changes, but it's the current state.
- bombcar 1 month ago
  
  MMX had a chicken/egg problem; it did take a while to "take off", so early adopters really didn't see much from it, but by the time it was commonplace it was doing some work.
- martinald 1 month ago
  
  ffmpeg didn't come out until 4 years after the MMX brand was introduced!
  Of course MMX was widely used later, but at the time it was complete marketing.
buildbot 1 month ago

Using VisionOCR stuff on macOS spins my M4 ANE up from 0 to 1W according to poweranalyzer.
heavyset_go 1 month ago
The silicon is sitting idle in the case of most laptop NPUs. In my experience, embedded NPUs are very efficient, so there are theoretically real gains to be made if the cores were actually used.
- martinald 1 month ago
  
  Yes, but you could use that space on the die for GPU cores.
  

ezst 1 month ago

It's even worse and sadder. Consumers already paid a premium for that, because the monopolists in place made it unavoidable. And now, years later, engineers (who usually are your best advocates and evangelists when it comes to bringing new technologies to the material world) are desperate to find any reason at all for those things to exist and not be a complete waste of money and resources.

convivialdingo 1 month ago

I spent a few months working on different edge compute NPUs (ARM mostly) with CNN models and it was really painful. A lot of impressive hardware, but I was always running into software fallbacks for models, custom half-baked NN formats, random caveats, and bad quantization.

In the end it was faster, cheaper, and more reliable to buy a fat server running our models and pay the bandwidth tax.