Comment by ekianjo
24 days ago
It needs specific support; llama.cpp, for example, has backends for some of them. But that comes with limitations on how much RAM they can allocate. When they do work, you see flat CPU usage while the NPU handles all of the inference.
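One way to check for the offload behavior described above is to watch per-process CPU usage while a model generates tokens: a flat, near-zero reading suggests the work is happening on the NPU rather than the CPU. This is a minimal sketch; `llama-cli` and `model.gguf` are placeholder names for your own llama.cpp build and model file.

```shell
# Sketch, assuming a llama.cpp build with an NPU backend.
# Start inference in the background (placeholder binary and model):
#   ./llama-cli -m model.gguf -p "Hello" &
#   PID=$!
# For demonstration, sample this shell's own PID instead:
PID=$$
# Print the process's current CPU usage; during generation, a flat
# near-zero value indicates inference is offloaded to the NPU.
ps -o %cpu= -p "$PID"
```

Running `ps` in a loop (or using `pidstat -p "$PID" 1` where available) gives a per-second view of whether CPU usage stays flat for the whole generation.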