Comment by ekianjo
24 days ago
It needs specific support; llama.cpp, for example, has backends for some of them. But that comes with limitations on how much RAM they can allocate. When they do work, you see flat CPU usage while the NPU handles all of the inference.
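One way to check for the offload behavior described above is to watch per-process CPU usage while a model generates tokens: a flat, near-zero reading suggests the work is happening on the NPU rather than the CPU. This is a minimal sketch; `llama-cli` and `model.gguf` are placeholder names for your own llama.cpp build and model file.

```shell
# Sketch, assuming a llama.cpp build with an NPU backend.
# Start inference in the background (placeholder binary and model):
#   ./llama-cli -m model.gguf -p "Hello" &
#   PID=$!
# For demonstration, sample this shell's own PID instead:
PID=$$
# Print the process's current CPU usage; during generation, a flat
# near-zero value indicates inference is offloaded to the NPU.
ps -o %cpu= -p "$PID"
```

Running `ps` in a loop (or using `pidstat -p "$PID" 1` where available) gives a per-second view of whether CPU usage stays flat for the whole generation.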