Comment by bigyabai

6 days ago

It would be interesting to see the tok/s comparison between the ANE and GPU for inference. I bet these small models are a lot friendlier to the ANE than the 7B/12B models that technically fit on a phone but won't accelerate well without a GPU.

I thought the big difference between the GPU and ANE was that you couldn't use the ANE to train. Does the GPU actually perform faster during inference as well? Is that because the ANE is designed more for efficiency, or is there another, bigger reason?

  • GPUs are usually faster for inference simply because they have more ALUs/FPUs, but they are also less power-efficient.