Comment by bigyabai

6 days ago

It would be interesting to see the tok/s comparison between the ANE and GPU for inference. I bet these small models are a lot friendlier to the ANE than the 7B/12B models that technically fit on a phone but won't accelerate well without a GPU.

3 comments


gleenn 6 days ago

I thought the big difference between the GPU and ANE was that you couldn't use the ANE to train. Does the GPU actually perform faster during inference as well? Is that because the ANE is designed more for efficiency, or is there another, bigger reason?

wmf 6 days ago

GPUs are usually faster for inference simply because they have more ALUs/FPUs, but they are also less efficient.

mrheosuper 6 days ago

Fitting a 7B model on a phone with 8 GB of RAM for the whole system is impressive.
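
For a rough sense of why that is tight, here is a back-of-the-envelope sketch. It assumes the weights dominate memory and ignores the KV cache, activations, and whatever the OS itself needs; the function name and the list of bit widths are just illustrative, not taken from any particular runtime.

```python
def weight_footprint_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB (10^9 bytes).

    Assumption: only the weights are counted; KV cache, activations,
    and runtime overhead are ignored.
    """
    return n_params * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    # 7B parameters at a few common precisions (illustrative choices).
    for bits in (16, 8, 4):
        print(f"{bits}-bit weights: {weight_footprint_gb(7e9, bits):.1f} GB")
```

Under those assumptions, 16-bit weights alone would not fit in 8 GB, and even 8-bit weights would leave almost nothing for the rest of the system, so a 7B model on such a phone presumably relies on something close to 4-bit quantization.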