Comment by zozbot234

7 hours ago

The relevant constraint when running on a phone is power, not really RAM footprint. Running the tiny E2B/E4B models makes sense, this is essentially what they're designed for.

Between the GPU, NPU and big.LITTLE cores, many phones have no fewer than 4 different power profiles they can run inference at. It's about as solved as it will get without an architectural overhaul.

It absolutely is RAM…

So much so that this is what pushed Apple to increase their base RAM sizes.
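
A back-of-envelope calculation supports the RAM point. This sketch is illustrative only: the "effective" parameter counts for the E2B/E4B models and the quantization widths are assumptions, not figures from the thread, and it ignores KV cache and activation memory.

```python
# Rough weight-memory footprint for on-device LLM inference.
# Assumptions (not from the thread): E2B ~2B effective params,
# E4B ~4B effective params; KV cache and activations excluded.

def weight_footprint_gb(params: float, bits_per_weight: int) -> float:
    """Approximate weight memory in decimal GB."""
    return params * bits_per_weight / 8 / 1e9

for name, params in [("E2B (~2B effective)", 2e9),
                     ("E4B (~4B effective)", 4e9)]:
    for bits in (4, 8, 16):
        gb = weight_footprint_gb(params, bits)
        print(f"{name} @ int{bits}: {gb:.1f} GB")
```

Even at 4-bit quantization, a ~4B-parameter model needs on the order of 2 GB just for weights, which is a meaningful slice of a phone with 6-8 GB of shared RAM.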