Comment by zozbot234

7 hours ago

The relevant constraint when running on a phone is power, not really RAM footprint. Running the tiny E2B/E4B models makes sense, this is essentially what they're designed for.

Between the GPU, NPU and big.LITTLE cores, many phones have no fewer than 4 different power profiles they can run inference at. It's about as solved as it will get without an architectural overhaul.

It absolutely is RAM…

So much so that this is what pushed Apple to increase their base RAM sizes.
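
A back-of-envelope calculation supports the RAM point. This sketch is illustrative only: the "effective" parameter counts for the E2B/E4B models and the quantization widths are assumptions, not figures from the thread, and it ignores KV cache and activation memory.

```python
# Rough weight-memory footprint for on-device LLM inference.
# Assumptions (not from the thread): E2B ~2B effective params,
# E4B ~4B effective params; KV cache and activations excluded.

def weight_footprint_gb(params: float, bits_per_weight: int) -> float:
    """Approximate weight memory in decimal GB."""
    return params * bits_per_weight / 8 / 1e9

for name, params in [("E2B (~2B effective)", 2e9),
                     ("E4B (~4B effective)", 4e9)]:
    for bits in (4, 8, 16):
        gb = weight_footprint_gb(params, bits)
        print(f"{name} @ int{bits}: {gb:.1f} GB")
```

Even at 4-bit quantization, a ~4B-parameter model needs on the order of 2 GB just for weights, which is a meaningful slice of a phone with 6-8 GB of shared RAM.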