Comment by solid_fuel

21 days ago

I disagree. I think running models on end-user devices is a good goal, and it's the ideal case for user privacy and latency.

The human brain consumes around 20 watts. While there are of course substantial implementation differences, I think it's reasonable to draw a line and say that we should eventually expect models to hit similar levels of performance per watt. We already see evidence that small models can reach high performance with better training techniques, and it's perfectly conceivable that acceptable performance for general use will eventually be baked into models small enough to run on end-user hardware. At the current pace of development, "eventually" could mean 1-2 years.
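To make the performance-per-watt framing concrete, here's a back-of-envelope sketch. Every number other than the 20-watt brain figure from the comment is an illustrative assumption (the GPU wattage and token rate are hypothetical placeholders, not measurements), so only the shape of the arithmetic matters:

```python
# Back-of-envelope energy-per-token comparison.
# All numbers below except BRAIN_WATTS are illustrative assumptions.
BRAIN_WATTS = 20            # rough brain power budget cited above
GPU_WATTS = 700             # assumed draw of a datacenter GPU
GPU_TOKENS_PER_SEC = 50     # assumed single-stream generation speed

# Energy per token at datacenter power levels.
joules_per_token_gpu = GPU_WATTS / GPU_TOKENS_PER_SEC

# Energy per token if a model hit the same speed on a 20 W budget.
joules_per_token_brain_budget = BRAIN_WATTS / GPU_TOKENS_PER_SEC

gap = joules_per_token_gpu / joules_per_token_brain_budget

print(f"GPU budget:   {joules_per_token_gpu:.1f} J/token")
print(f"brain budget: {joules_per_token_brain_budget:.1f} J/token")
print(f"gap:          {gap:.0f}x")
```

Under these made-up numbers the gap is about 35x, i.e. a couple of orders of magnitude at most, which is the kind of distance that hardware and training-efficiency gains have historically closed.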