Comment by dmix

1 month ago

How much storage would it take to hold a model of every known location in the world and common things?

For example: I sent a friend a photo of my puppy in the bathtub and her AirPods (via iPhone) announced "(name) sent you a photo of a dog in a bathtub". She thought it was really cool and so did I personally. That's a useful feature. IDK how much that requires going off-device though.

> That's a useful feature.

I’m really curious how this feature is considered useful. It’s cool, but can’t you just open the photo to view it?

  • It is a notification summary.

    There are a lot of people out there who receive hundreds of notifications daily (willingly, but mostly unwillingly) from apps they have installed (not just messengers), and nearly all of them can't cope with the flood of notifications. Most give up on tending to notifications altogether, but some resort to notification summaries, which alleviate the cognitive overload («oh, it's a picture of a dog that I'll check out when I get a chance, and not a pic of a significant other who got into a car accident»).

    The solution is, of course, to not allow every random app to spam the user with notifications, but that is not how laypeople use their phones.

    Even the more legit apps (e.g. IG) dump complete garbage upon unsuspecting users throughout the day («we thought you would love this piece of trash because somebody paid us»).

  • With an assault of notifications coming in while you are busy, being able to tell without lifting your phone whether an alert is something mundane or something you've been waiting for is useful. If I'm working outside, removing gloves to check the phone each time it vibrates becomes endlessly disruptive, but I want the phone with me in case family calls, someone's delivering something, there's an urgent work issue, etc.

I’m not an expert, but I would say extremely small.

For comparison, Hunyuan Video encodes a shit-ton of videos and a rudimentary understanding of real-world physics, at very high quality, in only 13B parameters. Llama 3.3 encodes a good chunk of all the knowledge available to humanity in only 70B parameters. And this is only considering open-source models; the closed-source ones may be even more efficient.

  • Maybe we have different understandings of what extremely small is (including that emphasis), but an LLM is not that by definition (the first L). I'm no expert either, but the smaller figure mentioned is 13e9 parameters. If those are 8-bit integers, that's 13 GB of data (more for a normal integer or a float). That's a significant fraction of the long-term storage on a phone (especially Apple models), and it wouldn't even fit in RAM on most desktops, which is AFAIK required for useful speeds. Taking that as an upper bound and concluding that a model encoding only landmarks must be extremely small, idk. I'd be impressed if it got down to a few dozen megabytes, but with potentially hundreds of such mildly useful neural nets it adds up, and it isn't so small that you'd include it as a no-brainer either.
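
    As a rough sanity check of that arithmetic, here's a small back-of-envelope sketch. The parameter counts are just the figures mentioned above, and the bytes-per-weight values are common quantization levels (assumptions), not the exact on-disk format of any particular model:

    ```python
    # Back-of-envelope model sizes at different weight precisions.
    # Parameter counts are the rough figures from the thread above;
    # bytes-per-weight values are common quantization levels (assumptions).

    param_counts = {
        "13B-class model": 13e9,
        "70B-class model": 70e9,
    }

    bytes_per_weight = {
        "fp32": 4.0,
        "fp16": 2.0,
        "int8": 1.0,
        "4-bit": 0.5,
    }

    for name, params in param_counts.items():
        for precision, bpw in bytes_per_weight.items():
            gigabytes = params * bpw / 1e9  # decimal GB, ignoring metadata overhead
            print(f"{name} @ {precision}: ~{gigabytes:.1f} GB")
    ```

    Even a 4-bit quantized 13B model comes out around 6–7 GB, so getting down to "a few dozen megabytes" would mean a drastically smaller, more specialized network rather than an LLM.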