Comment by zephen

1 month ago

I agree 99%.

The 1% where something else is better?

YouTube videos that show you how to access hidden fasteners on things you want to take apart.

Not that I can't get absolutely anything open, but sometimes it's nice to be able to do so with minimal damage.

I wonder if someday there will be a video codec that is essentially a standard distribution of a very precise and extremely fast text-to-video model (like SmartTurboDiffusion-2027 or something). Surely there are limits to text, but even the example you gave does not seem to me to be beyond the reach of a text description, given a certain level of precision and capability in the model. And we now have faster-than-realtime text-to-video.

  • Maybe?

    To the extent that that could work, I would imagine that I, personally, would be happy reading the textual description instead of watching the video, and for me, we'd now be even closer to text wins 100% of the time.

    In other words, it's not that you _can't_ give excellent descriptions that would obviate the need for video, it's just that people _don't_, even, or perhaps even especially, when they think they do.

    If someone writes text that creates a video that shows exactly how to get something apart, then _presumably_ they also watch the video to make sure it works.

    So the video becomes a debugging tool for their instructions. Perhaps not as good as watching 100 people do it, but maybe even better in some ways.

    So the video codec you describe could be a useful tool to help create more programmers.

    https://www.commitstrip.com/en/2016/08/25/a-very-comprehensi...

    • I think it's quite obvious that any textual description that had any hope of being converted to video in this way would be entirely useless for a human mind. It wouldn't say something like "the fastener is on the underside of the chair about 3/5s of the way", it would say something like "there is a square-shaped object in view 5cm from the top of the view and 120cm from the right; the object is 2cm x 2.2cm, color 0x7F325A".

  • This sounds incredibly precarious and prone to breaking when you update to a new model.

    • It would be impossible to change the model. It would be like a codec, like H.264 but with 1-2GB of fixed data attached to that code name. Changing the model is like going to H.265. Different codec.
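
For what it's worth, the "codec name with fixed data attached" idea from the last reply can be sketched in a few lines. This is only an illustration under made-up assumptions: the names `GenerativeCodec`, `check_weights`, `textvid-1` and `textvid-2` are hypothetical, and short byte strings stand in for the 1-2GB of weights. The point is just that the name is pinned to one exact set of weights, so new weights mean a new name.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class GenerativeCodec:
    """A codec name permanently bound to one exact set of model weights."""
    name: str            # hypothetical identifier, plays the role of "H.264"
    weights_sha256: str  # digest of the frozen weights shipped with the codec


def digest(weights: bytes) -> str:
    """Return the SHA-256 hex digest of a weight blob."""
    return hashlib.sha256(weights).hexdigest()


def check_weights(codec: GenerativeCodec, weights: bytes) -> None:
    """Refuse to decode if the local weights differ from the pinned ones."""
    if digest(weights) != codec.weights_sha256:
        raise ValueError(f"weights do not match codec {codec.name!r}")


# Tiny stand-ins for the real weight blobs (assumption for illustration).
weights_v1 = b"frozen model weights, version 1"
weights_v2 = b"retrained model weights, version 2"

# A retrained model gets a new codec name, like H.264 vs. H.265.
codec_v1 = GenerativeCodec("textvid-1", digest(weights_v1))
codec_v2 = GenerativeCodec("textvid-2", digest(weights_v2))

check_weights(codec_v1, weights_v1)  # matching weights: passes silently
```

Calling `check_weights(codec_v1, weights_v2)` raises `ValueError`, which is the "you can't update the model under the same name" property described above.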