Comment by nycdatasci

3 months ago

You seem to be suggesting that current frontier models are only trained on text and not "sensor data". Multi-modal models are trained on the entire internet + vast amounts of synthetic data. Images and videos are key inputs. Camera sensors are capable of capturing much more "sensor data" than the human eye. Neural networks are the worst way to model intelligence, except all other models.

You may find this talk enlightening: https://simons.berkeley.edu/talks/ilya-sutskever-openai-2023...

> You seem to be suggesting

As soon as you start a response like that, you should just stop. After all, this is written communication, and what I wrote is plain to see right there.

When you need to start a response that way, you should become aware that you are not responding to what the other person wrote, but to your own ideas.

There is no need to "interpret" what other people wrote.

Relevant: https://i.imgur.com/Izrqp7d.jpeg