Comment by alwa
3 months ago
That description feels relatable to me. Maybe buffered more than buttered, in my case ;)
It seems to me that would be a tick in the “pro” column for this idea of using pixels (or contours, à la JPEG) as the models’ fundamental training stimulus, as opposed to textual tokens. Isn’t there a comparison to be drawn between the “threads” you describe here and the multi-head attention mechanisms (or whatever they are) that LLMs use to weigh associations at various distances between tokens?
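For readers unfamiliar with the mechanism the comment gestures at: in multi-head self-attention, each head scores every pair of positions with softmax(QKᵀ/√d) and mixes values by those scores, which is what lets different heads weigh associations at different distances. The NumPy sketch below is purely illustrative; the random projections stand in for learned weights, and all names and shapes are assumptions rather than anything from the thread.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(x, num_heads, rng):
    """Toy multi-head self-attention over a (seq_len, d_model) input.

    Each head projects the input into its own query/key/value subspace,
    then weighs every pair of positions by softmax(Q K^T / sqrt(d_head)),
    so different heads can specialize in associations at different
    distances between positions."""
    seq_len, d_model = x.shape
    d_head = d_model // num_heads
    heads = []
    for _ in range(num_heads):
        # Random projections stand in for learned weight matrices here.
        Wq = rng.standard_normal((d_model, d_head)) / np.sqrt(d_model)
        Wk = rng.standard_normal((d_model, d_head)) / np.sqrt(d_model)
        Wv = rng.standard_normal((d_model, d_head)) / np.sqrt(d_model)
        Q, K, V = x @ Wq, x @ Wk, x @ Wv
        # (seq_len, seq_len) matrix of pairwise association weights.
        weights = softmax(Q @ K.T / np.sqrt(d_head))
        heads.append(weights @ V)
    # Concatenate head outputs back to (seq_len, d_model).
    return np.concatenate(heads, axis=-1)

rng = np.random.default_rng(0)
tokens = rng.standard_normal((8, 32))  # 8 positions, 32-dim embeddings
print(multi_head_attention(tokens, num_heads=4, rng=rng).shape)  # (8, 32)
```

Nothing in this sketch cares whether the input rows came from text tokens or image patches, which is part of why the pixels-as-stimulus comparison in the comment is plausible on its face.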