Comment by jcelerier

6 days ago

> That kind of physical intuition isn’t something adults obtain after years of education—young children develop this intuition by observing the world around them before they can even speak in full sentences.

I mean, it still takes them much more time than it takes to train even the largest LLMs we use (a couple months)

10 comments

jcelerier

dist-epoch 6 days ago

In wall clock time. If you count in input tokens/pixels, humans learn with orders of magnitude less input data.

naasking 5 days ago

Humans do not start as blank models, they have billions of years of pretraining from evolution.
logicchains 6 days ago
That's not true at all; the amount of audiovisual data a human is exposed to in even just one year is incredibly vast. Over sixty frames per second, sixteen hours per day gives over a trillion frames per year, and each frame at such a high resolution would be hundreds of tokens.
- dist-epoch 6 days ago
  
  Let's take your numbers:
  Human: 1000 tok * 60 * 86400 * 365 = 2 Trillion tokens / year
  GPT-4: 13 Trillion tokens
  Llama-3: 15 Trillion tokens
  
  1 reply →
- cluckindan 6 days ago
  
  That’s why we tokenize very early in the vision pipeline.
  Related: https://en.wikipedia.org/wiki/Form_constant
- Vetch 6 days ago
  
  This contains a common misstep (or misgeneralization of an analogy) among those who are much more familiar with computers than with the brain. The brain is not digital and concepts like frames per second and resolution don't make much sense for vision. First, there aren't frames, neuron activity is asynchronous with changes to sensory neuron firing rate responding to changes in the environment or according to saliency.
  Between the non-uniformity of receptor density (eg fovea vs peripheral vision but this is general across all senses), dynamic receptor fields and the fact that information is encoded in terms of spike rate and timing patterns across neural populations, the idea of pixels in some bitmap at some resolution is beyond misleading. There is no pixel data, just sparsely coded feature representations capturing things like edges, textures, motion, color contrast and the like, already, at the retina.
  While hundreds of trillions of photons might hit our photoreceptors, > 99% of that is filtered and or compressed before even reaching retinal ganglion cells. Only a tiny fraction, about 10 million bits/sec, of the original photon signal rate is transferred through the optic nerve (per eye). This pattern of filtering and attentive prioritization of information in signals continues as we go from sensory fields to thalamus to higher cortical areas.
  So while we might encounter factoids like: on the order of a billion bits per second of data hit photoreceptors or [10Mb/s transferred](https://www.britannica.com/science/information-theory/Physio...) along optic nerves, it's important to keep in mind that a lot of the intuition gained from digital information processing does not transfer in any meaningful sense to the brain.
  
  1 reply →
- iamnotagenius 6 days ago
  
  [dead]

lukan 6 days ago

But they use way less energy for it.