Comment by jiggawatts

2 years ago

There's a rumour that GPT-4 runs every query either 8x or 16x in parallel, and then picks the "best" answer using an additional AI that is trained for that purpose.

It would have to pick each token then, no? You can get a streaming response, which would completely invalidate the idea of the answer being picked afterwards.
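Nothing about OpenAI's actual pipeline is public, but a toy sketch makes the distinction concrete: best-of-N reranking needs every candidate answer finished before a scorer can pick one, while streaming commits each token the moment it is sampled. The `generate`, `score`, and `step` callables below are hypothetical stand-ins, not any real API.

```python
# Illustrative sketch only -- not OpenAI's implementation.
from typing import Callable, Iterator

def best_of_n(generate: Callable[[str], str],
              score: Callable[[str, str], float],
              prompt: str,
              n: int = 8) -> str:
    """Best-of-N reranking: every candidate must be COMPLETE before the
    scorer can compare them, so nothing can be streamed to the user early."""
    candidates = [generate(prompt) for _ in range(n)]
    return max(candidates, key=lambda answer: score(prompt, answer))

def streamed(step: Callable[[str], str],
             prompt: str,
             max_tokens: int = 256) -> Iterator[str]:
    """Streaming: each token is committed and sent as soon as it is sampled,
    so any 'pick the best' decision would have to happen per token."""
    text = prompt
    for _ in range(max_tokens):
        token = step(text)      # sample the next token given the text so far
        if token == "<eos>":
            break
        text += token
        yield token             # the token leaves the server immediately
```

If the API streams, the `best_of_n` shape is ruled out for the streamed path; only something token-wise (or no selection at all) is compatible.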

  • It's false: it's the nine-months-down-the-line telephone game of an unsourced rumor re: a mixture-of-experts model. Drives me absolutely crazy.

    Extended musings on it, please ignore unless curious about evolution patterns of memes:

    Funnily enough, it's gotten _easier_ to talk about over time -- i.e. on day 1 you can't criticize it because it's "just a rumor, how do you know?" -- and on day 100 it's even worse, because that effect hasn't subsided much and the rumor has spread like wildfire.

    On day 270, the same thing that gave it genetic fitness -- the alluring simplicity of "ah yes, there's 8x going on" -- has become the core and only feature of the Nth round of the telephone game. There are no more big expert-sounding words around it to make it seem plausible.

I recall reading something about it being an MoE (mixture of experts), which would align with what you're saying.
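For what it's worth, the MoE reading is also consistent with streaming: in a mixture-of-experts layer, the "8 experts" are feed-forward subnetworks inside the model, and a router picks a few of them per token, so there are never finished answers to rerank. A minimal illustrative sketch (GPT-4's internals are not public; all sizes and names here are made up):

```python
# Toy mixture-of-experts layer -- illustrative only.
import numpy as np

rng = np.random.default_rng(0)
d_model, n_experts, top_k = 16, 8, 2

# Each "expert" here is just a small feed-forward weight matrix.
experts = [rng.standard_normal((d_model, d_model)) * 0.1 for _ in range(n_experts)]
router = rng.standard_normal((d_model, n_experts)) * 0.1

def moe_layer(x: np.ndarray) -> np.ndarray:
    """Route one token's hidden state to its top-k experts and mix the outputs."""
    logits = x @ router                   # (n_experts,) routing scores
    top = np.argsort(logits)[-top_k:]     # indices of the k highest-scoring experts
    weights = np.exp(logits[top]) / np.exp(logits[top]).sum()  # softmax over chosen k
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top))

# Routing happens token by token on hidden states inside the model, so tokens
# can still be emitted as a stream -- no 8 parallel full answers exist anywhere.
token_state = rng.standard_normal(d_model)
print(moe_layer(token_state).shape)  # (16,)
```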