Comment by refulgentis
2 years ago
I worked at Google up through 8 weeks ago and knew there _had_ to be a trick --
You know those stats they're quoting for beating GPT-4 and humans? (both are barely beaten)
They're doing chain of thought with K = 32 samples. That means running an _entire self-talk conversation 32 times_ per question and taking the consensus answer.
Source: https://storage.googleapis.com/deepmind-media/gemini/gemini_..., section 5.1.1 paragraph 2
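For the non-ML folks, my read of section 5.1.1 is: sample 32 complete chain-of-thought generations per question, then take the most common final answer. A minimal sketch of that idea, with toy stand-ins for the model call and answer parsing (none of this is Google's actual pipeline):

    import random
    from collections import Counter

    def generate(prompt, temperature=0.7):
        # Toy stand-in: real code would sample one chain-of-thought
        # completion from the model at nonzero temperature.
        return "...step-by-step reasoning...\nAnswer: " + random.choice("ABCD")

    def extract_answer(completion):
        # Pull the final answer out of the chain-of-thought transcript.
        return completion.rsplit("Answer: ", 1)[-1]

    def cot_at_k(question, k=32):
        # Run the entire self-talk conversation k separate times...
        prompt = question + "\nLet's think step by step."
        answers = [extract_answer(generate(prompt)) for _ in range(k)]
        # ...then take the most common final answer across the k runs.
        return Counter(answers).most_common(1)[0][0]

The report's exact recipe ("uncertainty-routed chain-of-thought", as I read it) also falls back to greedy decoding when the 32 samples don't agree strongly enough, but the cost is the same either way: 32 full generations per question.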
How do you know GPT-4 is 1-shot? The details about it aren't released; it's entirely possible it does stuff in multiple stages. Why wouldn't OpenAI use their most powerful variant to get better stats, especially when they don't say how they got them?
Google being more open here about what they do is in their favor.
There's a rumour that GPT-4 runs every query either 8x or 16x in parallel, and then picks the "best" answer using an additional AI that is trained for that purpose.
It would have to pick each token then, no? Because you can get a streaming response, which would completely invalidate the idea of the answer being picked after the fact.
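To make the rumour concrete (every name below is hypothetical; nothing about GPT-4's internals is confirmed), answer-level reranking would look roughly like:

    import random

    def generate(prompt):
        # Toy stand-in for one complete sampled answer.
        return f"candidate answer #{random.randrange(1000)}"

    def reranker_score(answer):
        # Toy stand-in for the rumoured picker model trained to rank answers.
        return random.random()

    def best_of_n(prompt, n=8):
        # Sample n full answers (in parallel, per the rumour)...
        candidates = [generate(prompt) for _ in range(n)]
        # ...and return whichever one the reranker scores highest.
        return max(candidates, key=reranker_score)

Note the sketch can't emit anything until all n candidates are finished, which is exactly the tension with token-by-token streaming: either the selection happens per token, as the parent suggests, or it doesn't work this way at all.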
I recall reading something about it being a MoE (mixture of experts), which would align with what you're saying.
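Worth noting MoE is a different mechanism from the parallel-answers rumour, though: the routing happens per token inside a single forward pass, not across whole finished answers. A toy sketch of the idea (illustrative only; GPT-4's real architecture has never been published):

    import numpy as np

    def moe_layer(x, experts, gate_w, top_k=2):
        # A learned router scores every expert for this token...
        logits = x @ gate_w                 # gate_w: (d, num_experts)
        top = np.argsort(logits)[-top_k:]   # indices of the top_k experts
        w = np.exp(logits[top])
        w = w / w.sum()                     # softmax over just those experts
        # ...and the output mixes only the chosen experts' computations.
        return sum(wi * experts[i](x) for wi, i in zip(w, top))

Here experts would be a list of per-expert feed-forward functions and gate_w a learned routing matrix; only top_k experts run for any given token, which is how MoE buys capacity without proportional compute.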
That does make sense if you consider the MIT paper on debating LLMs.
So beam search?
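For reference, beam search is yet another variant: it prunes candidates token by token during decoding instead of sampling whole answers and reranking them at the end. A toy sketch, with step_logprobs as a hypothetical stand-in for the model (end-of-sequence handling omitted for brevity):

    def step_logprobs(seq):
        # Toy stand-in: a real model returns logprobs over its vocabulary.
        return {"a": -1.0, "b": -2.0}

    def beam_search(step_fn, beam_width=4, max_len=20):
        beams = [([], 0.0)]  # (tokens so far, cumulative logprob)
        for _ in range(max_len):
            # Extend every beam by every possible next token...
            candidates = [
                (seq + [tok], score + lp)
                for seq, score in beams
                for tok, lp in step_fn(seq).items()
            ]
            # ...and keep only the beam_width highest-scoring sequences.
            beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
        return beams[0][0]

If selection really did happen per token, as suggested upthread, it would look more like this than like whole-answer best-of-n.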
Same way I know the latest BMW isn't running on a lil nuke reactor. I don't, technically. But there's not enough comment room for me to write out the 1000 things that clearly indicate it. It's a "not even wrong" question on your part.
Where are you seeing that 32-shot vs 1-shot comparison drawn? In the PDF you linked, it looks like they run both models multiple times using the same technique and just pick the technique under which Gemini wins by the largest margin.