
Comment by nightski

6 months ago

Are they using the same diffusion models as in the GPT-3 era? Meaning is it the LLM that has improved or is it the diffusion model? I know it's probably a foolish take but I am really skeptical of the "larger models will solve all our problems" line of thinking.

They don’t compare in the paper. I will say I experimented extensively with GPT-3 era LLMs on improving output by trying to guide early diffusion models with critical responses. It was a) not successful, and b) pretty clear to me that GPT-3 didn’t “get” what it was supposed to be doing, or didn’t have enough context to keep all this in mind, or couldn’t process it properly, or some such thing.

This paper has ablations, although I didn’t read that section, so you could see where they say the effectiveness comes from. I’d bet, though, that it’s emergent from a bunch of different places.

FWIW, I don’t think LLMs will solve all our problems, so I too am skeptical of that claim. I’m not skeptical of the slightly weaker “larger models have emergent capabilities and we are probably not done finding them as we scale up”.

  • > FWIW, I don’t think LLMs will solve all our problems, so I too am skeptical of that claim. I’m not skeptical of the slightly weaker “larger models have emergent capabilities and we are probably not done finding them as we scale up”.

    100% agree. I'd classify the time now as identifying the limits of what they can functionally do though, and it's a lot!