Comment by dawnofdusk

2 days ago

In actor-critic methods, both the actor and the critic are normally learned, i.e., their weights are adjusted during training. The paper is correct that their method is zero-shot, but it doesn't mention that the method is essentially equivalent to running a few rounds of training and then discarding the resulting weight update.

Anyone who works with deep architectures and momentum-based optimizers knows that the first few updates alone provide large improvements in loss. The "breakthrough" in this paper is that computing these first few updates at test time lets the authors describe the algorithm as "without training", and therefore attract hype.
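To see why the first few updates matter so much, here's a toy demo (my own sketch, not anything from the paper): momentum SGD on the stand-in objective f(w) = w**2, where three steps already cut the loss from 100 to below 1.

```python
# Toy illustration: the first few momentum-SGD steps alone
# produce a large drop in loss. f(w) = w**2 is a stand-in
# objective, not any particular model's training loss.

def momentum_sgd(w=10.0, lr=0.1, beta=0.9, steps=3):
    v = 0.0
    losses = [w * w]          # loss before any update
    for _ in range(steps):
        v = beta * v + 2 * w  # gradient of w**2 is 2w
        w = w - lr * v
        losses.append(w * w)
    return losses

print(momentum_sgd())  # loss shrinks rapidly in the first steps
```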

> discarding the training update

But they aren't updating the model weights. They're iteratively updating the prompt, which automates the process humans already follow manually with generative models.

Agreed that it's conceptually equivalent though.