
Comment by sillysaurusx

4 days ago

It’s been said that RL is the worst way to train a model, except for all the others. Many prominent scientists seem to doubt that this is how we’ll be training cutting edge models in a decade. I agree, and I encourage you to try to think of alternative paradigms as you go through this course.

If that seems unlikely, remember that image generation didn’t take off till diffusion models, and GPTs didn’t take off till RLHF. If you’ve been around long enough it’ll seem obvious that this isn’t the final step. The challenge for you is, find the one that’s better.

You're assuming that people are only interested in image and text generation.

RL excels at learning control problems. Under the right conditions (a finite state and action space, and enough exploration), tabular RL is mathematically guaranteed to converge to an optimal policy for the states and controls you give it, given enough runtime. For some problems (playing computer games), that runtime is surprisingly short.
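That convergence guarantee is easy to see in miniature. Below is a toy sketch of my own (nothing from any production system): tabular Q-learning on a five-state chain where moving right toward the goal pays a reward of 1. With enough episodes, the greedy policy comes out optimal.

```python
import random

# Minimal tabular Q-learning on a toy 5-state chain MDP.
# States 0..4; action 1 moves right, action 0 moves left; reaching
# state 4 pays +1 and ends the episode. This is the setting where the
# convergence guarantees actually hold: finite states and actions,
# with every state-action pair visited often enough.
N_STATES, GOAL = 5, 4
ALPHA, GAMMA, EPS = 0.5, 0.9, 0.2

def step(s, a):
    s2 = min(s + 1, GOAL) if a == 1 else max(s - 1, 0)
    return s2, (1.0 if s2 == GOAL else 0.0), s2 == GOAL

def train(episodes=2000, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            # epsilon-greedy action selection
            if rng.random() < EPS:
                a = rng.randrange(2)
            else:
                a = max((0, 1), key=lambda x: q[s][x])
            s2, r, done = step(s, a)
            # standard Q-learning update toward the bootstrapped target
            target = r + (0.0 if done else GAMMA * max(q[s2]))
            q[s][a] += ALPHA * (target - q[s][a])
            s = s2
    return q

q = train()
policy = [max((0, 1), key=lambda a: q[s][a]) for s in range(N_STATES)]
print(policy)  # greedy policy: states 0-3 should all pick "right" (1)
```

The same loop only scales as far as the table does; for large state spaces you need function approximation, and the hard guarantees weaken accordingly.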

There is a reason self-driving cars use RL, and don't use GPTs.

  • > self-driving cars use RL

    Some parts of it, but I would argue with a lot of guardrails in place, and it's not as common as you think. I don't think the majority of planner/control stacks out there in SDCs are RL-based. I also don't think any production SDCs are RL-based.

  • I have been using it to train an agent on my game, hotlapdaily.

    Apparently the AI sets the best time, even better than the pros. It is really useful when it comes to controlled-environment optimization.

  • You are exactly right.

    Control theory and reinforcement learning are different ways of looking at the same problem. They have traditionally and culturally focused on different aspects.

RL is still widely used in the advertising industry. Don't let anyone tell you otherwise. When you have millions to billions of visits and you are trying to optimize an outcome, RL is very good at that. Add in context with contextual multi-armed bandits and you have something very good at driving people toward purchasing.
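As a rough illustration of that bandit approach, here is a minimal epsilon-greedy contextual bandit sketch. The segments, ad names, and conversion rates are all made up for illustration; a real ad system would more likely use Thompson sampling or LinUCB, but the explore/exploit loop has the same shape.

```python
import random

# Hypothetical epsilon-greedy contextual bandit: contexts are user
# segments, arms are ad creatives, and the reward is a simulated
# purchase drawn from a segment-dependent conversion rate.
SEGMENTS = ["new_visitor", "returning"]
ARMS = ["ad_a", "ad_b"]
TRUE_RATE = {  # invented ground-truth conversion rates
    ("new_visitor", "ad_a"): 0.02, ("new_visitor", "ad_b"): 0.06,
    ("returning", "ad_a"): 0.10, ("returning", "ad_b"): 0.04,
}

def run(steps=50000, eps=0.1, seed=1):
    rng = random.Random(seed)
    pulls = {k: 0 for k in TRUE_RATE}
    wins = {k: 0 for k in TRUE_RATE}
    for _ in range(steps):
        ctx = rng.choice(SEGMENTS)
        if rng.random() < eps:
            arm = rng.choice(ARMS)  # explore
        else:  # exploit: best observed conversion rate for this context
            arm = max(ARMS, key=lambda a: wins[(ctx, a)] / pulls[(ctx, a)]
                      if pulls[(ctx, a)] else 0.0)
        pulls[(ctx, arm)] += 1
        wins[(ctx, arm)] += rng.random() < TRUE_RATE[(ctx, arm)]
    # report the learned best arm per context
    return {ctx: max(ARMS, key=lambda a: wins[(ctx, a)] / max(pulls[(ctx, a)], 1))
            for ctx in SEGMENTS}

print(run())
```

With even a modest exploration rate, each (segment, arm) pair keeps accumulating data, so the per-context best arm emerges from the observed rates rather than being guessed up front.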

What about combinatorial optimization? When you have a simulation of the world, what other paradigms fit?

  • More likely we will develop general super intelligent AI before we (together with our super intelligent friends) solve the problem of combinatorial optimization.

    • There's nothing to solve. The curse of dimensionality kills you no matter what. P=NP, or maybe quantum computing, is the only hope of making serious progress on large-scale combinatorial optimization.

I like to think of RLHF as the technique I used as a student to score good marks on my exams. As soon as I started working, I realized that out-of-distribution generalization can't be achieved only by practicing in an environment with verifiable rewards.

GPT wouldn't even have been possible, let alone taken off, without self-supervised learning.

  • RLHF is what gave us the ChatGPT moment. Self-supervised learning was the base for this.

    SSL creates all the connections and RL learns to walk the paths

    • The easy-to-use web interface gave us the ChatGPT moment. Take a look at AI Dungeon for GPT-2: it went viral because it made GPT-2 accessible.
