Comment by jgalt212
6 hours ago
Outside of games and coding, generating enough valid examples and counter-examples to harness the power of RL is cost-prohibitive.
Which is why rubrics as rewards are used.
Still cost-prohibitive.
Yes, which is why for some things I've gotten paid as much as $1500 per training example generated.
AI labs don't care whether something is cost-prohibitive.