Comment by nl

1 month ago

I guess the op was implying that is something fixable fairly easily?

(Which is true - it's easy to prompt your LLM with the language grammar, have it generate code and then RL on that)

Easy in the sense of "it is only having enough GPUs to RL a coding capable LLM" anyway.

7 comments

If you can generate code from the grammar then what exactly are you RLing? The point was to generate code in the first place so what does backpropagation get you here?

nl 1 month ago
Post RL you won't need to put the grammar in the prompt anymore.
- measurablefunc 1 month ago
  
  The grammar of this language is no more than a few hundred tokens (thousands at worst) & current LLMs support context windows in the millions of tokens.
  
  4 replies →