Comment by measurablefunc

20 days ago

If you can generate code from the grammar then what exactly are you RLing? The point was to generate code in the first place so what does backpropagation get you here?

Post-RL, you won't need to put the grammar in the prompt anymore.

  • The grammar of this language is no more than a few hundred tokens (thousands at worst) & current LLMs support context windows in the millions of tokens.

    • Sure.

      The point is that your statement about the ability to do RL is wrong.

      Additionally, your response to the DeepSeek paper in the other subthread shows profound and deliberate ignorance.
