Comment by whimsicalism

12 hours ago

easy enough to solve with RL probably

There is no RL for programming languages. Especially ones w/ no significant amount of code.

  • I guess the OP was implying that it is something fixable fairly easily?

    (Which is true - it's easy to prompt your LLM with the language grammar, have it generate code and then RL on that)

    Easy in the sense of "it only takes enough GPUs to RL a coding-capable LLM" anyway.

    • If you can generate code from the grammar then what exactly are you RLing? The point was to generate code in the first place so what does backpropagation get you here?

  • Go read the DeepSeek R1 paper

    • Why would I do that? If you know something then quote the relevant passage & equation that says you can train code generators w/ RL on a novel language w/ little to no code to train on. More generally, don't ask random people on the internet to do work for you for free.

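To make the question upthread concrete ("what exactly are you RLing?"), here is a minimal sketch, assuming a toy expression grammar stands in for the novel language. The reward comes from a checker (here a hand-written parser) rather than from existing code, so sampled programs that parse score above the group average and ones that fail score below it. All names (`parses`, `reward`, `group_advantages`) and the grammar itself are hypothetical. The mean/std normalization is a simplified group-relative baseline, not any specific paper's method; a real setup would likely use a compiler or test suite as the verifier and feed the advantages into a policy-gradient update.

```python
from statistics import mean, pstdev


def tokenize(src):
    """Split source into tokens; return None on an illegal character."""
    tokens, i = [], 0
    while i < len(src):
        ch = src[i]
        if ch.isspace():
            i += 1
        elif ch.isdigit():
            while i < len(src) and src[i].isdigit():
                i += 1
            tokens.append("NUM")
        elif ch in "+*()":
            tokens.append(ch)
            i += 1
        else:
            return None
    return tokens


def parses(src):
    """Check src against the toy grammar (an assumption for illustration):
    expr := term ('+' term)*
    term := atom ('*' atom)*
    atom := NUM | '(' expr ')'
    """
    tokens = tokenize(src)
    if tokens is None:
        return False
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def atom():
        nonlocal pos
        if peek() == "NUM":
            pos += 1
            return True
        if peek() == "(":
            pos += 1
            if not expr() or peek() != ")":
                return False
            pos += 1
            return True
        return False

    def term():
        nonlocal pos
        if not atom():
            return False
        while peek() == "*":
            pos += 1
            if not atom():
                return False
        return True

    def expr():
        nonlocal pos
        if not term():
            return False
        while peek() == "+":
            pos += 1
            if not term():
                return False
        return True

    # Valid only if the whole token stream was consumed.
    return expr() and pos == len(tokens)


def reward(src):
    """Verifiable reward: 1.0 if the sample parses, else 0.0."""
    return 1.0 if parses(src) else 0.0


def group_advantages(rewards):
    """Normalize rewards within one group of samples for the same prompt."""
    mu, sigma = mean(rewards), pstdev(rewards)
    if sigma == 0:
        return [0.0 for _ in rewards]  # no signal if all samples tie
    return [(r - mu) / sigma for r in rewards]


# Stand-ins for completions sampled from a model prompted with the grammar.
samples = ["1 + 2", "(3 * 4", "2 * (5 + 6)", "+ 7"]
advantages = group_advantages([reward(s) for s in samples])
```

The point of the sketch is that the training signal does not require a corpus in the target language, only a checker that can score whatever the model samples.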