Comment by macleginn
18 hours ago
With such learning your model needs to be able to provide some kind of solution or at least approximate it right off the bat. Otherwise it will keep producing random sequences of tokens and will not learn anything ever because there will be nothing in its output to reward, so no guidance.
I don’t agree that it needs to provide a solution right off the bat. But I do agree there are some initial weights you need to define.
With an AI-first language, I suspect the primitives would be more similar to assembly or WASM than to something human-readable like Rust or Python. So the pre-training preparation would be a little easier, since there would be fewer syntax errors due to parser constraints.
I’m not suggesting this would be easy though haha. I think it’s a solvable problem but that doesn’t mean it’s easy.