Comment by dannyw

15 hours ago

RLAIF is a good place to start reading.

Claude will also help you (with mostly good advice) if you ask something like “Research and help me make the most effective plan to train a smaller student model to be better from a teacher model”.

I was actually running a GLM->Gemma E4B experiment for fun, and Claude kept suggesting I should also add Claude Opus as a teacher lol, along with techniques I hadn’t heard of, like thinking inversion (train a small model to deconstruct summarised thinking into the detailed native thinking format of the student).

So I can absolutely see and understand the concern behind Fable’s frontier LLM development mitigations, but their approach of silent degradation is completely wrong and dangerous.

AI classifiers, like all AI, can make mistakes, and it’s only a matter of time before one misfires and silently sabotages a university’s HPC cluster running physics simulations or something, because the shape looks like DeepSeek or whatnot to a dumb, fast classifier.