
Comment by null_point

2 years ago

I'm on the fence about this, because it's plausible that some critical component of achieving superintelligence might be discovered more quickly by teams that, say, have sophisticated mechanistic interpretability incorporated into their systems.

A point of evidence in this direction: RLHF was originally developed as an alignment technique, and it turned out to be a breakthrough that also made LLMs more capable and useful. Alignment and capabilities work aren't necessarily at odds with each other.