Comment by energy123

3 days ago

Because compression is one of the outcomes of optimization, it pays to have a single gate/circuit that distinguishes good from bad, rather than duplicating that abstraction across redundant variants that are almost the same. That is the fundamental reason this happens. I feel this has negative implications for AI alignment: a single gate is not robust against a single bit flip. It feels more robust to have a vast heterogeneity of tensions generating the alignment, so that misalignment is a matter of degree rather than a jump between polar extremes.
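A toy sketch of the intuition (entirely illustrative; the noise scale, ensemble size, and threshold are my own assumptions, not a model of any real system): corrupting the lone gate reverses its verdict outright, while corrupting one member of a heterogeneous ensemble only nudges the aggregate by a small degree.

```python
import random

random.seed(0)

def single_gate(x, flipped=False):
    # One shared gate: "aligned" iff a single learned score is positive.
    score = -x if flipped else x  # one flipped bit inverts the whole decision
    return score > 0

def heterogeneous_ensemble(x, n=100, flipped_idx=None):
    # Many slightly different variants of the gate; alignment is their vote share.
    votes = []
    for i in range(n):
        score = x + random.gauss(0.0, 0.1)  # heterogeneity across variants
        if i == flipped_idx:
            score = -score  # corrupt just this one variant
        votes.append(score > 0)
    return sum(votes) / n  # degree of alignment in [0, 1]

x = 1.0  # a clearly "good" input
print(single_gate(x))                    # True
print(single_gate(x, flipped=True))      # False: one flip, polar reversal
print(heterogeneous_ensemble(x))                 # ~1.0
print(heterogeneous_ensemble(x, flipped_idx=0))  # ~0.99: graded, not flipped
```

Under this toy setup the single gate is maximally brittle, while the ensemble degrades by at most 1/n per corrupted variant, which is the "matter of degree" behaviour described above.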