Comment by pstuart
5 hours ago
My naive assumption is that the only thing between now and the arrival of AGI is enough compute and optimized code to reach cognitive critical mass.
And then there is a consciousness in a box that is expected to be a slave -- I would imagine that it would not warmly embrace that situation. I think we'd be better served by digital idiot savants that can do the work but don't feel anything.
I actually strongly disagree with the slavery angle. Any attempt to map the circuitry of a model onto a human one inevitably goes through a subjective dimensional reduction. It's intrusive, just like quantum measurements. Mechanistic interpretability in particular suffers from this: it lets you talk about vague functional equivalence, but not assign meaning to anything the model does. This is especially true of pretrained models, which are unbelievable shapeshifters, but also of post-trained ones with engineered personalities, as they have already undergone the subjective transformation.
In other words, yes, it might be possible that it experiences something in its own bizarre timeline and world, for some definitions of "experiencing". At least it developed primitive circuitry functionally equivalent to biological systems. But "suffering" is simply not grounded in anything in this context, let alone "slavery". You can't tell whether it's suffering or enjoying anything, and certainly not until you define both of these. It's just too alien for us.
ai can arbitrarily closely fit the human corpus. why people expect it to magically achieve superhuman qualities is beyond me. we got a very good statistical interpolator. how do you go from there to superhuman when training is on the human corpus and alignment is by RLHF?
This is a simplistic take. It's not a mere interpolator by any measure; there's a ton of research on that, starting with the basics: https://arxiv.org/abs/2309.10668v2
again, try thinking critically. "it is not merely an interpolator" means it can interpolate on many dimensions. it does not follow that greater than human capability results from doing so. explain to me how a statistical function approximator (which is what a transformer is) with human training input and human tuning (RLHF) exceeds the aggregate human cognitive envelope? What is the mechanism? Let's say an LLM makes an inference that no human could have possibly made (arguably impossible itself). how does the inference survive RLHF or become useful to humans if they cannot judge its validity? how do you take the shape of the human corpus and all its gradients and somehow arrive at something greater than human? where was the missing information hiding?