← Back to context

Comment by red75prime

11 hours ago

The obvious fix is to make interpretation of itself a part of the model (like we can explicitly introspect to a certain extent what the brain is doing). Misinterpretation of itself, hopefully, would decrease the system's performance on all tasks and it would be rooted out by training. Of course, it doesn't mean that the fix is easy to implement and that it doesn't have other failure modes.