Comment by ritvikpandey21

1 year ago

We completely agree - mechanistic interpretability might help keep these language models in check, but it’s going to be very difficult to run on closed-source frontier models. I’m excited to see how that field progresses.
