Hi everyone, is anyone working in the AI interpretability/explainable AI space? I am interested in learning more about the problem of AI interpretability, how current AI regulations rely on a concept that is still in its infancy, and how those regulations will affect AI development and deployment.
Two questions come to mind. First, could interpretability ever be achievable for an AI system? Second, assuming it could, can interpretability be advanced enough to address alignment issues and support sound regulation that prevents AI from acting in unpredictable and potentially malicious ways? Any relevant research in this area would be much appreciated, and I am happy to share my own thoughts and ideas. Thank you!