
What is Explainability & Interpretability?
Interpretability is the degree to which a human can understand the cause of a model's decision. Explainability is the ability to describe a model's decision-making process in human-understandable terms. Together, they address the "black box" problem: the inability to understand why AI systems make the decisions they do.
Why It Matters
When an AI denies a loan, diagnoses a disease, or flags content for removal, stakeholders need to understand why. Regulations (EU AI Act, GDPR's right to explanation) increasingly require explainability for high-risk AI systems. Beyond compliance, explainability builds trust, aids debugging, and helps detect bias.
How It Works
Intrinsically interpretable models:
- Decision trees, linear regression, rule-based systems
- Decisions can be traced through clear logic
- Limited in what they can learn (simpler patterns)
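As a sketch of what "traced through clear logic" means in practice, consider a hand-rolled rule-based screener. The feature names and thresholds below are invented for illustration, not drawn from any real lending policy:

```python
# A minimal sketch of an intrinsically interpretable model: a rule-based
# loan screener whose every decision can be traced through explicit rules.
# Feature names and thresholds are illustrative only.

def approve_loan(income: float, debt_ratio: float):
    """Return (decision, trace): the decision plus the rules that fired."""
    trace = []
    if income < 30_000:
        trace.append("income < 30000 -> deny")
        return False, trace
    trace.append("income >= 30000 -> pass")
    if debt_ratio > 0.4:
        trace.append("debt_ratio > 0.4 -> deny")
        return False, trace
    trace.append("debt_ratio <= 0.4 -> approve")
    return True, trace

decision, trace = approve_loan(income=45_000, debt_ratio=0.55)
print(decision, trace)
```

The trace is the explanation: every prediction comes with the exact rules that produced it, which is what deep models give up in exchange for learning richer patterns.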
Post-hoc explanation methods (for black-box models):
Feature attribution:
- SHAP (SHapley Additive exPlanations): calculates each feature's contribution to a prediction using game theory
- LIME (Local Interpretable Model-agnostic Explanations): approximates the model locally with an interpretable model
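The game-theoretic idea behind SHAP can be sketched by computing exact Shapley values for a toy model: reveal features one at a time in every possible order, and average each feature's marginal contribution. The model `f` and the baseline below are invented for illustration; the real shap library approximates this efficiently for large models.

```python
from itertools import permutations

# Exact Shapley values for a toy 3-feature model, illustrating the idea
# behind SHAP. The model f and the baseline are illustrative assumptions.

def f(x):
    # Toy "model": a simple linear score over three named features.
    return 2.0 * x["income"] + 1.0 * x["age"] - 3.0 * x["debt"]

def shapley_values(f, instance, baseline):
    """Average each feature's marginal contribution over all orderings."""
    features = list(instance)
    contrib = {k: 0.0 for k in features}
    orders = list(permutations(features))
    for order in orders:
        x = dict(baseline)            # start from the baseline input
        prev = f(x)
        for feat in order:
            x[feat] = instance[feat]  # reveal this feature's true value
            cur = f(x)
            contrib[feat] += cur - prev
            prev = cur
    return {k: v / len(orders) for k, v in contrib.items()}

instance = {"income": 1.0, "age": 0.5, "debt": 0.2}
baseline = {"income": 0.0, "age": 0.0, "debt": 0.0}
phi = shapley_values(f, instance, baseline)
# The values sum to f(instance) - f(baseline): the "efficiency" property
# that makes Shapley attributions add up to the prediction being explained.
print(phi)
```

Enumerating all orderings is exponential in the number of features, which is why SHAP relies on sampling and model-specific shortcuts; LIME instead fits a small interpretable model (e.g. sparse linear regression) to the black box's behavior on perturbed samples near one input.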