Accuracy of explanations of machine learning models for credit decisions
Authors
Issue Date
23-Jun-2022
Physical description
45 p.
Abstract
One of the biggest challenges for the application of machine learning (ML) models in finance is how to explain their results. In recent years, different interpretability techniques have appeared to assist in this task, although their usefulness is still a matter of debate. In this article we contribute to the debate by creating a framework to assess the accuracy of these interpretability techniques. We start from the generation of synthetic datasets, following an approach that allows us to control the importance of each explanatory variable (feature) in our target variable. By defining the importance of features ourselves, we can then calculate to what extent the explanations given by the interpretability techniques match the underlying truth. Therefore, if in our synthetic dataset we define a feature as relevant to the target variable, the interpretability technique should also identify it as a relevant feature. We run an empirical example in which we generate synthetic datasets intended to resemble underwriting and credit rating datasets, where the target variable is a binary variable representing applicant default. We use non-interpretable ML models, such as deep learning, to predict default, and then explain their results using two popular interpretability techniques, SHAP and permutation Feature Importance (FI). Our results using the proposed framework suggest that SHAP is better at identifying relevant features as such, although the results may vary significantly depending on the characteristics of the dataset and the ML model used. We conclude that generating synthetic datasets shows potential as a useful approach for supervisors and practitioners looking for solutions to assess the interpretability tools available for ML models in the financial sector.
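A minimal sketch of the evaluation framework described in the abstract, assuming a standard Python stack (scikit-learn, shap, scipy). The logistic data-generating process, the coefficient values, the MLP classifier and all variable names are illustrative assumptions, not the authors' exact specification; the point is only to show how explanations can be scored against a known ground truth.

```python
# Sketch: generate a synthetic credit dataset with known feature importances,
# fit a black-box classifier, and check how well SHAP and permutation Feature
# Importance (FI) recover the ground-truth ranking. Illustrative only.
import numpy as np
import shap
from scipy.stats import spearmanr
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
n_samples, n_features = 5000, 10

# Ground truth: we choose the coefficients, so we know each feature's importance.
true_coefs = np.array([3.0, 2.0, 1.0, 0.5, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0])
X = rng.normal(size=(n_samples, n_features))
p_default = 1.0 / (1.0 + np.exp(-(X @ true_coefs)))  # logistic link (assumption)
y = rng.binomial(1, p_default)                       # binary default indicator

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Non-interpretable model: a small neural network.
model = MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=500, random_state=0)
model.fit(X_train, y_train)

# Explanation 1: SHAP values from a model-agnostic KernelExplainer,
# explaining the predicted probability of default.
predict_default = lambda data: model.predict_proba(data)[:, 1]
background = shap.sample(X_train, 100, random_state=0)
explainer = shap.KernelExplainer(predict_default, background)
shap_values = explainer.shap_values(X_test[:100])
shap_importance = np.abs(shap_values).mean(axis=0)   # mean |SHAP| per feature

# Explanation 2: permutation Feature Importance on the test set.
perm = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=0)
perm_importance = perm.importances_mean

# Accuracy of the explanations: rank agreement with the known ground truth.
truth = np.abs(true_coefs)
print("SHAP rank correlation:   ", spearmanr(truth, shap_importance).correlation)
print("Perm FI rank correlation:", spearmanr(truth, perm_importance).correlation)
```

Under this setup, an explanation technique is judged by how closely its importance ranking matches the ranking implied by the chosen coefficients; any rank-agreement metric could replace the Spearman correlation used here.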
Publish on
Documentos de Trabajo / Banco de España, 2222
Subjects
Synthetic datasets; Artificial intelligence; Interpretability; Machine learning; Credit assessment; Credit; Econometric modelling; Computing techniques
Appears in Collections: