
Explaining predictive models using Shapley values and non-parametric vine copulas

Kjersti Aas, Thomas Nagler, Martin Jullum and Anders Løland
From the journal Dependence Modeling

Abstract

In this paper, the goal is to explain predictions from complex machine learning models. One method that has become very popular during the last few years is Shapley values. The original development of Shapley values for prediction explanation relied on the assumption that the features being described were independent. If the features are in reality dependent, this may lead to incorrect explanations. Hence, there have recently been attempts to appropriately model and estimate the dependence between the features. Although the previously proposed methods clearly outperform the traditional approach assuming independence, they have their weaknesses. In this paper, we propose two new approaches for modelling the dependence between the features. Both approaches are based on vine copulas, which are flexible tools for modelling multivariate non-Gaussian distributions that are able to characterise a wide range of complex dependencies. The performance of the proposed methods is evaluated on simulated data sets and a real data set. The experiments demonstrate that the vine copula approaches give more accurate approximations to the true Shapley values than their competitors.
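As background (this is the standard definition used in this literature, not a result of the paper itself): for a model f, a feature set \(\mathcal{M} = \{1, \dots, M\}\), and an observation \(\mathbf{x}^*\) to be explained, the Shapley value of feature j and its contribution function are

$$
\phi_j \;=\; \sum_{S \subseteq \mathcal{M}\setminus\{j\}} \frac{|S|!\,(M-|S|-1)!}{M!}\,\bigl(v(S\cup\{j\}) - v(S)\bigr),
\qquad
v(S) \;=\; \mathrm{E}\bigl[f(\mathbf{x}) \mid \mathbf{x}_S = \mathbf{x}^*_S\bigr].
$$

Under feature independence, the conditional expectation in v(S) reduces to an expectation over the marginal distribution of the unconditioned features; with dependent features, one must instead sample from the conditional distribution of the remaining features given \(\mathbf{x}^*_S\), which is the quantity the paper models with non-parametric vine copulas.

The following is a minimal Python sketch of how such Shapley values are typically estimated once a conditional sampler is available. It is not the authors' implementation; the function name `shapley_values` and the callable `sample_conditional` (a stand-in for drawing from the vine-copula conditional distribution) are hypothetical, and the coalition sum is enumerated exactly, which is only feasible for small M.

```python
# Minimal sketch (assumed names, not the paper's code) of Monte Carlo Shapley
# estimation for prediction explanation with dependent features.
import itertools
import math
import numpy as np

def shapley_values(f, x_star, sample_conditional, n_mc=1000, rng=None):
    """Exact-enumeration Shapley values with Monte Carlo contribution functions.

    f                  : callable mapping an (n, M) array to n predictions
    x_star             : (M,) feature vector to be explained
    sample_conditional : hypothetical callable (S, x_star, n, rng) -> (n, M)
                         array whose columns in S are fixed at x_star[S] and
                         whose remaining columns are draws from the conditional
                         distribution given x_star[S] (unconditional draws when
                         S is empty)
    """
    rng = rng or np.random.default_rng()
    M = len(x_star)

    def v(S):
        # Contribution function v(S) = E[f(x) | x_S = x*_S], estimated by
        # Monte Carlo; the full coalition needs no sampling at all.
        if len(S) == M:
            return float(f(x_star[None, :])[0])
        x = sample_conditional(S, x_star, n_mc, rng)
        return float(f(x).mean())

    phi = np.zeros(M)
    for j in range(M):
        others = [k for k in range(M) if k != j]
        for size in range(M):
            # Shapley kernel weight |S|!(M-|S|-1)!/M! for coalitions of this size.
            w = math.factorial(size) * math.factorial(M - size - 1) / math.factorial(M)
            for S in itertools.combinations(others, size):
                phi[j] += w * (v(list(S) + [j]) - v(list(S)))
    return phi
```

With independent features, `sample_conditional` could simply resample training rows for the unfixed columns; the point of the dependence-aware approaches discussed in the paper is to replace that step with draws from a fitted conditional distribution. Practical implementations also cache v(S) across features and subsample coalitions rather than enumerating all of them.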

MSC 2010: 62G05; 62H05; 68T01; 91A12


Received: 2021-02-08
Accepted: 2021-04-30
Published Online: 2021-06-04

© 2021 Kjersti Aas et al., published by De Gruyter

This work is licensed under the Creative Commons Attribution 4.0 International License.
