Mathematics

Article

A Robust Approach for Identifying the Major Components of the Bribery Tolerance Index

Daniel Homocianu 1, Aurelian-Petruș Plopeanu 2 and Rodica Ianole-Calin 3,*

1 Department of Accounting, Business Information Systems and Statistics, Faculty of Economics and Business Administration, Alexandru Ioan Cuza University of Iasi, 700505 Iași, Romania; daniel.homocianu@uaic.ro
2 Humanities and Social Sciences Research Department, Institute of Interdisciplinary Research, Alexandru Ioan Cuza University of Iasi, 700107 Iași, Romania; aplopeanu@gmail.com
3 Faculty of Administration and Business, University of Bucharest, 030018 Bucharest, Romania
* Correspondence: rodica.ianole@faa.unibuc.ro

Citation: Homocianu, D.; Plopeanu, A.-P.; Ianole-Calin, R. A Robust Approach for Identifying the Major Components of the Bribery Tolerance Index. Mathematics 2021, 9, 1570. https://doi.org/10.3390/math9131570

Academic Editor: David Carfì

Received: 14 June 2021
Accepted: 1 July 2021
Published: 3 July 2021

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Copyright: © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Abstract: The paper aims to emphasize the advantages of several advanced statistical and data mining techniques when applied to the dense literature on corruption measurements and determinants. For this purpose, we used all seven waves of the World Values Survey and we employed the Naive Bayes technique in SQL Server Analysis Services 2016, the LASSO package together with logit and melogit regressions with raw coefficients in Stata 16. We further conducted different types of tests and cross-validations on the wave, country, gender, and age categories.
To eliminate multicollinearity, we used predictor correlation matrices. Moreover, we assessed the maximum computed variance inflation factor (VIF) against a maximum acceptable threshold that depends on the model’s R squared in Ordinary Least Squares (OLS) regressions. Our main contribution consists of a methodology for exploring and validating the most important predictors of the risk associated with bribery tolerance. We found a significant role for three influences, corresponding to questions about attitudes towards property, authority and public services, and other people, in terms of anti-cheating, anti-evasion, and anti-violence. We used scobit, probit, and logit regressions with average marginal effects to build and test the index based on these attitudes. We also successfully tested the index using risk prediction nomograms and accuracy measurements (AUCROC > 0.9).

Keywords: bribery tolerance index; Naive Bayes; LASSO; maximum acceptable VIF; correlation matrices; cross-validations; minimum accuracy loss; mixed-effects; average marginal effects; risk prediction nomograms

1. Introduction

The current massive increase in data about people’s attitudes and behaviors raises both opportunities and challenges for economics and the social sciences, on different levels [1]. One major area of innovation is reflected in the advanced statistical methodologies used to capture, as accurately as possible, the most relevant and actionable insights for private and public use [2]. In this spirit, there is a growing tendency to define comprehensive measures that integrate various aspects of individual behaviors or socio-economic phenomena (e.g., development [3], poverty [4], and sustainability [5]). Under this umbrella, the use of composite indices appears as a common practice, with a high degree of heterogeneity in the computational techniques employed to obtain them.
Namely, they vary from additive approaches (e.g., the tax morale index [6]) and ad hoc selection of variables to more complex procedures, such as principal component analysis and selection techniques using different correlation coefficients (e.g., the sustainable development index for European economies [7]). As reported by [8], although many methods are available for variable selection (e.g., ridge or partial least-squares regressions [9]), the least absolute shrinkage and selection operator (LASSO) regression is desirable because it ensures sparsity of coefficients and