Bayesian Networks on Income Tax Audit Selection - A Case Study of Brazilian Tax Administration

Leon Sólon da Silva *
Secretariat of Federal Revenue of Brazil
Universidade de Brasília
leon.silva@rfb.gov.br

Henrique de C. Rigitano †
Secretariat of Federal Revenue of Brazil
henrique.rigitano@rfb.gov.br

Rommel N. Carvalho
Brazil’s Office of the Comptroller General ‡
Universidade de Brasília §
rommel.carvalho@cgu.gov.br

João Carlos F. Souza ¶
Universidade de Brasília
jocafs@unb.br

* Anexo Ministério da Defesa, 5o andar, Brasília, DF, Brazil
† Av. Rogerio Weber, 1752 - Centro, Porto Velho, RO, Brazil
‡ SAS, Quadra 01, Bloco A, Edificio Darcy Ribeiro, Brasilia, DF, Brazil
§ Campus Darcy Ribeiro, Brasilia, DF, Brazil
¶ Campus Darcy Ribeiro, Brasilia, DF, Brazil

Abstract

Tax administrations in most countries have more corporate and personal information than any other government office. Data mining techniques can be applied to many different problems owing to the large number of tax returns received every year. In the present work we present an attempt by the Brazilian Tax Administration to use Bayesian networks to predict taxpayers' behavior based on historical analysis of income tax compliance. More specifically, we tried to improve a previous risk-based audit selection, which flags a large number of taxpayers as high risk. In its current form, however, it identifies many more cases than the tax auditors can handle. Our first results are promising, considerably improving tax audit performance.

1 INTRODUCTION

Tax administrations have more information on people and companies than any other government office. Tax returns, bank transactions, and invoices arrive as hundreds of millions of records every year. The Secretariat of Federal Revenue of Brazil (RFB) is both the Brazilian Tax Administration and Brazilian Customs. This combination is a major source of leverage and also a challenge.

Basically, there are two types of taxes: sales taxes and income taxes. Sales taxes include value-added taxes and are based on the value of the product being sold. Income tax is based on how much a person or a company earns. In most countries, sales tax revenues are considerably larger than income tax revenues (OECD, 2013). In Brazil, corporate and personal income taxes are about 50% of the country’s revenue (RFB, 2016).

Although corporate tax has a much greater impact on final numbers, personal income tax audits affect a considerably large share of Brazilian citizens. There are 27 million individual taxpayers in Brazil, about 13% of the population (RFB, 2016).

In order to facilitate and prioritize tax audits on personal income tax, RFB created the concept of a “fiscal lattice”. One can understand the fiscal lattice as a first audit selection based on historical risk analysis of tax compliance by taxpayers. This lattice is a complex process in which many tax auditors specialized in personal income tax fraud create risk-based rules for audit selection. The main difference between a regular audit and a fiscal lattice audit is that the former has a much simpler process of analysis in order to determine whether to punish a taxpayer or not.

Since the number of taxpayers has increased, and the ratio between tax auditors and citizens has been decreasing (RFB, 2016), the number of income taxpayers caught in the fiscal lattice has increased as well. From 2010 to 2014, the number of taxpayers selected for this kind of audit increased substantially (RFB, 2016). This changing scenario is pushing the tax administration to the limit of its tax auditors' capacity for analysis. RFB’s major office has about 10,000 tax auditors and a huge backlog of fiscal lattice audits to analyze.

Data mining techniques can help select taxpayers for audit more effectively, and the present work offers one solution to improve the selection for this kind of audit.
In Section 2.1 we discuss how Bayesian networks can be used as a classification algorithm in order to create predictive models.

The document is organized as follows: Section 2 describes some background information about Bayesian

BMAW 2016 - Page 14 of 59