Citation: Dias, D.; Silva, J.S.; Bernardino, A. The Prediction of Road-Accident Risk through Data Mining: A Case Study from Setubal, Portugal. Informatics 2023, 10, 17. https://doi.org/10.3390/informatics10010017

Academic Editor: Olga Kurasova

Received: 31 December 2022
Revised: 22 January 2023
Accepted: 25 January 2023
Published: 30 January 2023

Copyright: © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Informatics

Article

The Prediction of Road-Accident Risk through Data Mining: A Case Study from Setubal, Portugal

David Dias 1,2, José Silvestre Silva 1,3,4,* and Alexandre Bernardino 2,5

1 Portuguese Military Academy, Rua Gomes Freire, 1169-203 Lisbon, Portugal
2 Instituto Superior Técnico, Universidade de Lisboa, 1049-001 Lisbon, Portugal
3 Military Academy Research Center (CINAMIL), Rua Gomes Freire, 1169-203 Lisbon, Portugal
4 Laboratory for Instrumentation, Biomedical Engineering and Radiation Physics (LIBPhys-UC), 3000-370 Coimbra, Portugal
5 Institute for Systems and Robotics (ISR/IST), 1049-001 Lisbon, Portugal
* Correspondence: jose.silva@academiamilitar.pt

Abstract: This work proposes a tool to predict the risk of road accidents. The developed system consists of three steps: data selection and collection, preprocessing, and the use of mining algorithms. The data were imported from the Portuguese National Guard database and relate to accidents that occurred from 2019 to 2021. The results allowed us to conclude that the highest concentration of accidents occurs during the time interval from 17:00 to 20:00, and that rain is the meteorological factor with the greatest effect on the probability of an accident occurring. Additionally, we concluded that more accidents occur on Friday than on any other day of the week.
These results are important to the decision makers responsible for planning the most effective allocation of resources for traffic surveillance.

Keywords: risk prediction; road accidents; supervised classification; classical methods; deep neural networks

1. Introduction

Road accidents cause multiple deaths each year and result in economic and physical damage to their victims; they also entail a loss of public resources. Preventive action by the security forces has focused on what is known as Information-Guided Policing [1]. Since accident-related data are stored in the National Guard database, it is possible to discover patterns correlated with the occurrence of accidents and to create knowledge that is useful in decision-making.

Data-mining techniques have evolved significantly in recent decades and are being widely applied to several real-world problems. Current data-mining methods can be used on a database to rapidly extract knowledge that can help to guide policing methods and thus improve the accident-prevention techniques and awareness campaigns produced by the security forces.

This work aims to develop a tool to aid Information-Guided Policing in traffic management. Several data-mining algorithms were applied to different types of datasets, including the National Guard database, which contains multiple accident reports. To complement the data provided by the National Guard, other publicly available databases were explored, such as meteorological data sources and the annual calendar.

This work is one of the few research projects carried out by Portuguese researchers using data from the Portuguese National Guard to analyze and predict road accidents. One of the objectives of this work is to provide statistical and predictive information on traffic accidents for the National Guard and other researchers.
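As an illustration of how accident reports might be complemented with meteorological and calendar data, the following Python sketch joins the three sources into one row per day. It is a minimal example under stated assumptions, not the implementation used in this work: the function build_daily_rows, the field names, and all sample values are hypothetical and do not come from the National Guard database.

```python
from collections import Counter
from datetime import date, datetime

# Weekday names indexed by date.weekday() (Monday == 0), avoiding locale issues.
WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

# Hypothetical accident timestamps (illustrative values, not data from the study).
accidents = [
    datetime(2021, 3, 5, 17, 40),
    datetime(2021, 3, 5, 18, 15),
    datetime(2021, 3, 6, 9, 5),
]

# Hypothetical daily rainfall in millimetres, keyed by calendar date.
rain_mm = {
    date(2021, 3, 5): 4.2,
    date(2021, 3, 6): 0.0,
    date(2021, 3, 7): 0.0,
}


def build_daily_rows(accidents, rain_mm):
    """Return one feature row per day found in the weather table.

    Each row combines a calendar feature (weekday, weekend flag),
    a weather feature (rainfall) and the number of accidents that day.
    """
    counts = Counter(ts.date() for ts in accidents)
    rows = []
    for day in sorted(rain_mm):
        rows.append({
            "date": day,
            "weekday": WEEKDAYS[day.weekday()],
            "is_weekend": day.weekday() >= 5,
            "rain_mm": rain_mm[day],
            "accidents": counts.get(day, 0),  # days without reports count as 0
        })
    return rows


rows = build_daily_rows(accidents, rain_mm)
for row in rows:
    print(row)
```

Rows of this kind, with the accident count as the target, are one possible input format for a supervised model; days that have weather data but no accident reports are kept with a count of zero so that the model also sees accident-free days.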
This investigation is original because, unlike other works that use categorical variables to identify the variables that most influence the severity of accidents, it sets out to predict the number of accidents likely to occur in a future time frame. One of the main objectives of this