Citation: Dias, D.; Silva, J.S.; Bernardino, A. The Prediction of Road-Accident Risk through Data Mining: A Case Study from Setubal, Portugal. Informatics 2023, 10, 17. https://doi.org/10.3390/informatics10010017
Academic Editor: Olga Kurasova
Received: 31 December 2022
Revised: 22 January 2023
Accepted: 25 January 2023
Published: 30 January 2023
Copyright: © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
informatics
Article
The Prediction of Road-Accident Risk through Data Mining: A Case Study from Setubal, Portugal
David Dias (1,2), José Silvestre Silva (1,3,4,*) and Alexandre Bernardino (2,5)
(1) Portuguese Military Academy, Rua Gomes Freire, 1169-203 Lisbon, Portugal
(2) Instituto Superior Técnico, Universidade de Lisboa, 1049-001 Lisbon, Portugal
(3) Military Academy Research Center (CINAMIL), Rua Gomes Freire, 1169-203 Lisbon, Portugal
(4) Laboratory for Instrumentation, Biomedical Engineering and Radiation Physics (LIBPhys-UC), 3000-370 Coimbra, Portugal
(5) Institute for Systems and Robotics (ISR/IST), 1049-001 Lisbon, Portugal
* Correspondence: jose.silva@academiamilitar.pt
Abstract: This work proposes a tool to predict the risk of road accidents. The developed system
consists of three steps: data selection and collection, preprocessing, and the application of data-mining
algorithms. The data were imported from the Portuguese National Guard database and relate to accidents
that occurred from 2019 to 2021. The results show that accidents are most concentrated in the interval
from 17:00 to 20:00, that rain is the meteorological factor with the greatest effect on the probability
of an accident occurring, and that more accidents occur on Fridays than on any other day of the week.
These results are important for the decision makers responsible for planning the most effective
allocation of traffic-surveillance resources.
Keywords: risk prediction; road accidents; supervised classification; classical methods; deep
neural networks
1. Introduction
Road accidents cause many deaths each year, inflict economic and physical harm on their
victims, and consume public resources. Preventive action by the security forces has focused
on what is known as Information-Guided Policing [1]. Because accident-related data are
stored in the National Guard database, it is possible to discover patterns correlated with the
occurrence of accidents and to generate knowledge that is useful in decision-making. Data-mining
techniques have evolved significantly in recent decades and are now widely applied to
real-world problems. Current data-mining methods can rapidly extract knowledge from such a
database, which can help to guide policing and thus improve the accident-prevention measures
and awareness campaigns of the security forces.
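As a minimal illustration of the kind of temporal pattern mentioned in the abstract (a peak time interval and a peak weekday), the sketch below counts accident records per hour of day and per weekday and reports the busiest three-hour window and the busiest day. The timestamps are invented toy values, not National Guard data, the timestamp format is an assumption, and the names `peak_window` and `peak_weekday` are hypothetical. This is a simple frequency count, not the mining algorithms used in the study.

```python
from collections import Counter
from datetime import datetime

# Toy timestamps invented for illustration (NOT National Guard data);
# the "YYYY-MM-DD HH:MM" format is an assumption.
ACCIDENTS = [
    "2021-03-05 17:40",  # Friday
    "2021-03-05 18:15",  # Friday
    "2021-03-06 19:05",  # Saturday
    "2021-03-08 08:30",  # Monday
    "2021-03-12 17:55",  # Friday
    "2021-03-14 02:10",  # Sunday
]

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def parse(ts: str) -> datetime:
    """Convert a report timestamp string into a datetime."""
    return datetime.strptime(ts, "%Y-%m-%d %H:%M")


def peak_window(times: list[datetime], width: int = 3) -> tuple[int, int]:
    """Return (start_hour, count) for the width-hour window with most accidents.

    Windows wrap around midnight (e.g. 23:00-02:00); ties keep the earliest start.
    """
    per_hour = Counter(t.hour for t in times)
    best_start, best_count = 0, -1
    for start in range(24):
        count = sum(per_hour[(start + k) % 24] for k in range(width))
        if count > best_count:
            best_start, best_count = start, count
    return best_start, best_count


def peak_weekday(times: list[datetime]) -> str:
    """Return the name of the weekday with the most accidents."""
    per_day = Counter(t.weekday() for t in times)
    day, _ = per_day.most_common(1)[0]
    return WEEKDAYS[day]


WIDTH = 3
times = [parse(ts) for ts in ACCIDENTS]
start, count = peak_window(times, WIDTH)
print(f"Busiest {WIDTH}-hour window: "
      f"{start:02d}:00-{(start + WIDTH) % 24:02d}:00 ({count} accidents)")
print(f"Busiest weekday: {peak_weekday(times)}")
```

The window wraps around midnight so that late-night intervals are not missed. A count of this kind only describes past records; it is not a predictive model.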
This work aims to develop a tool to aid Information-Guided Policing in traffic management.
Several data-mining algorithms were applied to different types of datasets, including
the National Guard database, which contains numerous accident reports. To complement the
data provided by the National Guard, other publicly available sources were explored,
such as meteorological data and the annual calendar.
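One way to combine these sources is to enrich each accident record with calendar attributes and the weather observation for the same date and hour. The sketch below assumes an hourly weather table keyed by (date, hour) and a small set of holidays; `WEATHER`, `HOLIDAYS`, `EnrichedAccident` and `enrich` are hypothetical names, and all values are illustrative rather than taken from the study's data.

```python
from dataclasses import dataclass
from datetime import date, datetime
from typing import Optional

# Hypothetical hourly weather table keyed by (date, hour); values are illustrative.
WEATHER = {
    (date(2021, 3, 5), 17): {"rain_mm": 2.4, "temp_c": 14.0},
    (date(2021, 3, 5), 18): {"rain_mm": 0.0, "temp_c": 13.1},
}

# Hypothetical calendar table: a small subset of Portuguese public holidays.
HOLIDAYS = {date(2021, 4, 25), date(2021, 12, 25)}


@dataclass
class EnrichedAccident:
    timestamp: datetime
    weekday: int              # 0 = Monday ... 6 = Sunday
    is_weekend: bool
    is_holiday: bool
    rain_mm: Optional[float]  # None when no weather observation matches
    raining: bool


def enrich(ts: datetime) -> EnrichedAccident:
    """Attach calendar and same-hour weather attributes to one accident time."""
    obs = WEATHER.get((ts.date(), ts.hour))
    rain = obs["rain_mm"] if obs is not None else None
    return EnrichedAccident(
        timestamp=ts,
        weekday=ts.weekday(),
        is_weekend=ts.weekday() >= 5,
        is_holiday=ts.date() in HOLIDAYS,
        rain_mm=rain,
        raining=rain is not None and rain > 0.0,
    )


record = enrich(datetime(2021, 3, 5, 17, 40))
print(record.weekday, record.raining, record.is_holiday)  # 4 True False
```

Matching on the hour assumes the weather source reports hourly observations; a source with daily totals would need the lookup key reduced to the date alone.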
This work is one of only a few research projects in which Portuguese researchers have used
data from the Portuguese National Guard to analyze and predict road accidents. One of the
objectives of this work is to provide statistical and predictive information on traffic
accidents to the National Guard and to other researchers.
This investigation is original because, unlike other works that use categorical variables
to identify the factors that most influence accident severity, it sets out to predict the
number of accidents likely to occur in a future time frame. One of the main objectives of this
Informatics 2023, 10, 17. https://doi.org/10.3390/informatics10010017 https://www.mdpi.com/journal/informatics