Business Process Mining by Means of Statistical Languages Model
Dafne Rosso Pelayo and Raúl A. Trejo Ramírez
Instituto Tecnológico de Estudios Superiores de Monterrey
drosso@pemex.gob.mx, raul.trejo@itesm.mx
Abstract
The goal of this research is to provide an alternative approach to evaluating and tracking business processes, based on the analysis of the unstructured information that these processes generate within the different areas of the organization. In this article we introduce a method to determine the probability that a business process occurs within the enterprise's text documents. The proposed method introduces the statistical language model (SLM) [1] as a new technique in the area of business process mining [2]. To achieve this objective, the following are considered: the probability that a subprocess or part of a process appears in a text paragraph; the probability that this text belongs to a business process; the language model of the set of processes; and the set of performed activities, which is reconstructed according to the processes that gave rise to the analyzed documents.
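For concreteness, the probabilities listed above can be related as follows under one common set of assumptions: a unigram language model $M_p$ for each process $p$, add-one (Laplace) smoothing, and Bayes' rule. This is an illustrative sketch, and the paper's own formulation may differ. Here $c(w,p)$ is the count of word $w$ in the training text of process $p$, $N_p$ is the total word count of that text, $V$ is the vocabulary, $t = w_1 \dots w_n$ is a paragraph, and $\mathcal{P}$ is the set of processes.

```latex
\begin{align}
P(w \mid M_p) &= \frac{c(w,p) + 1}{N_p + |V|} \\
P(t \mid M_p) &= \prod_{i=1}^{n} P(w_i \mid M_p) \\
P(p \mid t)   &= \frac{P(t \mid M_p)\,P(p)}{\sum_{q \in \mathcal{P}} P(t \mid M_q)\,P(q)}
\end{align}
```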
1. Introduction
Business process mining is a technique that uses the workflows recorded in enterprise application logs [3, 13, 15] to reconstruct business processes. Since the earliest work in business process mining [4], new heuristic techniques based on intelligent computation have been developed, including genetic algorithms, data mining algorithms, and neural networks, in addition to traditional statistical techniques. A summary of these techniques is given in [2]. For example, [3], [10], and [11] reconstruct a business process model by modeling the job workflow from an analysis of the event log and of the time period in which the events occur. Nevertheless, these analyses rely mostly on the (structured) logs of enterprise systems such as SAP, PeopleSoft, or CRM systems.
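To illustrate what such log-based approaches start from, the following sketch derives a directly-follows relation (which activity immediately follows which, and how often) from a structured event log. It is a generic illustration of log-based process discovery, not the specific method of [3], [10], or [11]; the log format of (case, activity, timestamp) triples and all function names are assumptions.

```python
from collections import Counter, defaultdict

# Assumed log format: (case_id, activity, timestamp) triples, the kind of
# structured record an ERP or CRM system might keep (hypothetical).
Event = tuple[str, str, int]


def traces_from_log(log: list[Event]) -> dict[str, list[str]]:
    """Group events by case and order each case's activities by timestamp."""
    by_case: dict[str, list[tuple[int, str]]] = defaultdict(list)
    for case_id, activity, ts in log:
        by_case[case_id].append((ts, activity))
    return {case: [act for _, act in sorted(evs)] for case, evs in by_case.items()}


def directly_follows(log: list[Event]) -> Counter:
    """Count how often activity a is immediately followed by b within a case."""
    dfg: Counter = Counter()
    for trace in traces_from_log(log).values():
        for a, b in zip(trace, trace[1:]):
            dfg[(a, b)] += 1
    return dfg


if __name__ == "__main__":
    log = [
        ("c1", "receive order", 1), ("c1", "check stock", 2), ("c1", "ship", 3),
        ("c2", "receive order", 1), ("c2", "check stock", 4), ("c2", "cancel", 5),
    ]
    for (a, b), n in sorted(directly_follows(log).items()):
        print(f"{a} -> {b}: {n}")
```

Running it prints `check stock -> cancel: 1`, `check stock -> ship: 1`, and `receive order -> check stock: 2`. The sketch makes visible that this family of techniques presupposes structured, timestamped records per case.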
The business processes of interest to us follow the Process Classification Framework (PCF). This framework is a high-level, neutral enterprise model that reflects the activities an enterprise undertakes to satisfy its business and organizational objectives [14].
The alternative we propose is a novel technique for performing business process mining. One of the main motivations for this research is that, due to the nature of the processes, the unstructured information they generate is typically very generic, vague, and complex in structure. Moreover, in many cases business processes are not completely automated, because they include activities, expert analyses, and decision making that cannot feasibly be structured. As [2] indicates, more formal research on business processes is required, and it is important to look for solutions that make it possible to analyze this type of information and extract knowledge from it.
In this research we focus on text documents generated as a result of business process execution, instead of starting from the analysis of workflow logs. The analyzed documents belong to a domain of widely dispersed texts, i.e., texts that belong to different areas and contain highly dispersed information across the items within each document. To reconstruct the original processes, a statistical language model is used to classify the documents. SLMs have been applied in information retrieval and in heuristic techniques for document classification [1, 9]; in this work, we use an SLM to classify documents according to process events or activities [15]. This makes it possible to establish a text clustering method for the documents that operates by means of probability precision.
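The classification step can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes one add-one-smoothed unigram language model per process (the labels could equally be activities), trained on hypothetical labeled example texts, and uses Bayes' rule to obtain the posterior probability of each process for a paragraph. "Clustering" here simply means grouping paragraphs under their most probable process. All class and function names are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercased word tokens; deliberately simple (no stemming or stop words)."""
    return re.findall(r"\w+", text.lower())


class UnigramProcessModel:
    """A Laplace-smoothed unigram language model M_p for each process label p."""

    def __init__(self, training: dict[str, list[str]]):
        # training maps a (hypothetical) process or activity label to example
        # texts known to have been produced by it.
        self.counts = {p: Counter(w for doc in docs for w in tokenize(doc))
                       for p, docs in training.items()}
        self.totals = {p: sum(c.values()) for p, c in self.counts.items()}
        self.vocab_size = len(set().union(*self.counts.values()))
        n_docs = sum(len(docs) for docs in training.values())
        # Prior P(p): fraction of training texts labeled with p.
        self.log_prior = {p: math.log(len(docs) / n_docs)
                          for p, docs in training.items()}

    def log_likelihood(self, text: str, p: str) -> float:
        """log P(text | M_p) with add-one smoothing over the training vocabulary."""
        c, n, v = self.counts[p], self.totals[p], self.vocab_size
        return sum(math.log((c[w] + 1) / (n + v)) for w in tokenize(text))

    def posterior(self, text: str) -> dict[str, float]:
        """P(p | text) by Bayes' rule, normalized stably in log space."""
        scores = {p: self.log_prior[p] + self.log_likelihood(text, p)
                  for p in self.counts}
        top = max(scores.values())
        z = sum(math.exp(s - top) for s in scores.values())
        return {p: math.exp(s - top) / z for p, s in scores.items()}


def cluster_by_process(model: UnigramProcessModel,
                       paragraphs: list[str]) -> dict[str, list[str]]:
    """Group each paragraph under its most probable process."""
    clusters: dict[str, list[str]] = {p: [] for p in model.counts}
    for par in paragraphs:
        post = model.posterior(par)
        clusters[max(post, key=post.get)].append(par)
    return clusters


if __name__ == "__main__":
    training = {
        "procurement": ["purchase order approved by supplier",
                        "invoice from supplier for purchase"],
        "hiring": ["candidate interview scheduled",
                   "job offer sent to candidate"],
    }
    model = UnigramProcessModel(training)
    print(cluster_by_process(model, ["supplier sent the invoice",
                                     "interview with the candidate"]))
```

With this toy training data, "supplier sent the invoice" is grouped under `procurement` and "interview with the candidate" under `hiring`. Working in log space avoids underflow when paragraphs are long; a real system would also need a richer tokenizer and more training text per process.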
2008 Seventh Mexican International Conference on Artificial Intelligence
978-0-7695-3441-1/08 $25.00 © 2008 IEEE
DOI 10.1109/MICAI.2008.49