KEEL 3.0: An Open Source Software for Multi-Stage Analysis in Data Mining
Isaac Triguero¹, Sergio González², Jose M. Moyano⁴, Salvador García², Jesús Alcalá-Fdez², Julián Luengo², Alberto Fernández², Maria José del Jesús⁵, Luciano Sánchez³, Francisco Herrera²
¹ School of Computer Science
University of Nottingham, Jubilee Campus
Nottingham NG8 1BB, United Kingdom
E-mail: Isaac.Triguero@nottingham.ac.uk
² Department of Computer Science and Artificial Intelligence
University of Granada, Granada, Spain, 18071
³ Department of Computer Science
University of Oviedo, Gijón, 33204, Spain
⁴ Department of Computer Science and Numerical Analysis
University of Cordoba, 14071 Cordoba, Spain
⁵ Department of Computer Science
University of Jaén, Jaén, Spain
Abstract
This paper introduces the 3rd major release of the KEEL Software. KEEL is an open source Java framework
(GPLv3 license) that provides a number of modules to perform a wide variety of data mining tasks.
It includes tools for data management, the design of multiple kinds of experiments, statistical analyses,
etc. This framework also contains KEEL-dataset, a data repository for multiple learning tasks featuring
data partitions and algorithms’ results over these problems. In this work, we describe the most recent
components added to KEEL 3.0, including new modules for semi-supervised learning, multi-instance
learning, imbalanced classification and subgroup discovery. In addition, a new interface in R has been
incorporated to execute algorithms included in KEEL. These new features greatly improve the versatility
of KEEL in dealing with more modern data mining problems.
Keywords: Open Source, Java, Data Mining, Preprocessing, Evolutionary Algorithms.
1. Introduction
Data Mining (DM) techniques²⁵ are widely used in a broad number of applications that go beyond the
computer science field⁴¹. To ease access to these techniques for people not directly related to
computer science, many commercial and non-commercial software suites have been made available.
Most of them are commercially distributed (e.g. SPSS Clementine, Oracle Data Mining or
KnowledgeSTUDIO), but there is still a good number of open source tools. Among the existing open
source applications, workflow-based environments allow users to visually chain a number of DM
methods together in a pipeline. The most widely used DM applications of this kind are Weka¹⁸,
KNIME¹ and KEEL².
International Journal of Computational Intelligence Systems, Vol. 10 (2017) 1238–1249
Received 6 March 2017
Accepted 9 September 2017
Copyright © 2017, the Authors. Published by Atlantis Press.
This is an open access article under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).