Application of Data Mining Algorithms
to Classify Biological Data: The Coffea
canephora Genome Case
Jeferson Arango-López (1,2,6, corresponding author), Simon Orozco-Arias (4), Johnny A. Salazar (3), and Romain Guyot (5)
1 FIET, Universidad del Cauca, Calle 5 Nº 4-70, Popayán, Colombia, jal@unicauca.edu.co
2 Facultad de Ingeniería, Universidad de Caldas, Calle 65 Nº 26-10, Manizales, Colombia
3 Escuela de Administración y Mercadotecnia del Quindío (EAM), Av. Bolívar # 3-11, Armenia, Colombia
4 Centro de Bioinformática y Biología Computacional (BIOS), Ecoparque los Yarumos, Manizales, Colombia
5 IRD, CIRAD, Univ. Montpellier, IPME, BP 64501, 34394 Montpellier Cedex 5, France
6 Departamento de Lenguajes y Sistemas Informáticos, Universidad de Granada, Calle Periodista Daniel Saucedo Aranda, s/n, 18071 Granada, Spain
Abstract. Bioinformatics is now one of the most important fields of modern
science, bringing together research areas such as biology, genomics,
genetics and molecular evolution. These fields generate large amounts of
information through next-generation sequencing (NGS) techniques. This volume
of data requires a new generation of tools able to store and analyze the
information quickly and efficiently. Coffea canephora, also called Robusta
coffee, is one of the most important trees for tropical countries. Its genome
has recently been sequenced. One characteristic of this genome is the presence
of numerous repeated elements, which represent more than 50% of the genome
sequence. Analyzing and classifying such a large amount of repeated sequences
requires innovative approaches. Here, we present how data mining and machine
learning can contribute to processing sequencing data for the fast
classification of one class of repeated sequences, called transposable elements.
Keywords: Data mining · Bioinformatics · Transposable elements · Coffea canephora
1 Introduction
Bioinformatics is a recent and now major research field. Several authors have
attempted to give an unambiguous definition of bioinformatics. From a biology
perspective, López-Gartner and coworkers define bioinformatics as a new discipline
that helps with the discovery of biological information through the implementation of
© Springer International Publishing AG 2017
A. Solano and H. Ordoñez (Eds.): CCC 2017, CCIS 735, pp. 156–170, 2017.
DOI: 10.1007/978-3-319-66562-7_12