Application of Data Mining Algorithms to Classify Biological Data: The Coffea canephora Genome Case

Jeferson Arango-López 1,2,6 (✉), Simon Orozco-Arias 4, Johnny A. Salazar 3, and Romain Guyot 5

1 FIET, Universidad del Cauca, Calle 5 Nº 4-70, Popayán, Colombia
jal@unicauca.edu.co
2 Facultad de Ingeniería, Universidad de Caldas, Calle 65 Nº 26-10, Manizales, Colombia
3 Escuela de Administración y Mercadotecnia del Quindío (EAM), Av. Bolívar # 3-11, Armenia, Colombia
4 Centro de Bioinformática y Biología Computacional (BIOS), Ecoparque los Yarumos, Manizales, Colombia
5 IRD, CIRAD, Univ. Montpellier, IPME, BP 64501, 34394 Montpellier Cedex 5, France
6 Departamento de lenguajes y sistemas informaticos, Universidad de Granada, Calle Periodista Daniel Saucedo Aranda, s/n, 18071 Granada, Spain

Abstract. Bioinformatics is now one of the most important fields of modern science, grouping different fields of research such as biology, genomics, genetics and molecular evolution. These fields generate a large amount of information through next-generation sequencing (NGS) techniques. This amount of data requires the development of a new generation of tools able to store and analyze the information efficiently and rapidly. Coffea canephora, also called Robusta coffee, is one of the most important trees for tropical countries. Its genome has recently been sequenced. One of the characteristics of this genome is the presence of numerous repeated elements, representing more than 50% of the genome sequence. The analysis and classification of such a large number of repeated sequences require innovative approaches. Here, we present how data mining and machine learning can contribute to processing sequencing data for the fast classification of a class of repeated sequences called transposable elements.

Keywords: Data mining · Bioinformatics · Transposable elements · Coffea canephora

1 Introduction

Bioinformatics is a recent and now major research field.
Several authors have attempted to give an unambiguous definition of bioinformatics. From a biological perspective, López-Gartner and coworkers define bioinformatics as a new discipline helping with the discovery of biological information through the implementation of

© Springer International Publishing AG 2017. A. Solano and H. Ordoñez (Eds.): CCC 2017, CCIS 735, pp. 156–170, 2017. DOI: 10.1007/978-3-319-66562-7_12