Medical & Biological Engineering & Computing
https://doi.org/10.1007/s11517-018-1841-0

ORIGINAL ARTICLE

Classification of malignant and benign lung nodules using taxonomic diversity index and phylogenetic distance

Robherson Wector de Sousa Costa 1 · Giovanni Lucca França da Silva 1 · Antonio Oseas de Carvalho Filho 2 · Aristófanes Corrêa Silva 1 · Anselmo Cardoso de Paiva 1 · Marcelo Gattass 3

Received: 10 July 2017 / Accepted: 23 April 2018
© International Federation for Medical and Biological Engineering 2018

Abstract
Lung cancer is the leading cause of cancer death worldwide and has one of the lowest survival rates after diagnosis. Therefore, this study proposes a methodology for classifying lung nodules as benign or malignant based on image processing and pattern recognition techniques. Mean phylogenetic distance (MPD) and taxonomic diversity index (Δ) were used as texture descriptors. Finally, a genetic algorithm combined with a support vector machine was applied to select the best training model. The proposed methodology was tested on computed tomography (CT) images from the Lung Image Database Consortium and Image Database Resource Initiative (LIDC-IDRI), achieving a best sensitivity of 93.42%, specificity of 91.21%, accuracy of 91.81%, and area under the ROC curve of 0.94. The results demonstrate the promising performance of texture extraction techniques using mean phylogenetic distance and taxonomic diversity index combined with phylogenetic trees.
Keywords Medical image · Lung nodules diagnosis · Phylogenetic tree · Mean phylogenetic distance · Taxonomic diversity index

Robherson Wector de Sousa Costa
robhersonwector@gmail.com

Giovanni Lucca França da Silva
gioh.lucca@gmail.com

Antonio Oseas de Carvalho Filho
antoniooseas@gmail.com

Aristófanes Corrêa Silva
aricsilva@gmail.com

Anselmo Cardoso de Paiva
anselmo.c.paiva@gmail.com

Marcelo Gattass
mgattass@tecgraf.puc-rio.br

1 Federal University of Maranhão - UFMA, Applied Computing Group - NCA, Av. dos Portugueses, SN, Campus do Bacanga, Bacanga, São Luís, MA, 65085-580, Brazil
2 Federal University of Piauí - UFPI, Rua Cícero Duarte, SN, Campus de Picos, Junco, Picos, PI, 64600-000, Brazil
3 Pontifical Catholic University of Rio de Janeiro - PUC-Rio, Rua São Vicente, 225, Gávea, Rio de Janeiro, RJ, 22453-900, Brazil

1 Introduction

Lung cancer is the most commonly occurring malignant tumor, with an annual increase in incidence of 2%. It is strongly associated with tobacco use. Each year, the number of deaths from lung cancer exceeds the total number of deaths from colorectal, breast, and prostate cancers [1]. A lung nodule is characterized as a rounded opacity in the lung with a diameter of less than 3 cm, surrounded by lung parenchyma [2]. Lung lesions with diameters exceeding 3 cm are considered to be malignant masses [3].

Early diagnosis and treatment of lung cancer increase the patient's probability of survival by 90% [4]. Thus, medical images, mainly computed tomography (CT), are important tools for early diagnosis [5]. However, detecting nodules in CT images is not an easy task, since the densities of the nodules may be similar to those of other lung structures. Additionally, nodules may have low contrast and small size, may lie in complex anatomic regions, and may be close to or attached to blood vessels or the lung border [6].