1545-5963 (c) 2018 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information. This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TCBB.2018.2848653, IEEE/ACM Transactions on Computational Biology and Bioinformatics

Abstract—An automated plant species identification system could help botanists and laypeople identify plant species rapidly. Deep learning is robust for feature extraction because it can capture richer information from images. In this research, a new CNN-based method named D-Leaf is proposed. The leaf images were pre-processed, and features were extracted using three different Convolutional Neural Network (CNN) models, namely pre-trained AlexNet, fine-tuned AlexNet and D-Leaf. These features were then classified using five machine learning techniques, namely Support Vector Machine (SVM), Artificial Neural Network (ANN), k-Nearest Neighbour (k-NN), Naïve Bayes (NB) and CNN. A conventional morphometric method, which computes morphological measurements from Sobel-segmented veins, was employed for benchmarking purposes. The D-Leaf model achieved a testing accuracy of 94.88%, comparable to the AlexNet (93.26%) and fine-tuned AlexNet (95.54%) models. In addition, the CNN models performed better than the traditional morphometric measurements (66.55%). The features extracted by the CNN models were found to fit well with the ANN classifier. The experimental results show that D-Leaf can be an effective automated system for plant species identification.

Index Terms—tropical tree, deep learning, Convolutional Network, leaf vein morphometric, feature extraction, classification, Artificial Neural Network.

I. INTRODUCTION

The number of plant species is extremely large, with about 391,000 vascular plant species worldwide [1]. Hence, it is impractical for a botanist or other expert to identify and classify all of them. In addition, some plant species closely resemble one another, so differentiating them takes a long time. Moreover, many plants face extinction. Endangered and non-endangered plant species alike need to be preserved and conserved properly to reduce the risk of extinction. Hence, there is a need to develop an automated or computerized system to identify and classify plants. Leaf shape is the feature most commonly used to develop such automated plant classification systems. Other than shape, the leaf can provide additional information such as textures, veins, and colours.

This research was supported by the Universiti Malaya Research Grant (UMRG) under project number RP038C-15AET. Jing Wei Tan and Siow-Wee Chang are with the Bioinformatics Programme, Institute of Biological Sciences, Faculty of Science, University of Malaya, Kuala Lumpur, Malaysia. Email: jingwei_92@siswa.um.edu.my, siowwee@um.edu.my. Sameem Abdul-Kareem is with the Department of Artificial Intelligence, Faculty of Computer Science and Information Technology, University of Malaya, Kuala Lumpur, Malaysia. Email: sameem@um.edu.my.

With the advancement of science and technology, machine learning has been widely employed for classification and recognition tasks in many domains, especially in biology. Machine learning techniques such as the Artificial Neural Network, Support Vector Machine and k-Nearest Neighbour are artificial intelligence techniques mainly employed for pattern recognition.
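As a rough illustration of how one of the classifiers named above performs pattern recognition, the following is a minimal k-Nearest Neighbour sketch in Python. The function name `knn_predict`, the two-dimensional leaf descriptors and the species labels are hypothetical and invented for illustration; they are not the features, data or implementation used in this study.

```python
import math
from collections import Counter


def knn_predict(train_features, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training samples."""
    # Euclidean distance from the query to every training feature vector.
    distances = [
        (math.dist(features, query), label)
        for features, label in zip(train_features, train_labels)
    ]
    # Keep the k closest samples and count how often each label occurs.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical descriptors (for example, aspect ratio and vein density).
# The values below are made up and are NOT data from the study.
train_features = [
    (1.0, 0.20), (1.1, 0.25), (0.9, 0.30),   # species_A
    (3.0, 0.80), (3.2, 0.75), (2.9, 0.90),   # species_B
]
train_labels = ["species_A"] * 3 + ["species_B"] * 3

print(knn_predict(train_features, train_labels, (1.05, 0.22)))  # species_A
print(knn_predict(train_features, train_labels, (3.10, 0.80)))  # species_B
```

In a pipeline of the kind described in this paper, the hand-written descriptors would be replaced by feature vectors extracted from leaf images, and the choice of k would normally be tuned on held-out data.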
Currently, deep learning, a subfield of artificial intelligence (AI), is a popular and widely used technique that has been applied in various domains, including biology, medicine, computer vision and speech recognition [2-5]. Deep learning is a modern AI approach that provides a robust framework for supervised learning [6]. It is able to map an input vector rapidly and efficiently to an output vector, even on a large dataset [6]. Deep learning architectures include the Convolutional Neural Network (CNN), the Deep Belief Network (DBN) and others. Deep learning is able to extract more detailed information than conventional machine learning techniques.

In this research, CNN is applied to extract features from leaf images of selected tree species. Three different CNN models were used, namely the pre-trained AlexNet CNN model, the fine-tuned AlexNet CNN model and the proposed D-Leaf CNN model. The extracted features were then fed into several classifiers for training. Five classifiers were employed in this research: CNN, Support Vector Machine (SVM), Artificial Neural Network (ANN), k-Nearest Neighbour (k-NN) and Naïve Bayes (NB). A conventional method, which segmented the leaf veins using the Sobel edge detection technique and performed vein morphological measurements, was used for benchmarking. Based on the literature review, this is one of the first studies to apply CNN to tropical tree species classification using both leaf morphometric and venation pattern approaches.

Hwa Jen Yap is with the Department of Mechanical Engineering, Faculty of Engineering, University of Malaya, Kuala Lumpur, Malaysia. Email: hjyap737@um.edu.my. Kien-Thai Yong is with the Ecology and Biodiversity Programme, Institute of Biological Sciences, Faculty of Science, University of Malaya, Kuala Lumpur, Malaysia. Email: yongkt@um.edu.my.

Deep Learning for Plant Species Classification using Leaf Vein Morphometric
Jing Wei Tan, Siow-Wee Chang, Sameem Abdul-Kareem, Hwa Jen Yap, Kien-Thai Yong

II. BACKGROUND STUDY

Fig. 1 shows the fundamental steps of an automated plant classification system. Initially, the leaf images are acquired using a digital camera, scanner or other equipment. The images are then pre-processed to remove