Putative Drug and Vaccine Target Identification in Leishmania donovani Membrane Proteins Using Naïve Bayes Probabilistic Classifier

Arvind Kumar Sinha, Pradeep Singh, Anand Prakash, Dharm Pal, Anuradha Dube, and Awanish Kumar

Abstract—Predicting the role of a protein is one of the most challenging problems. Few approaches are available for predicting the role of an unknown protein as a drug target or vaccine candidate. We propose here the Naïve Bayes probabilistic classifier, a promising method for reliable predictions. The method is tested on the proteins identified in our mass spectrometry based membrane proteomics study of the Leishmania donovani parasite, which causes a fatal disease (visceral leishmaniasis) in humans around the world. Most of the vaccine/drug targets that belong to membrane proteins are key players in the pathogenesis of Leishmania infection. Analyses of our previous results using the Naïve Bayes probabilistic classifier indicate that this method predicts the role of unknown/hypothetical proteins (as drug target/vaccine candidate) with significantly higher precision. We have employed this method to provide probabilistic predictions of unknown/hypothetical proteins as targets. This study reports the unknown/hypothetical proteins of the Leishmania membrane fraction as potential drug targets and vaccine candidates, which is vital information for this parasite. Future molecular studies and characterization of these potent targets may produce a recombinant therapeutic/prophylactic tool against visceral leishmaniasis. These unknown/hypothetical proteins may open a vast research field to be exploited for novel treatment strategies.

Index Terms—Leishmania donovani, membrane proteomics, Naïve Bayes method, unknown/hypothetical proteins, role prediction

1 INTRODUCTION

Visceral leishmaniasis (VL), or kala-azar, is a serious health threat and a fatal disease in humans, caused by the protozoan parasite Leishmania donovani.
It is prevalent in different parts of the world and is endemic in Bihar, Assam, West Bengal and eastern Uttar Pradesh in India [1]. Leishmania has a digenetic life cycle, with vertebrates as the definitive host and the phlebotomine sand fly (vector) as the intermediate host. Because vector control is difficult and no effective vaccine exists, the control of leishmaniasis relies mostly on chemotherapy [2]. Unfortunately, parasites are increasingly becoming resistant to the first-line drug, pentavalent antimony (SbV). Drug resistance is increasing in several parts of the world, including South America, Europe, the Middle East and, most notably, India. Primary failure (>65% unresponsiveness) to sodium antimony gluconate (SAG) has been reported from North Bihar, India [3]. With the alarming rise of drug resistance, there is an urgent need to identify targets in the parasite, and proteomics is a well-established approach for the development of novel targets and therapy [4]. Using a proteomics approach, preliminary efforts in this direction comprised the generation of a partial 2-D gel map of L. donovani membrane proteins, with MALDI-TOF used to identify Leishmania proteins [5]. In that published work, some proteins were identified as unknown and some as hypothetical. Here we have used the Naïve Bayes (NB) probabilistic classifier to determine the role of unknown/hypothetical proteins as drug targets (DT) and vaccine candidates (VC) in our previously reported membrane proteome of L. donovani [5]. Membrane proteins are responsible for many vital functions in the parasite and may be responsible for disease progression; they are therefore prime targets. In this paper, the Naïve Bayes probabilistic classifier has been used for predicting unknown/hypothetical proteins as DT and VC. It is a simple probabilistic classifier that applies Bayes' theorem with strong independence assumptions between the features [6].
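As a brief illustration of the independence assumption, the Naïve Bayes decision rule can be sketched as below. The notation is introduced here for illustration only and is not taken from the paper: c denotes a class label (for example, target versus non-target), and x_1, ..., x_n denote the feature values that describe one protein.

```latex
\begin{align}
P(c \mid x_1, \dots, x_n)
  &= \frac{P(c)\, P(x_1, \dots, x_n \mid c)}{P(x_1, \dots, x_n)}
  && \text{(Bayes' theorem)} \\
  &= \frac{P(c) \prod_{i=1}^{n} P(x_i \mid c)}{P(x_1, \dots, x_n)}
  && \text{(features assumed independent given } c\text{)} \\
\hat{c} &= \arg\max_{c} \; P(c) \prod_{i=1}^{n} P(x_i \mid c)
\end{align}
```

The denominator does not depend on c, so it can be dropped when choosing the most probable class; only the class prior and the per-feature likelihoods need to be estimated from the training data.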
Naïve Bayes is a highly scalable classifier that uses a number of biological parameters to predict the class of unidentified instances. In our previous study, a number of proteins were reported as unknown/hypothetical in the membrane protein fraction of L. donovani [5]. The Naïve Bayes probabilistic classifier was applied to this experimental data set to predict the unknown/hypothetical proteins as DT/VC. A comparative experimental study with 10-fold cross-validation was performed using the Naïve Bayes, support vector machine (SVM), random forest (RF), and C4.5 decision tree methods. Among them, Naïve Bayes predicted the decisions most efficiently and accurately. Therefore, Naïve Bayes was selected as the model, providing a computationally efficient approach for the prediction of unknown proteins.

A.K. Sinha and A. Prakash are with the Department of Mathematics, National Institute of Technology, Raipur 492010, India. E-mail: aksinha.maths@nitrr.ac.in, anand.p.pal@gmail.com.
P. Singh is with the Department of Computer Science & Engineering, National Institute of Technology, Raipur 492010, India. E-mail: psingh.cs@nitrr.ac.in.
D. Pal is with the Department of Chemical Engineering, National Institute of Technology, Raipur 492010, India. E-mail: dpsingh.che@nitrr.ac.in.
A. Dube is with the Division of Parasitology, Central Drug Research Institute, Lucknow 226031, India. E-mail: anuradhadube@gmail.com.
A. Kumar is with the Department of Biotechnology, National Institute of Technology, Raipur 492010, India. E-mail: drawanishkr@gmail.com.

Manuscript received 22 Sept. 2015; revised 29 Apr. 2016; accepted 12 May 2016; date of current version 2 Feb. 2017.
For information on obtaining reprints of this article, please send e-mail to: reprints@ieee.org, and reference the Digital Object Identifier below.
Digital Object Identifier no. 10.1109/TCBB.2016.2570217

IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS, VOL. 14, NO. 1, JANUARY/FEBRUARY 2017, p. 204
1545-5963 © 2016 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

Unknown and hypothetical proteins of L. donovani membrane proteins were identified as putative