Optimized peer to peer QSPR prediction of enthalpy of formation using outlier detection and subset selection

B. Firdaus Begam 1 & J. Satheesh Kumar 1 & Gyoo-Soo Chae 2

Received: 18 May 2017 / Accepted: 6 April 2018
© Springer Science+Business Media, LLC, part of Springer Nature 2018

Abstract
The Quantitative Structure Property Relationship (QSPR) approach models a property or activity of a molecule by relating it to the molecule's chemical structure. Accurate prediction of molecular properties strongly influences modern drug discovery, so an efficient method for identifying a molecule's property or activity is becoming an essential component of drug design. Modern drug discovery systems involve cluster networks, since the drug design process can be enhanced with the support of peer to peer networks. Some molecules have high-dimensional structure and descriptor information, which cluster networks can handle efficiently. This work visualizes existing benchmark models based on Polynomial Regression (PR), Principal Component Regression (PCR) and Partial Least Square Regression (PLSR) with respect to fitted response and prediction. An optimized QSPR model (FDROL) that combines fuzzy minimum redundancy maximum relevance (FmRMR) data reduction (FDR) with outlier detection (OL) is proposed. The influence of topological descriptors on the prediction of a physicochemical property of hydrocarbons is determined. The dataset is analyzed with the proposed method using polynomial regression (PRFDROL), principal component regression (PCFDROL) and partial least square regression (PLFDROL) to predict the enthalpy of formation of hydrocarbons. The model was validated with high values of the correlation statistics (r, r², adjusted r², F) and a low standard error (se), which indicates good predictive ability.
The squared correlation coefficients (r²) for the preprocessed data using PR, PCR and PLSR were 1, 0.98392 and 0.9839, respectively, indicating better fitted and predicted responses than the existing methods. The optimized QSPR model with PR gives the best fit for predicting the enthalpy of formation of hydrocarbons.

Keywords Molecular descriptor · Minimum redundancy maximum relevance · Mahalanobis distance method · Polynomial regression · Principal component regression · Partial Least Square regression

This article is part of the Topical Collection: Special Issue on Convergence P2P Cloud Computing
Guest Editor: Jung-Soo Han

* J. Satheesh Kumar, jsathee@rediffmail.com
B. Firdaus Begam, firdh_2002@yahoo.com
Gyoo-Soo Chae, gschae00@gmail.com

1 Department of Computer Applications, School of Computer Science and Engineering, Bharathiar University, Coimbatore, Tamil Nadu, India
2 Division of Information and Communication, Baekseok University, Cheonan, South Korea

Peer-to-Peer Networking and Applications
https://doi.org/10.1007/s12083-018-0650-4

1 Introduction
Quantitative Structure Activity Relationship / Quantitative Structure Property Relationship (QSAR/QSPR) modeling was introduced by Hansch to predict or understand the relationship between the structure of a chemical compound and its biological activity or physicochemical property [1, 2]. QSPR is a mathematical modeling approach that involves several steps: collecting data, selecting the data set needed for the analysis, and applying correlation and statistical techniques to build a prediction model (refer Fig. 1). QSPR also represents physicochemical properties or biological activity in terms of the structural characteristics of molecules (molecular descriptors) with which they are correlated [2, 3]. These models have wide application in drug discovery and development, such as predicting physicochemical properties, therapeutic agents, drug resistance,