Journal of Engineering Science and Technology, Vol. 14, No. 3 (2019) 1572 - 1586
© School of Engineering, Taylor’s University

A COMPARISON OF MACHINE LEARNING TECHNIQUES FOR ANDROID MALWARE DETECTION USING APACHE SPARK

LARAIB U. MEMON, NARMEEN Z. BAWANY*, JAWWAD A. SHAMSI

Systems Research Laboratory, National University of Computer and Emerging Sciences, Karachi, Pakistan
*Corresponding Author: narmeen.bawany@nu.edu.pk

Abstract

The wide-scale popularity of Android devices has created a need for effective means of detecting malicious applications. Machine learning based classification of Android applications requires training and testing on a large dataset. Motivated by these needs, we provide an extensive evaluation of classifying Android applications as either benign or malicious. Using a 17-node Apache Spark cluster, we applied seven different machine learning classifiers to the SherLock dataset, one of the largest available datasets for malware detection in Android applications. From the dataset's 83 attributes, we identified 29 application features that are relevant to identifying malware. Our analysis revealed that the gradient boosted trees classifier provides the highest precision and accuracy and the lowest false positive rate in detecting malware applications. We also applied our model to develop a real-time, cloud-based malware detection system. This research is novel in providing an extensive evaluation on a large dataset.

Keywords: Android malware, Android security, Big data, Cloud, Malicious Android applications, Malware detection, Spark.
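The abstract ranks classifiers by precision, accuracy, and false positive rate. As a minimal sketch of how these three metrics follow from a binary confusion matrix (treating malware as the positive class), the Python below may help. The class and function names are hypothetical, and the counts are illustrative only; none of them come from the paper's experiments.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Confusion-matrix counts for a binary detector; 'positive' means malware."""
    tp: int  # malware correctly flagged as malware
    fp: int  # benign applications wrongly flagged as malware
    tn: int  # benign applications correctly passed as benign
    fn: int  # malware wrongly passed as benign


def precision(c: ConfusionCounts) -> float:
    """Fraction of flagged applications that really are malware."""
    flagged = c.tp + c.fp
    return c.tp / flagged if flagged else 0.0


def accuracy(c: ConfusionCounts) -> float:
    """Fraction of all applications that were classified correctly."""
    total = c.tp + c.fp + c.tn + c.fn
    return (c.tp + c.tn) / total if total else 0.0


def false_positive_rate(c: ConfusionCounts) -> float:
    """Fraction of benign applications that were wrongly flagged."""
    benign = c.fp + c.tn
    return c.fp / benign if benign else 0.0


# Hypothetical counts chosen for illustration, not results from the paper.
example = ConfusionCounts(tp=80, fp=10, tn=90, fn=20)

if __name__ == "__main__":
    print(f"precision           = {precision(example):.3f}")
    print(f"accuracy            = {accuracy(example):.3f}")
    print(f"false positive rate = {false_positive_rate(example):.3f}")
```

A detector can score well on accuracy while still flagging many benign applications, which is why the false positive rate is reported alongside precision and accuracy.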