Journal of Engineering Science and Technology
Vol. 14, No. 3 (2019) 1572 - 1586
© School of Engineering, Taylor’s University
A COMPARISON OF MACHINE LEARNING TECHNIQUES FOR
ANDROID MALWARE DETECTION USING APACHE SPARK
LARAIB U. MEMON, NARMEEN Z. BAWANY*, JAWWAD A. SHAMSI
Systems Research Laboratory, National University of
Computer and Emerging Sciences, Karachi, Pakistan
*Corresponding Author: narmeen.bawany@nu.edu.pk
Abstract
The wide-scale popularity of Android devices has created a need for
effective means of detecting malicious applications. Machine-learning-based
classification of Android applications requires training and testing on a
large dataset. Motivated by these needs, we provide an extensive evaluation of
the classification of Android applications as either benign or malware.
Using a 17-node Apache Spark cluster, we applied seven different
machine learning classifiers to the SherLock dataset, one of
the largest available datasets for malware detection in Android applications.
From the 83 attributes in the dataset, we identified 29 features of
applications that are relevant to identifying malware. Our analysis revealed
that the gradient boosted trees classifier provides the highest precision
and accuracy and the lowest false positive rate in detecting malware
applications. We also applied our model to develop a real-time, cloud-based
malware detection system. This research is novel and beneficial in providing
an extensive evaluation using a large dataset.
Keywords: Android malware, Android security, Big data, Cloud, Malicious
Android applications, Malware detection, Spark.
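
The abstract compares classifiers by precision, accuracy, and false positive rate. As a minimal sketch of how these three metrics are conventionally derived from a binary confusion matrix, the plain-Python example below may help. It assumes label 1 denotes malware and 0 denotes benign. The names `ConfusionCounts`, `count_outcomes`, `precision`, `accuracy`, and `false_positive_rate` are hypothetical helpers introduced for illustration, and the toy labels are made up. None of this is taken from the authors' Spark implementation or results.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    tp: int  # malware correctly flagged as malware
    fp: int  # benign wrongly flagged as malware
    tn: int  # benign correctly passed as benign
    fn: int  # malware wrongly passed as benign


def count_outcomes(y_true, y_pred):
    """Tally outcomes, assuming labels are 1 (malware) or 0 (benign)."""
    tp = fp = tn = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == 1 and actual == 1:
            tp += 1
        elif predicted == 1 and actual == 0:
            fp += 1
        elif predicted == 0 and actual == 0:
            tn += 1
        else:
            fn += 1
    return ConfusionCounts(tp, fp, tn, fn)


def precision(c):
    """Share of applications flagged as malware that really are malware."""
    flagged = c.tp + c.fp
    return c.tp / flagged if flagged else 0.0


def accuracy(c):
    """Share of all applications that were classified correctly."""
    total = c.tp + c.fp + c.tn + c.fn
    return (c.tp + c.tn) / total if total else 0.0


def false_positive_rate(c):
    """Share of benign applications wrongly flagged as malware."""
    benign = c.fp + c.tn
    return c.fp / benign if benign else 0.0


# Illustrative toy labels only, not data from the SherLock dataset.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
counts = count_outcomes(y_true, y_pred)
```

For the toy labels above, the counts are 3 true positives, 1 false positive, 3 true negatives, and 1 false negative, so precision is 3/4, accuracy is 6/8, and the false positive rate is 1/4. A classifier that ranks best on all three metrics flags few benign applications while still labelling most applications correctly.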