Malware Detection in Android Systems with
Traditional Machine Learning Models: A Survey
Esra Calik Bayazit
Fatih Sultan Mehmet Vakif University
Marmara University Institute of Science,
Computer Engineering Department
Istanbul/Turkey
ecalik@fsm.edu.tr
Ozgur Koray Sahingoz
Istanbul Kultur University
Computer Engineering Department
Istanbul/Turkey
o.sahingoz@iku.edu.tr
Buket Dogan
Marmara University
Department of Computer Engineering,
Faculty of Technology
Istanbul/Turkey
buketb@marmara.edu.tr
Abstract— With the growing number of mobile devices, they
have become integrated into every dimension of our daily life. To
execute sophisticated programs, a capable operating system must
be installed on them. Undoubtedly, Android is the most popular
mobile operating system in the world. It is extensively used on both
smartphones and tablets and is distributed as open source under
the Apache License. Therefore, many mobile application
developers focus on these devices and implement their products
for them. In recent years, the popularity of Android devices has
made them a desirable target for malicious attackers. Sophisticated
attackers in particular focus on implementing Android malware
that can acquire and/or use personal and sensitive data without
user consent. It is therefore essential to devise effective techniques
to analyze and detect these threats. In this work, we analyze the
algorithms used in malware detection and make a comparative
analysis of the literature. This study is intended to provide a
comprehensive survey resource for researchers who aim to
work on malware detection.
Keywords—Machine Learning, Android System, Malware
Detection, Survey
I. INTRODUCTION
As Internet usage spreads every day, an increasing number
of mobile devices are connected to one another, and the
operations performed on these devices are becoming more varied
and more complex. This increased use moves our daily
operations from the real world to the cyber world. Although
this transformation makes our lives easier, it also brings a
significant disadvantage: security. In the cyber world, cyber
criminals can easily hide themselves by exploiting the
anonymous structure of the Internet. Especially with the
enormous use of mobile devices, not only local area networks
but also individual end users are main targets of these attackers.
They try to exploit deficiencies of the networks or the
weaknesses of human users by means of malicious websites or
programs. According to the first-quarter 2020 “McAfee Labs
Threats” report, 98% of the attackers target Android devices
[1]. Fig. 1 shows the total distribution of malware over the
last 5 years.
Since Android is widespread and faces many threats,
efficient malware analysis and detection techniques are needed
to support and improve security detection and classification
tools. Malware includes worms, backdoors, viruses, spyware,
trojans, and other malicious programs that damage the system.
Various malware attack techniques send private information
and attack the system (specifically the Android platform)
without the victim's knowledge. To detect these attacks, two
main types of analysis are preferred: static analysis and
dynamic analysis.
Fig. 1 Total distribution of malware over the last 5 years [2]
In static analysis, the detection system focuses on the
properties of the software by inspecting its code without
executing it. In this analysis, the signature of the attack, the
permissions the program uses, or its bytecode is investigated.
However, this type of detection mechanism has a main
deficiency: it cannot catch a “zero-day” attack, which has not
been encountered before. To detect this type of malware,
dynamic analysis should be preferred. In the dynamic model,
the detection system generally tries to define the normal
behavior of the system by training on previous normal data
transfers or permission requests. It then tries to catch abnormal
behaviors by identifying them as suspicious requests.
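The contrast between the two analysis types can be illustrated with a minimal Python sketch. It is not taken from any surveyed system: the signature set `KNOWN_BAD`, the payload byte strings, the permission-request counts, and the threshold `k` are all hypothetical values chosen for illustration. The static check is a plain digest lookup, so it misses a payload it has never seen, while the dynamic check learns a baseline from normal behavior and flags large deviations from it.

```python
import hashlib
from statistics import mean, stdev

# Hypothetical signature database: SHA-256 digests of known-malicious payloads.
KNOWN_BAD = {hashlib.sha256(b"known-malware-payload").hexdigest()}


def static_signature_match(payload: bytes) -> bool:
    """Return True if the payload's digest appears in the signature set."""
    return hashlib.sha256(payload).hexdigest() in KNOWN_BAD


def fit_baseline(normal_counts):
    """Learn the mean and sample standard deviation from normal runs."""
    return mean(normal_counts), stdev(normal_counts)


def is_anomalous(count, baseline, k=3.0):
    """Flag a count more than k standard deviations from the normal mean."""
    mu, sigma = baseline
    return abs(count - mu) > k * sigma


# Hypothetical data: permission requests per minute in benign sessions.
normal = [2, 3, 2, 4, 3, 2, 3]
baseline = fit_baseline(normal)

# A payload that is absent from the signature set (a stand-in for a zero-day).
unseen = b"never-seen-before-payload"

print(static_signature_match(b"known-malware-payload"))  # known sample
print(static_signature_match(unseen))                    # unseen sample
print(is_anomalous(3, baseline))                         # typical behavior
print(is_anomalous(40, baseline))                        # burst of requests
```

A single z-score threshold is only a stand-in for the trained models discussed in the literature; it is used here because it makes the idea of learning normal behavior and then flagging deviations visible in a few lines.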
In the latter model, the training of the detection system is
crucial, and machine learning algorithms are generally
preferred in the literature. Therefore, in this survey paper, we
analyze the background of the topic and review the literature
on Android malware analysis in the machine learning context
through a comparative analysis. The aim is to collect all
related information about the topic in a single survey paper so
that new researchers can reach all related topics in a single
source.
The rest of the paper is organized as follows. In the next
section, background knowledge about malware and related
topics is detailed. Then, in Section 3, malware detection
systems and their types are categorized. Section 4 focuses on
the most preferred machine learning algorithms, specifically
in malware detection systems. A comparative
978-1-7281-9352-6/20/$31.00 ©2020 IEEE