Malware Detection in Android Systems with Traditional Machine Learning Models: A Survey

Esra Calik Bayazit
Fatih Sultan Mehmet Vakif University
Marmara University Institute of Science, Computer Engineering Department
Istanbul/Turkey
ecalik@fsm.edu.tr

Ozgur Koray Sahingoz
Istanbul Kultur University
Computer Engineering Department
Istanbul/Turkey
o.sahingoz@iku.edu.tr

Buket Dogan
Marmara University
Department of Computer Engineering, Faculty of Technology
Istanbul/Turkey
buketb@marmara.edu.tr

Abstract— Mobile devices have become so numerous that they are now integrated into every dimension of daily life. Running sophisticated programs on them requires a capable operating system. Android is the most popular mobile operating system in the world; it is used extensively on both smartphones and tablets and is distributed as open source under the Apache License. Many mobile application developers therefore focus on these devices when building their products. In recent years, this popularity has also made Android a desirable target for malicious attackers. Sophisticated attackers in particular focus on developing Android malware that can acquire and/or use personal and sensitive data without user consent. It is therefore essential to devise effective techniques to analyze and detect these threats. In this work, we analyze the algorithms used in malware detection and present a comparative analysis of the literature. The study is intended as a comprehensive survey resource for researchers who aim to work on malware detection.

Keywords—Machine Learning, Android System, Malware Detection, Survey

I. INTRODUCTION

As Internet usage spreads every day, a growing number of mobile devices are connected to one another, and the operations performed on these devices are becoming more varied and more complex.
This increased use moves our daily operations from the real world to the cyber world. Although this transformation makes our lives easier, it also brings a significant disadvantage: security. In the cyber world, cyber criminals can easily hide behind the anonymous structure of the Internet. Especially with the enormous use of mobile devices, not only local area networks but also individual end users are main targets of these attackers. They try to exploit deficiencies of the networks or the weaknesses of human users through malicious websites or programs. According to the first-quarter 2020 "McAfee Labs Threats" report, 98% of the attackers target Android devices [1]. Fig. 1 shows the total distribution of malware over the last 5 years. Since Android is widespread and faces many threats, efficient malware analysis and detection techniques are needed to support and improve security detection and classification tools.

Malware includes worms, backdoors, viruses, spyware, trojans, and other malicious programs that damage the system. Various malware attack techniques send private information and attack the system (specifically the Android platform) without the victim's knowledge. To detect these attacks, two main types of analysis are preferred: static analysis and dynamic analysis.

Fig. 1 Total Distribution of Malware in the Last 5 Years [2]

In static analysis, the detection system focuses on the properties of the software by inspecting its executable and its source code, without running it. This analysis examines the signature of the attack, the permissions the program requests, or its bytecode. However, this type of detection mechanism has one main deficiency: it cannot catch a "zero-day" attack, that is, one that has not been encountered before. To detect this type of malware, dynamic analysis should be preferred.
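As a minimal sketch of the permission-based static features mentioned above, the following Python snippet reads the permissions requested in an app manifest and encodes them as a binary vector that a classifier could consume. It assumes the manifest has already been decoded to plain-text XML (inside an APK it is stored in a binary form). The names `PERMISSION_VOCAB`, `extract_permissions`, and `to_feature_vector`, as well as the sample manifest, are hypothetical and are not taken from any of the surveyed works.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, List, Set

# XML namespace used for android:* attributes in a manifest.
ANDROID_NS = "http://schemas.android.com/apk/res/android"

# Hypothetical feature vocabulary: a small, fixed list of permissions.
# A real study would choose this list from its own dataset.
PERMISSION_VOCAB = [
    "android.permission.INTERNET",
    "android.permission.READ_SMS",
    "android.permission.SEND_SMS",
    "android.permission.READ_CONTACTS",
]


def extract_permissions(manifest_xml: str) -> Set[str]:
    """Return the permission names requested via <uses-permission> elements."""
    root = ET.fromstring(manifest_xml)
    name_attr = "{%s}name" % ANDROID_NS
    return {
        el.get(name_attr)
        for el in root.iter("uses-permission")
        if el.get(name_attr)
    }


def to_feature_vector(permissions: Iterable[str],
                      vocab: List[str] = PERMISSION_VOCAB) -> List[int]:
    """Encode permissions as a 0/1 vector, one position per vocabulary entry."""
    requested = set(permissions)
    return [1 if p in requested else 0 for p in vocab]


# Illustrative, made-up manifest (already decoded to text XML).
SAMPLE_MANIFEST = """<manifest
    xmlns:android="http://schemas.android.com/apk/res/android"
    package="com.example.app">
    <uses-permission android:name="android.permission.INTERNET"/>
    <uses-permission android:name="android.permission.SEND_SMS"/>
</manifest>"""

if __name__ == "__main__":
    perms = extract_permissions(SAMPLE_MANIFEST)
    print(sorted(perms))
    print(to_feature_vector(perms))
```

Vectors of this kind, collected over labeled benign and malicious apps, are one common input form for traditional machine learning classifiers. Because the features come only from what the app declares, such a sketch shares the limitation noted above: it says nothing about behavior that appears only at run time.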
In the dynamic model, the detection system generally tries to define the normal behavior of the system by training on previous normal data transfers or permission requests. It then tries to catch abnormal behaviors by identifying them as suspicious requests. In the latter model, training the detection system is crucial, and machine learning algorithms are generally preferred in the literature. Therefore, in this paper we present a survey that analyzes the background of the topic and reviews the literature on Android malware analysis in the machine learning context through a comparative analysis. The aim is to collect all related information about the topic in a single survey paper, so that new researchers can reach it from a single source.

The rest of the paper is organized as follows. The next section details background knowledge about malware and related topics. Section 3 categorizes malware detection systems and their types. Section 4 focuses on the most commonly preferred machine learning algorithms, specifically in malware detection systems. A comparative

978-1-7281-9352-6/20/$31.00 ©2020 IEEE