On the Feasibility of Online Malware Detection with Performance Counters

John Demme, Matthew Maycock, Jared Schmitz, Adrian Tang, Adam Waksman, Simha Sethumadhavan, Salvatore Stolfo
Department of Computer Science, Columbia University, NY, NY 10027
jdd@cs.columbia.edu, mhm2159@columbia.edu, {jared,atang,waksman,simha,sal}@cs.columbia.edu

ABSTRACT
The proliferation of computers in any domain is followed by the proliferation of malware in that domain. Systems, including the latest mobile platforms, are laden with viruses, rootkits, spyware, adware and other classes of malware. Despite the existence of anti-virus software, malware threats persist and are growing as there exist a myriad of ways to subvert anti-virus (AV) software. In fact, attackers today exploit bugs in the AV software to break into systems. In this paper, we examine the feasibility of building a malware detector in hardware using existing performance counters. We find that data from performance counters can be used to identify malware and that our detection techniques are robust to minor variations in malware programs. As a result, after examining a small set of variations within a family of malware on Android ARM and Intel Linux platforms, we can detect many variations within that family. Further, our proposed hardware modifications allow the malware detector to run securely beneath the system software, thus setting the stage for AV implementations that are simpler and less buggy than software AV. Combined, the robustness and security of hardware AV techniques have the potential to advance state-of-the-art online malware detection.
Categories and Subject Descriptors
C.0 [Computer Systems Organization]: General—Hardware/software interfaces; K.6.5 [Management of Computing and Information Systems]: Security and Protection—Invasive software

General Terms
Security in Hardware, Malware and its Mitigation

Keywords
Malware detection, machine learning, performance counters

1 This work was supported by grants FA 99500910389 (AFOSR), FA 865011C7190 (DARPA), FA 87501020253 (DARPA), CCF/TC 1054844 (NSF), Alfred P. Sloan fellowship, and gifts from Microsoft Research, WindRiver Corp, Xilinx and Synopsys Inc. Any opinions, findings, conclusions and recommendations do not reflect the views of the US Government or commercial entities.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
ISCA ’13 Tel-Aviv, Israel
Copyright 2013 ACM 978-1-4503-2079-5/13/06 ...$15.00.

1. INTRODUCTION
Malware – short for malicious software – is everywhere. In various forms and for a variety of incentives, malware exists on desktop PCs, server systems and even mobile devices like smart phones and tablets. Some malware litters devices with unwanted advertisements, creating ad revenue for the malware creator. Other malware can dial and text so-called “premium” services, resulting in extra phone bill charges. Some malware is even more insidious, hiding itself (via rootkits or background processes) and collecting private data like GPS location or confidential documents. This scourge of malware persists despite the existence of many forms of protection software, antivirus (AV) software being the best example. Although AV software decreases the threat of malware, it has some failings.
First, because the AV system is itself software, it is vulnerable to attack. Bugs or oversights in the AV software or underlying system software (e.g., the operating system or hypervisor) can be exploited to disable AV protection. Second, production AV software typically uses static characteristics of malware, such as suspicious strings of instructions in the binary, to detect threats. Unfortunately, it is quite easy for malware writers to produce many different code variants that are functionally equivalent, both manually and automatically, thus defeating static analysis easily. For instance, one malware family in our data set, AnserverBot, had 187 code variations. Alternatives to static AV scanning require extremely sophisticated dynamic analysis, often at the cost of significant overhead.

Given the shortcomings of static analysis via software implementations, we propose hardware modifications to support secure, efficient dynamic analysis of programs to detect malware. This approach potentially solves both problems. First, by executing AV protection in secure hardware (with minimum reliance on system software), we significantly reduce the possibility of malware subverting the protection mechanisms. Second, we posit that dynamic analysis makes detection of new, undiscovered malware variants easier. The intuition is as follows: we assume that all malware within a certain family of malware, regardless of the code variant, attempts to do similar things. For instance, they may all pop up ads, or they may all take GPS readings. As a result, we would expect them to work through a similar set of program phases, which tend to exhibit similar detectable properties in the form of performance data (e.g., IPC, cache behavior).

In this paper, we pose and answer the following central feasibility question: Can dynamic performance data be used to characterize and detect malware? We collect longitudinal, fine-grained microarchitectural traces of recent mobile