JOURNAL OF LATEX CLASS FILES, VOL. 14, NO. 8, AUGUST 2015

A Longitudinal Study of Application Structure and Behaviors in Android

Haipeng Cai and Barbara Ryder

Abstract: With the rise of the mobile computing market, Android has received tremendous attention from both academia and industry. Application programming in Android is known to have unique characteristics, and Android apps are known to be particularly vulnerable to various security attacks. In response, numerous solutions for particular security issues have been proposed. However, there is little broad understanding of Android app code structure and behaviors, or of their implications for app analysis and security defense, especially from an evolutionary perspective. To mitigate this gap, we present a longitudinal characterization study of Android apps that systematically investigates how they are built and how they execute over time. Through lightweight static analysis and method-level tracing, we examined the code and execution of 17,664 apps sampled from the apps developed in each of eight past years, with respect to metrics in three complementary dimensions. Our study revealed that (1) app functionalities rely heavily on the Android framework/SDK, and this reliance continues to grow, (2) Activity components consistently dominated other types of components and were responsible for the invocation of most lifecycle callbacks, (3) event-handling callbacks consistently focused more on user-interface events than on system events, (4) the overall use of callbacks has been slowly diminishing over time, (5) the majority of exercised inter-component communications (ICCs) did not carry any data payloads, and (6) sensitive data sources and sinks targeted only one or two dominant categories of information or operations, and the ranking of source/sink categories remained quite stable throughout the eight years.
We discuss the implications of our empirical findings for cost-effective app analysis and security defense for Android, and make recommendations for improving cost-effectiveness accordingly.

Index Terms: Android, code structure, app behavior, longitudinal study, evolution, app analysis, security, ICC.

1 INTRODUCTION

The Android platform and its user applications (referred to as apps) have come to dominate mobile computing across smartphones, tablets, and other consumer electronics [1], [2]. Android developers are creating apps that cover a growing range of application domains. Meanwhile, the rapid growth of Android apps has been accompanied by a surge of security threats and attacks of various forms [1], [2]. In this context, it becomes crucial for both researchers and tool developers to understand the particular software ecosystem of Android in order to develop cost-effective solutions for assuring the quality of Android apps.

Android apps have been primarily developed in two Java-based (JVM) languages, the canonical Java and Kotlin [3] (a Java-like language), both of which are currently official development languages for Android. Yet these apps differ from traditional Java programs in how they are coded and executed. According to existing (static) characterizations [4], [5], Android apps rely on the Android SDK and various third-party libraries to realize their functionalities. In fact, many of the distinct characteristics of Android apps have led to unique challenges in developing sound and effective code-based app analyses [4], [6]. These challenges have resulted in the specialization and customization, for Android apps, of analysis algorithms that were originally devised for traditional object-oriented programs.

• Haipeng Cai is with the School of Electrical Engineering and Computer Science, Washington State University, Pullman, WA.
E-mail: haipeng.cai@wsu.edu
• Barbara Ryder is with the Department of Computer Science, Virginia Tech, Blacksburg, VA. E-mail: ryder@cs.vt.edu
Manuscript received April 1, 2018; revised August 26, 2015.

Specifically, the framework-based nature of Android apps requires substantial modeling of the platform and runtime for static analyses [4], [5], [7] to achieve reasonable accuracy. Implicit invocation between app components via a mechanism called inter-component communication (ICC) requires special treatment (e.g., ICC resolution [8], [9]) for a soundy [10] whole-program analysis. In addition, the event-driven paradigm in Android programming accounts for many challenges in app security analyses, such as determining component lifecycles [4], [6], [7] and computing callback control flows [6], [11].

Existing research on Android apps has been mainly aimed at security [12]. Further, most existing solutions targeted specific security issues, with only a few offering a broader view of security-related characteristics of applications in general [13], [14]. Intuitively, it is important to dissect the app behaviors that commonly underlie varied security issues, so as to develop more capable and fundamental defense solutions that work across different kinds of those issues. Moreover, while security is a critical quality factor, it is not the only aspect of the holistic quality profile of apps. Knowledge about the underlying app behaviors is also essential for developing cost-effective app quality assurance solutions with respect to quality aspects other than security.

Studies that aim to characterize Android apps beyond the security aspect do exist, yet they are insufficient in several ways. First, most of the prior work in this area (e.g., [15], [16], [17], [18], [19], [20], [21]) exclusively targeted static characterizations by examining the source code rather than the run-time behaviors of the apps.
While these studies provide useful insights into app behaviors, they offer only a relatively rough approximation of those behaviors because of their overly conservative nature (i.e., they consider all possible app executions). Dynamic characterizations would be necessary to complement