Predicting Popularity of Online Distributed Applications: iTunes App Store Case Analysis

Miao Chen
School of Information Studies
Syracuse University
Mchen14@syr.edu

Xiaozong Liu
School of Information Studies
Syracuse University
Xliu12@syr.edu

ABSTRACT
Online distributed applications are becoming increasingly important to users. A growing number of individuals and companies develop applications and sell them online. In the past couple of years, Apple Inc. has successfully built an online application distribution platform, the iTunes App Store, whose adoption is facilitated by Apple’s fashionable hardware such as the iPad and iPhone. Unlike traditional sales channels, iTunes has some unique features for promoting applications, for example daily application rankings, application recommendations, free trial versions, application updates, and user comments. These features lead us to ask what makes an application popular in the iTunes store and why users are interested in specific types of applications. We plan to answer these questions using machine learning techniques.

Keywords
Online distributed application, machine learning, data mining, popularity

1. INTRODUCTION
Over the past 20 years, users have grown accustomed to, and satisfied with, online shopping. The success of major online shopping systems such as Amazon and eBay shows that online shopping is becoming a strong competitor, as well as a complement, to traditional shopping outlets. Only recently, however, have online distributed applications become widely accepted by end users, following the success of the iPhone and iPad. Apple Inc. launched its iTunes service in April 2003, and in recent years the iTunes App Store has become one of Apple’s most profitable services. Through this platform, developers can set a price for an application (a paid app) or offer it at no charge (a free app). As of January 5, 2010, 3 billion applications had been downloaded from iTunes.
As in other online distribution systems, user attention often follows a power-law distribution (Nazir et al., 2008): most content gets only a few downloads, whereas a small number of items receive most of the attention (Wu & Huberman, 2007). In this paper, we investigate 102,337 applications from the iTunes App Store. By tracking daily rankings, application properties, and user ratings and comments, we want to answer two research questions: 1) What makes an application popular? 2) Can we predict the popularity of new and existing applications? To answer these questions, we will employ the Classification And Regression Tree (CART) classification algorithm (Breiman et al., 1984; Steinberg & Colla, 1995) to analyze a list of innovative numeric and textual features.

2. RELATED WORK
The popularity of objects such as software applications and webpages has been measured in two ways: 1) network-based measurement, which measures an object’s popularity based on its connections with other related objects, for instance the webpage popularity ranking PageRank (Page et al., 1998) and ontology popularity (Sabou et al., 2006); and 2) feedback-based measurement, which measures popularity based on user feedback such as votes, ratings, and comments (Cha et al., 2007). In our study we adopt the second approach, taking the user-driven application ranking as the measure of popularity.

There have been a few studies of iPhone applications, and fewer still of iPad applications because the device is so new (it came out in 2010). A number of iPhone studies concern the use of iPhone applications for educational purposes, e.g., teaching fundamental computer concepts with iPhone games (O’Rourke et al., 2010). Kim et al. (2010) studied the factors that affect smartphone application developers’ intention to develop applications frequently, in the context of a platform business that enables third-party developers to distribute, and possibly profit from, their applications.
Our study will integrate features from different perspectives, including features of application information and features of user-contributed content. We will take into account some innovative features that, to the best of our knowledge, have not been used in previous studies, such as whether an application has a free version, application update information, and sentiment analysis of user comments.

3. DATA
We collected data in two steps. First, 102,337 applications were sampled from the iTunes application databases, and their application name, provider, release date, and category were extracted (see Table 1). Second, a list of dynamic features was collected every day by tracking the applications’ daily rankings (the top 200 paid and free applications from the general and per-category rank lists). A variety of categories of iTunes applications were collected (shown in Table 1), and three groups of features were collected for machine learning: static, dynamic, and comment features (shown in Table 2).

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
iConference 2011, February 8-11, 2011, Seattle, WA, USA
Copyright © 2011 ACM 978-1-4503-0121-3/11/02…$10.00