Predicting Popularity of Online Distributed Applications:
iTunes App Store Case Analysis
Miao Chen
School of Information Studies
Syracuse University
Mchen14@syr.edu
Xiaozong Liu
School of Information Studies
Syracuse University
Xliu12@syr.edu
ABSTRACT
Online distributed applications are becoming increasingly
important to users. A growing number of individuals and
companies develop applications and sell them online. In the past
few years, Apple Inc. has successfully built an online application
distribution platform, the iTunes App Store, which is supported by
its fashionable hardware such as the iPad and iPhone. Unlike
traditional sales channels, iTunes has several distinctive features
for promoting applications, for example daily application rankings,
application recommendations, free trial versions, application
updates, and user comments. These features raise two questions:
what makes an application popular in the iTunes store, and why
are users interested in specific types of applications? We plan to
answer these questions using machine learning techniques.
Keywords
Online distributed application, machine learning, data mining,
popularity
1. INTRODUCTION
Over the past 20 years, users have become accustomed to and
satisfied with online shopping, and the success of major online
shopping systems such as Amazon and eBay shows that online
shopping is becoming both a strong competitor and a complement
to traditional shopping outlets. Only recently, however, have
online distributed applications been widely accepted by end users,
following the success of the iPhone and iPad. Apple Inc. launched
the iTunes service in April 2003, and in the years since, the iTunes
Application Store has become one of Apple’s most profitable
services. On this platform, developers can set a price for their
application (a paid app) or offer it at no charge (a free app). As of
January 5, 2010, 3 billion applications had been downloaded from
iTunes. As in other online distribution systems, user attention
often follows a power-law distribution (Nazir et al., 2008): most
content gets only a few downloads, whereas a small number of
items receive most of the attention (Wu & Huberman, 2007).
For this paper, we investigate 102,337 applications from the iTunes
App Store. By tracking daily rankings, application properties, and
user ratings and comments, we want to answer two research
questions: 1) What makes an application popular? 2) Can we
predict the popularity of new and existing applications? To answer
these questions, we will employ the Classification And Regression
Tree (CART) algorithm (Breiman et al., 1984; Steinberg & Colla,
1995) to analyze a set of innovative numeric and textual features.
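As a rough illustration of the core step in CART classification, the sketch below searches for the binary split that minimizes weighted Gini impurity, the usual impurity measure for CART classification trees. It is a minimal sketch rather than the implementation used in this study: the function names `gini` and `best_split` are hypothetical, and the toy features in the usage note (price and average rating) are assumptions chosen only for illustration.

```python
from collections import Counter
from typing import List, Optional, Sequence, Tuple


def gini(labels: Sequence[str]) -> float:
    """Gini impurity of a collection of class labels (0.0 means pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_split(
    rows: List[Sequence[float]], labels: List[str]
) -> Optional[Tuple[int, float, float]]:
    """Return (feature_index, threshold, weighted_gini) for the binary
    split with the lowest weighted Gini impurity, or None if no feature
    has two distinct values to split on."""
    n = len(rows)
    best: Optional[Tuple[int, float, float]] = None
    for f in range(len(rows[0])):
        values = sorted({row[f] for row in rows})
        # Candidate thresholds are midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2
            left = [y for row, y in zip(rows, labels) if row[f] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[f] > threshold]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[2]:
                best = (f, threshold, score)
    return best
```

Calling `best_split` on rows of the form `[price, average_rating]` with labels such as `"popular"` and `"unpopular"` returns the single feature and threshold that best separate the two classes. A full CART tree would apply the same search recursively to each resulting subset and then prune the tree.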
2. RELATED WORK
The popularity of objects such as software applications and
webpages has been measured in two ways: 1) network-based
measurement, which measures an object’s popularity based on
its connections with other related objects, for instance the webpage
popularity ranking PageRank (Page et al., 1998) and ontology
popularity (Sabou et al., 2006); 2) feedback-based measurement,
which measures popularity based on user feedback such as voting,
ratings, and comments (Cha et al., 2007). In our study we adopt
the second measurement, taking an application’s ranking, as
determined by users, as its popularity.
There have been a few studies on iPhone applications and even
fewer on iPad applications, since the iPad is new (it was released in
2010). A number of iPhone studies concern the use of iPhone
applications for educational purposes, e.g. teaching fundamental
computer concepts with iPhone games (O’Rourke et al.,
2010). Kim et al. (2010) studied factors that affect smartphone
application developers’ intention to develop applications
frequently, in the context of platform business, which enables
third-party developers to distribute and possibly profit from
their applications. Our study will integrate features from different
perspectives, including features of application information and
features of user-contributed content. To the best of our knowledge,
some of these features have not been used in previous studies,
such as whether an application has a free version, application
update information, and sentiment analysis of user comments.
3. DATA
We collected data in two steps. First, 102,337 applications were
sampled from the iTunes application databases, and their
application name, provider, release date, and category were
extracted (Table 1). Second, a list of dynamic features was
collected every day by tracking the applications’ daily ranking
(top 200 paid and free applications from the general and
categorical rank lists). A variety of categories of iTunes
applications were collected (shown in Table 1), and three groups
of features were collected for machine learning: static, dynamic,
and comment features (shown in Table 2).
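One way to picture the three feature groups is as a single record per application, sketched below. This is an assumption-laden illustration rather than the schema used in the study: the class name `AppRecord`, its field names, and the `days_in_top` helper are hypothetical, and only the static fields (name, provider, release date, category) and the top-200 cutoff are taken from the description above.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, List


@dataclass
class AppRecord:
    # Static features, extracted once when the application is sampled.
    name: str
    provider: str
    release_date: date
    category: str
    # Dynamic features: rank observed on each tracking day.
    # Days with no entry mean the application was not seen on a rank list.
    daily_rank: Dict[date, int] = field(default_factory=dict)
    # Comment features: raw user comments, kept for later text analysis.
    comments: List[str] = field(default_factory=list)

    def record_rank(self, day: date, rank: int) -> None:
        """Store the rank observed for this application on one day."""
        self.daily_rank[day] = rank

    def days_in_top(self, cutoff: int = 200) -> int:
        """Count tracking days on which the rank was within the cutoff."""
        return sum(1 for rank in self.daily_rank.values() if rank <= cutoff)
```

A derived value such as `days_in_top()` is one hypothetical way to turn the daily tracking data into a numeric feature or a popularity label for a classifier.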
Permission to make digital or hard copies of all or part of this work for
personal or classroom use is granted without fee provided that copies are
not made or distributed for profit or commercial advantage and that
copies bear this notice and the full citation on the first page. To copy
otherwise, or republish, to post on servers or to redistribute to lists,
requires prior specific permission and/or a fee.
iConference 2011, February 8-11, 2011, Seattle, WA, USA
Copyright © 2011 ACM 978-1-4503-0121-3/11/02…$10.00