A Study of Thread Level Parallelism on Mobile Devices
Cao Gao*, Anthony Gutierrez*, Ronald G. Dreslinski*, Trevor Mudge*, Krisztian Flautner† and Geoffery Blake†
*Advanced Computer Architecture Laboratory, University of Michigan, {caogao, atgutier, rdreslin, tnm}@umich.edu
†ARM Ltd., {krisztian.flautner, blakeg}@arm.com
Abstract—Mobile device vendors continue to increase the number of cores in an attempt to meet the needs of performance-demanding applications. However, more cores do not necessarily translate into performance gains or power reductions. In this paper we investigate how applications utilize multi-core mobile devices. Our results show that mobile applications use fewer than 2 cores on average, which indicates that multi-core CPUs are generally underutilized by today's mobile applications. Unless application developers can significantly improve core utilization, further increasing core counts will yield little gain.
I. INTRODUCTION
Given the growing hardware demands of modern mobile applications, mobile device vendors have started shipping smartphones and tablets with multi-core CPUs in volume. However, despite the computational potential of multi-core CPUs, it is not clear how much of it mobile devices can actually use. To take advantage of a multi-core system, software developers have to divide their programs into parallel threads, which is difficult. In the analogous desktop setting, Blake et al. [2] studied a suite of representative desktop applications. Their results suggested that the number of cores that can be profitably used is less than 3 for most commonly used applications.
In this work, we analyze a broad range of popular mobile applications on two current development boards to determine how the cores of mobile devices are utilized. We calculate the Thread Level Parallelism (TLP) of these applications. Our results show an average TLP of 1.4 for a quad-core system. This suggests that mobile applications use fewer than 2 cores on average, even when several applications run concurrently. Indeed, some recent mobile CPUs [1] have only 2 cores and still provide the desired performance. We also measure the same metrics across a broad spectrum of configurations, varying the number of cores in the system, the core frequency, and the CPU. We observe modest TLP scalability for most applications: increasing the number of cores yields little additional TLP. In addition, CPUs with higher frequencies tend to exhibit less TLP, which suggests that exploiting parallelism will only become more challenging as frequencies continue to rise. Overall, these studies suggest that multi-core CPUs in mobile devices are underutilized. Software developers appear to lag behind in exploiting parallelism in mobile applications, and until that changes, increasing the number of CPU cores may yield diminishing returns.
II. METHODOLOGY
A. Metrics We use Thread Level Parallelism (TLP) [2], [3], which is defined in Equation (1) as the machine utilization over the non-idle portions of a benchmark's execution:
$$\mathrm{TLP} = \frac{\sum_{i=1}^{n} c_i \, i}{1 - c_0} \qquad (1)$$
where $c_i$ is the fraction of time that $i$ cores are concurrently running different threads, and $n$ is the number of cores. Specifically, $c_0$ is the idle time fraction. To calculate TLP, we collect all the context switch events using ftrace, a Linux kernel internal tracer.
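As an illustration of Equation (1), the following Python sketch derives the fractions $c_i$ from context-switch events and then computes TLP. It assumes the ftrace `sched_switch` records have already been parsed into hypothetical `(timestamp, cpu, next_pid)` tuples sorted by time, that PID 0 denotes the per-CPU idle task, and that every CPU is idle at the start of the measurement window. Parsing the raw trace text is omitted, and `core_fractions` and `tlp` are illustrative names, not the authors' tooling.

```python
from typing import Iterable, List, Tuple

# Hypothetical pre-parsed sched_switch record:
# (timestamp in seconds, cpu index, pid of the thread switched in); pid 0 = idle.
Event = Tuple[float, int, int]


def core_fractions(events: Iterable[Event], n_cores: int,
                   start: float, end: float) -> List[float]:
    """Return [c_0, ..., c_n]: fraction of [start, end] with exactly i busy cores.

    Assumes events are sorted by time, lie within [start, end], and that
    every CPU is idle at `start`.
    """
    busy = [False] * n_cores
    time_at = [0.0] * (n_cores + 1)  # accumulated time with i busy cores
    last = start
    for ts, cpu, next_pid in events:
        if ts < last:
            raise ValueError("events must be sorted and not precede start")
        # Charge the interval since the previous event to the current busy count.
        time_at[sum(busy)] += ts - last
        last = ts
        # After the switch, this CPU is busy unless it switched to the idle task.
        busy[cpu] = next_pid != 0
    time_at[sum(busy)] += end - last
    total = end - start
    return [t / total for t in time_at]


def tlp(c: List[float]) -> float:
    """Equation (1): sum of c_i * i, normalized by the non-idle fraction 1 - c_0."""
    non_idle = 1.0 - c[0]
    if non_idle <= 0.0:
        raise ValueError("trace is entirely idle")
    return sum(i * ci for i, ci in enumerate(c)) / non_idle
```

For example, on a hypothetical 2-core trace in which one core is busy from t = 0 s to 2 s and the other from 1 s to 3 s within a 4 s window, `core_fractions` returns [0.25, 0.5, 0.25] and `tlp` gives (0.5 + 2 × 0.25) / 0.75 ≈ 1.33. Normalizing by 1 - c_0 means that idle periods lower neither the numerator nor the result.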
B. System setup We choose two development boards that are representative of the latest mobile device technology. Most of the experiments are run on the Samsung Origenboard, which contains an Exynos 4412 SoC with a 1.4GHz quad-core Cortex-A9 CPU and a Mali-400 GPU. For comparison, we also use a Qualcomm Dragonboard with a 2.3GHz quad-core Krait CPU.
C. Benchmarks We choose 16 popular applications from the Google Play Store and 4 native applications from the Android OS. These applications have large user bases and are thus representative of current mobile software. They span 10 commonly used categories (shown in Fig. 1a). The test actions for each application usually last 30 seconds and cover most of its typical functions; we found that 30 seconds is long enough to cover all common actions for the benchmark applications. All experiments are repeated at least 5 times, and we observe a low standard deviation in the TLP results. Before testing, we kill all running and background applications to reduce experimental error. In addition to single applications, we choose four applications from the suite and run them concurrently with a set of other applications in the background to simulate multi-tasking scenarios.
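The repeated-run check can be sketched as a small Python helper that summarizes the per-run TLP values of one benchmark. This is a hypothetical helper, not part of the paper: the name `summarize_runs` and the use of the coefficient of variation are assumptions, and the paper does not state what threshold counts as a low standard deviation.

```python
import statistics
from typing import NamedTuple, Sequence


class RunSummary(NamedTuple):
    mean: float
    stdev: float  # sample standard deviation across runs
    cv: float     # coefficient of variation = stdev / mean


def summarize_runs(tlps: Sequence[float]) -> RunSummary:
    """Summarize per-run TLP values for one benchmark (needs at least 2 runs)."""
    if len(tlps) < 2:
        raise ValueError("need at least two runs for a sample standard deviation")
    mean = statistics.mean(tlps)
    stdev = statistics.stdev(tlps)
    # TLP is at least 1 whenever the trace has any busy time, so mean > 0 here.
    return RunSummary(mean, stdev, stdev / mean)
```

Reporting the coefficient of variation alongside the standard deviation makes the spread comparable across applications whose mean TLP differs, for example between a benchmark near 1.2 and one near 1.6.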
III. RESULTS
In this section, we show that current mobile applications have rather low TLP on modern mobile device platforms. We also observe that adding cores yields only a small TLP gain, and that cores running at higher frequencies exhibit less TLP.
We present the overall TLP results in Fig. 1a. The results demonstrate that: 1) All applications exhibit some, but quite limited, TLP. We do see a TLP higher than 1.2 for almost all the applications under test. However, the observed parallelism is low: on a 4-core system, the average TLP is 1.4. The applications with the highest TLP, namely Games, Browser, and Navigation, reach TLPs of about 1.5 to 1.6, while applications such as Music and File Browser have rather low TLPs of about 1.2 to 1.3. 2) Increasing the number of cores yields little TLP gain. On average, TLP increases by 4.5%
126 978-1-4799-3606-9/14/$31.00 ©2014 IEEE