INDENT: Incremental Online Decision Tree Training for
Domain-Specific Systems-on-Chip
Anish Krishnakumar
anish.n.krishnakumar@wisc.edu
University of Wisconsin-Madison
USA
Radu Marculescu
radum@utexas.edu
The University of Texas at Austin
USA
Umit Ogras
uogras@wisc.edu
University of Wisconsin-Madison
USA
ABSTRACT
The performance and energy efficiency potential of heterogeneous
architectures has fueled domain-specific systems-on-chip (DSSoCs)
that integrate general-purpose and domain-specialized hardware
accelerators. Decision trees (DTs) perform high-quality, low-latency
task scheduling to utilize the massive parallelism and heterogeneity
in DSSoCs effectively. However, offline-trained DT scheduling poli-
cies can quickly become ineffective when applications or hardware
configurations change. There is a critical need for runtime tech-
niques to train DTs incrementally without sacrificing accuracy, since
current training approaches have large memory and computational
power requirements. To address this need, we propose INDENT, an
incremental online DT framework to update the scheduling policy
and adapt it to unseen scenarios. INDENT updates DT schedulers
at runtime using only 1-8% of the original training data embedded
during training. Thorough evaluations with hardware platforms and
DSSoC simulators demonstrate that INDENT performs within 5% of
a DT trained from scratch using the entire dataset and outperforms
current state-of-the-art approaches.
CCS CONCEPTS
• Computer systems organization → System on a chip.
KEYWORDS
Domain-specific system-on-chip, online learning, incremental train-
ing, decision trees, task scheduling, resource management, low-
power, ultra-low latency.
ACM Reference Format:
Anish Krishnakumar, Radu Marculescu, and Umit Ogras. 2022. INDENT:
Incremental Online Decision Tree Training for Domain-Specific Systems-
on-Chip. In IEEE/ACM International Conference on Computer-Aided Design
(ICCAD ’22), October 30-November 3, 2022, San Diego, CA, USA. ACM, New
York, NY, USA, 9 pages. https://doi.org/10.1145/3508352.3549436
1 INTRODUCTION
With the slowdown of Moore’s law and Dennard scaling, heteroge-
neous processing elements (PEs) have been the primary catalyst for
the performance and energy efficiency of computing systems [38].
Permission to make digital or hard copies of all or part of this work for personal or
classroom use is granted without fee provided that copies are not made or distributed
for profit or commercial advantage and that copies bear this notice and the full citation
on the first page. Copyrights for components of this work owned by others than ACM
must be honored. Abstracting with credit is permitted. To copy otherwise, or republish,
to post on servers or to redistribute to lists, requires prior specific permission and/or a
fee. Request permissions from permissions@acm.org.
ICCAD ’22, October 30-November 3, 2022, San Diego, CA, USA
© 2022 Association for Computing Machinery.
ACM ISBN 978-1-4503-9217-4/22/10. . . $15.00
https://doi.org/10.1145/3508352.3549436
For example, highly optimized fixed-function hardware accelerators
for signal processing and deep learning are commonly used in com-
munication and autonomous driving applications [3, 18]. However,
performance and energy efficiency boosts come at the expense of
programming flexibility, as the hardware accelerators are notori-
ously hard to program. To address this challenge, domain-specific
systems-on-chip have emerged as a new class of heterogeneous
SoCs [1, 23]. They combine the flexibility of general-purpose cores
with the performance and energy efficiency of specialized hardware
accelerators tailored to applications in a target domain [4, 11, 14].
DSSoCs comprise many heterogeneous processing elements, result-
ing in an ample runtime decision space for task execution. Hence,
scheduling algorithms try to identify the most appropriate execution
resource to maximize a specific optimization objective, such as per-
formance, power consumption, or energy-delay product [18, 22, 26,
27, 37, 39].
DSSoCs can execute tasks in the order of nanoseconds due to
highly specialized hardware accelerators. Hence, task scheduling
algorithms must provide high-quality scheduling decisions at ultra-
low latencies [10, 11]. Decision tree (DT) classifiers offer a promis-
ing solution since they provide high-quality decisions at low infer-
ence latency compared to multi-layer perceptrons and deep neural
networks. Furthermore, DT policies are simple and easy to inter-
pret [19, 21, 34]. Task scheduling policies designed offline are op-
timized for a particular optimization objective, SoC configuration,
and set of applications [16, 18, 39]. Therefore, rapidly evolving
SoC architectures, emerging applications, and workloads pose a
severe risk to fixed scheduling policies. As these parameters change
over time, the offline-designed static policies become ineffective,
lowering the energy efficiency potential of DSSoCs. Hence, there
is a critical need for task scheduling policies that adapt to dynamic
changes to maximize performance and energy efficiency.
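To make the latency argument concrete, consider a toy DT scheduling policy. This sketch is purely illustrative: the task features, thresholds, and PE names are hypothetical and not taken from INDENT. The point it shows is structural: DT inference is a single root-to-leaf walk, i.e., a handful of threshold comparisons, rather than the matrix multiplications a neural network policy would require.

```python
# Illustrative sketch (not INDENT's actual policy): a tiny fixed
# decision-tree scheduler mapping task features to a processing
# element (PE). All names and thresholds below are made up.

def schedule(task):
    """Return a PE label for a task described by a feature dict.

    Each internal node is one threshold test, so inference cost is
    bounded by the tree depth -- a few comparisons in total.
    """
    if task["type"] == "fft":            # task matches a specialized kernel?
        if task["size"] >= 1024:         # large transforms amortize accelerator launch cost
            return "FFT_ACCEL"
        return "DSP_CORE"                # small FFTs: launch overhead dominates
    if task["expected_cycles"] < 5_000:  # short general-purpose task
        return "LITTLE_CORE"
    return "BIG_CORE"

print(schedule({"type": "fft", "size": 4096}))                # FFT_ACCEL
print(schedule({"type": "generic", "expected_cycles": 900}))  # LITTLE_CORE
```

Interpretability follows from the same structure: every scheduling decision can be traced to an explicit chain of feature tests.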
Existing DT design techniques require the entire dataset to train
a new standalone DT [5]. This requirement poses a significant
drawback compared to other ML models, such as neural networks,
since storing all training samples would require significant memory
on the target platform. Hence, classical DT training algorithms
are impractical for online adaptation. Prior studies tried to address
this challenge using reinforcement learning (RL), ensemble trees,
and very fast decision trees (e.g., Hoeffding trees) [9, 15, 25, 33]. RL
techniques suffer from high computational power requirements for
training [40]; DT ensembles result in higher latency and computa-
tional overheads due to their several weak learners [13]; finally, the
assumptions needed to train Hoeffding trees do not hold for online
updates [25]. Hence, existing techniques are not applicable for in-
cremental and online DT updates, given the resource constraints
and inference latency targets of DSSoCs.
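The storage gap between full retraining and incremental updates can be sketched as follows. The abstract states that INDENT embeds only 1-8% of the original training data; one generic way to maintain a fixed-size representative subset of a training stream is reservoir sampling. This is an assumption for illustration only: the paper does not say INDENT uses reservoir sampling, and the dataset, subset size, and 2% ratio below are invented.

```python
# Sketch of the storage tradeoff (assumed numbers): classical DT
# training must keep every sample, while an incremental scheme can
# retain a small uniform subset plus newly observed samples.
import random

random.seed(0)
full_dataset = [(random.random(), random.randrange(4)) for _ in range(100_000)]

def reservoir(stream, k):
    """Keep a uniform random sample of k items from a stream,
    using O(k) memory regardless of stream length."""
    kept = []
    for i, sample in enumerate(stream):
        if i < k:
            kept.append(sample)
        else:
            j = random.randrange(i + 1)  # replace with probability k/(i+1)
            if j < k:
                kept[j] = sample
    return kept

retained = reservoir(full_dataset, k=2_000)  # ~2% of the original data
new_samples = [(random.random(), random.randrange(4)) for _ in range(500)]

# An online update would retrain only on retained + new_samples,
# instead of storing and replaying all 100,000 original samples.
update_set = retained + new_samples
print(len(full_dataset), len(update_set))  # 100000 2500
```

The memory argument is the key point: the update set is a fixed fraction of the original data, so the retraining footprint stays bounded as the workload evolves.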