Poster Abstract: ECSRL: A Learning-Based Scheduling Framework for AI Workloads in Heterogeneous Edge-Cloud Systems

Changyao Lin, Harbin Institute of Technology, Harbin, China, 20S003095@stu.hit.edu.cn
Ziyang Zhang, Harbin Institute of Technology, Harbin, China, 20B903026@stu.hit.edu.cn
Huan Li, Harbin Institute of Technology (Shenzhen), Shenzhen, China, huanli@hit.edu.cn
Jie Liu, Harbin Institute of Technology (Shenzhen), Shenzhen, China, jieliu@hit.edu.cn

ABSTRACT
Recent advances in both lightweight models and edge computing make it possible for inference tasks to be executed concurrently on resource-constrained edge devices. However, our preliminary experiments show that executing different lightweight models concurrently on an edge device can degrade performance. In this paper, we propose a learning-based scheduling framework, ECSRL, to optimize the latency and power consumption of inference tasks running in heterogeneous Edge-Cloud systems.

CCS CONCEPTS
• Computing methodologies → Planning and scheduling.

KEYWORDS
Heterogeneous Edge Computing; Task Scheduling; Reinforcement Learning

ACM Reference Format:
Changyao Lin, Ziyang Zhang, Huan Li, and Jie Liu. 2021. Poster Abstract: ECSRL: A Learning-Based Scheduling Framework for AI Workloads in Heterogeneous Edge-Cloud Systems. In Proceedings of The 19th ACM International Conference on Embedded Networked Sensor Systems (SenSys), Nov 15-17, 2021, Coimbra, Portugal. ACM, New York, NY, USA, 2 pages. https://doi.org/10.1145/3485730.3492886

1 INTRODUCTION
As a new computing paradigm, edge AI [1] and mobile edge computing (MEC) [2] sink computing power from the cloud to edge AI devices, making it possible to run Deep Learning (DL) inference workloads in real time at the edge. Compared with cloud computing clusters, edge devices offer low latency, low power consumption, low price, and easy deployment. The advantages and disadvantages of both are summarized in Table 1 below.
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the owner/author(s).
SenSys '21, November 15-17, 2021, Coimbra, Portugal
© 2021 Copyright held by the owner/author(s).
ACM ISBN 978-1-4503-9097-2/21/11 . . . $15.00
https://doi.org/10.1145/3485730.3492886

Table 1: Edge Device versus Cloud Server

Paradigm              Delay  Accuracy  Power  Price
Edge + Light Model    Low    Low       Low    Low
Cloud + Large Model   High   High      High   High

To understand the performance when multiple DL-based workloads run concurrently on an edge device, we set up a test-bed and found that the completion time of a newly arrived AI task is affected under concurrent execution. Moreover, most existing cluster scheduling algorithms are based either on simulation or on homogeneous virtual machine clusters [3]. It is therefore essential to rethink and design efficient task scheduling algorithms for edge clusters with heterogeneous devices.

In this paper, we propose to adopt a reinforcement learning strategy to design a task scheduling framework for heterogeneous Edge-Cloud systems. The objective is to minimize power consumption and completion time. Meanwhile, if the lightweight DL workload deployed on the edge device cannot meet the accuracy requirement, the system automatically offloads the task to the cloud to run a high-precision model for a better inference result.

2 EXPERIMENTAL STUDY
To measure the performance of concurrent workload execution on edge devices, we deploy two lightweight models, ResNet-18 and SSD-MobileNet-v2, on an NVIDIA Xavier NX in our test-bed.
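The scheduling logic described above can be sketched in a few lines. This is a minimal illustration only: the function names, the accuracy threshold rule, and the reward weights are our assumptions for exposition, not the actual ECSRL implementation.

```python
def place_inference_task(required_accuracy, edge_model_accuracy):
    # Run on the edge when the lightweight model is accurate enough;
    # otherwise offload to the cloud's high-precision model.
    return "edge" if edge_model_accuracy >= required_accuracy else "cloud"

def reward(latency, power, alpha=1.0, beta=0.5):
    # Illustrative RL reward: the negated weighted sum of completion
    # time and power consumption, so that maximizing the reward
    # minimizes both objectives. The weights alpha and beta are
    # assumptions, not values from the paper.
    return -(alpha * latency + beta * power)

# Example: a task requiring 90% accuracy is offloaded when the
# lightweight edge model only reaches 75%.
print(place_inference_task(0.90, 0.75))  # cloud
```

In a full RL formulation, a policy would be trained to pick the device that maximizes this reward, rather than using the fixed threshold rule above.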
Figure 1 shows the performance matrices for ResNet-18 and SSD-MobileNet-v2 on the edge device NX, respectively. Here, m denotes the number of SSD-MobileNet-v2 instances and r denotes the number of ResNet-18 instances. Each matrix element is the ratio of the running time of the newly added workload, executed concurrently with the existing workloads, to the time when the system executes only that single workload. For example, in Figure 1(a), the value at [1][0] is the ratio of the inference time of ResNet-18 deployed on an NX that is already running 1 SSD-MobileNet-v2 and 0 ResNet-18 instances, to the inference time when ResNet-18 runs on the NX alone. Since the time to complete a single inference task on a specific device is fixed, larger values in this figure indicate performance degradation (longer completion time) as more tasks are added to the system.
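The ratio defined above can be computed directly from measured inference times. The sketch below shows how such a performance matrix would be built; the numbers in the example are illustrative, not the actual Xavier NX measurements from Figure 1.

```python
def slowdown_matrix(concurrent_times, solo_time):
    # concurrent_times[m][r]: measured inference time of the newly
    # added workload when m SSD-MobileNet-v2 and r ResNet-18 instances
    # are already running on the device.
    # solo_time: inference time when the workload runs alone.
    # Entries greater than 1.0 indicate degradation under concurrency.
    return [[t / solo_time for t in row] for row in concurrent_times]

# Illustrative measurements in milliseconds (not real NX data):
matrix = slowdown_matrix([[10.0, 12.0], [15.0, 20.0]], solo_time=10.0)
print(matrix)  # [[1.0, 1.2], [1.5, 2.0]]
```

Entry [0][0] equals 1.0 by construction (no co-running workloads), and the values grow monotonically as more concurrent instances are added, matching the degradation trend reported in Figure 1.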