IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS

Mobility-Aware Proactive Edge Caching for Connected Vehicles using Federated Learning

Zhengxin Yu, Jia Hu, Geyong Min, Zhiwei Zhao, Wang Miao, M. Shamim Hossain

Abstract—Content caching at the edge of vehicular networks has been considered a promising technology to satisfy the increasing demands of computation-intensive and latency-sensitive vehicular applications for intelligent transportation. Existing content caching schemes, when used in vehicular networks, face two distinct challenges: 1) vehicles connected to an edge server keep moving, making the content popularity time-varying and hard to predict; 2) cached content easily becomes out-of-date, since each connected vehicle stays in the coverage area of an edge server only for a short duration. To address these challenges, we propose a Mobility-aware Proactive edge Caching scheme based on Federated learning (MPCF). This new scheme enables multiple vehicles to collaboratively learn a global model for predicting content popularity, using the private training data distributed on local vehicles. MPCF also employs a Context-aware Adversarial AutoEncoder to predict the highly dynamic content popularity. Besides, MPCF integrates a mobility-aware cache replacement policy, which allows the network edges to add/evict contents in response to the mobility patterns and preferences of vehicles. MPCF can greatly improve cache performance, effectively protect users' privacy and significantly reduce communication costs. Experimental results demonstrate that MPCF outperforms other baseline caching schemes in terms of the cache hit ratio in vehicular edge networks.

Index Terms—Content Caching, Edge Computing, Federated Learning, Deep Learning, Vehicular Networks

Z. Yu, J. Hu, G. Min, W. Miao are with the Department of Computer Science, College of Engineering Mathematics and Physical Sciences, University of Exeter, Exeter, EX4 4QF, U.K. Email: {zy246, J.Hu, G.Min, Wang.Miao}@exeter.ac.uk
Z. Zhao is with the School of Computer Science and Engineering, University of Electronic Science and Technology of China, China. Email: zzw@uestc.edu.cn
M. S. Hossain is with the Department of Software Engineering, College of Computer and Information Sciences, King Saud University, Riyadh 11543, Saudi Arabia. E-mail: mshossain@ksu.edu.sa
Corresponding authors: Jia Hu and Geyong Min

I. INTRODUCTION

With the advancement of wireless communications and the Internet-of-Things (IoT), self-driving has been considered a key enabling technology in Intelligent Transportation Systems (ITS) to decrease traffic congestion, improve traffic efficiency and enhance road safety [1]. Self-driving vehicles enable a wide range of applications, from infotainment applications to safety-related applications [2]. These applications may require large computation, communication and storage resources, and have strict performance requirements on network bandwidth and response time. Thus, supporting these applications imposes high pressure on the resource-constrained Vehicular Networks (VNs). Vehicular Edge Computing (VEC) is recognised as a promising paradigm to satisfy these increasing demands by integrating edge computing into VNs [2]. VEC allows data to be processed and stored at edge nodes, such as Roadside Units (RSUs) and Base Stations (BSs). Caching content at edge nodes enables vehicles to fetch their requested contents within one transmission hop [3], which reduces service latency and alleviates the backhaul network burden. Due to the limited storage at edge nodes, caching schemes need to identify and cache the popular contents that are of interest to most vehicular users.
Caching schemes can be classified into two categories: reactive caching and proactive caching. Reactive caching utilises the observed request patterns of users to choose the contents to be cached [4], for example First-In-First-Out (FIFO), Most Recently Used (MRU), and Least Recently Used (LRU). In reactive caching, a content can only be cached after it has been requested; if a content has never been requested before, no cached copy of it exists. However, the high mobility of vehicles and the complex vehicular environment cause highly dynamic content popularity. In this case, previously requested contents may soon become obsolete, so reactive caching cannot satisfy the strict performance requirements of users. In contrast, proactive caching predicts content popularity and caches the predicted popular contents before user requests arrive. It can pre-fetch popular contents even if they have never been requested before. Thus, proactive caching is considered more suitable for VEC scenarios. In proactive caching, Machine Learning (ML) is a powerful approach to predicting content popularity for efficient caching. Some works focus on learning-based caching schemes in VNs that utilise reinforcement learning [5], [6], multilayer perceptrons and convolutional neural networks [7], etc.

Although some progress has been achieved in learning-based proactive caching, utilising ML techniques for edge caching in VNs still faces the following three challenges:

1) High mobility: Vehicles send requests to an RSU and pass through its coverage area quickly, making the cached content easily out of date. To improve cache performance, the caching scheme should be both context and mobility aware, making cache decisions based on content popularity predictions and vehicles' mobility.

2) Privacy: Most ML algorithms train models in a centralised manner, where the data generated by multiple vehicles must be sent to an edge server for analysis.
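To make the reactive policies above concrete, the following is a minimal sketch of an LRU cache: a content is cached only after it is requested, and the least recently used content is evicted when capacity is exceeded. This is a generic textbook illustration, not the paper's MPCF scheme; the `fetch` callback standing in for retrieval from the origin server is a hypothetical name.

```python
from collections import OrderedDict

class LRUCache:
    """Reactive Least-Recently-Used cache: contents are cached only
    after being requested, and the least recently used content is
    evicted when the capacity is exceeded."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.store = OrderedDict()  # content_id -> content, in recency order

    def request(self, content_id, fetch):
        """Return (content, hit). On a miss, fetch and cache reactively."""
        if content_id in self.store:
            self.store.move_to_end(content_id)  # mark as most recently used
            return self.store[content_id], True  # cache hit
        content = fetch(content_id)              # retrieve from origin server
        self.store[content_id] = content
        if len(self.store) > self.capacity:
            self.store.popitem(last=False)       # evict least recently used
        return content, False                    # cache miss
```

Such a policy only reacts to the observed request history, which is exactly why it struggles when content popularity shifts quickly, as it does under high vehicle mobility.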
The generated data may contain sensitive personal information used for various vehicular applications. Therefore, uploading and processing these data centrally may raise privacy and security concerns.

3) Scalability: As the number of connected vehicles grows, the volume of data generated by the vehicles increases. Centralised ML algorithms may find it difficult to handle such data due to the incurred high
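The collaborative training described in the abstract, where multiple vehicles learn a global model without uploading their raw data, follows the general federated averaging pattern: each vehicle trains locally and sends only model weights, which the server aggregates weighted by local sample counts. A minimal sketch of that aggregation step, with model weights represented as NumPy vectors (a generic illustration under these assumptions, not the paper's exact MPCF procedure):

```python
import numpy as np

def federated_average(local_weights, num_samples):
    """Aggregate locally trained model weights into a global model.

    Each vehicle's contribution is weighted by its number of local
    training samples; raw training data never leaves the vehicles,
    only the weight vectors are communicated.
    """
    total = sum(num_samples)
    return sum(w * (n / total) for w, n in zip(local_weights, num_samples))
```

For example, aggregating two vehicles' weight vectors `[1.0, 2.0]` (1 sample) and `[3.0, 4.0]` (3 samples) yields `[2.5, 3.5]`, reflecting the second vehicle's larger share of the training data.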