SCIENCE CHINA Information Sciences
March 2022, Vol. 65 132101:1–132101:17
https://doi.org/10.1007/s11432-019-2833-1
© Science China Press and Springer-Verlag GmbH Germany, part of Springer Nature 2021

RESEARCH PAPER

A self-tuning client-side metadata prefetching scheme for wide area network file systems

Bing WEI 1,2, Limin XIAO 1,2*, Yao SONG 1,2, Guangjun QIN 3, Jinbin ZHU 1,2, Baicheng YAN 1,2, Chaobo WANG 1,2 & Zhisheng HUO 1,2

1 Laboratory of Software Development Environment, Beihang University, Beijing 100191, China;
2 School of Computer Science and Engineering, Beihang University, Beijing 100191, China;
3 Smart City College, Beijing Union University, Beijing 100101, China

Received 28 September 2019/Revised 14 December 2019/Accepted 17 March 2020/Published online 22 February 2021

Abstract Client-side metadata prefetching is commonly used in wide area network (WAN) file systems because it can effectively hide network latency. However, most existing prefetching approaches do not meet the differing prefetching requirements of multiple workloads. They are usually optimized for one specific workload and have no effect, or even a harmful effect, on other workloads. In this paper, we present a new self-tuning client-side metadata prefetching scheme that uses two different prefetching strategies and dynamically adapts to workload changes. It uses a directory-directed prefetching strategy to prefetch related file metadata in the same directory, and a correlation-directed prefetching strategy to prefetch related file metadata accessed across directories. A novel self-tuning mechanism is proposed to efficiently switch the prefetching strategy between directory-directed and correlation-directed prefetching. Experimental results using real system traces show that the hit ratio of the client-side cache can be significantly improved by our self-tuning client-side prefetching.
In the multi-workload concurrency scenario, our approach improves the hit ratio over no prefetching, directory-directed prefetching, the variant probability graph algorithm, the variant apriori algorithm, and the variant semantic distance algorithm by up to 15.22%, 6.32%, 10.08%, 11.65%, and 10.73%, corresponding to 25.24%, 18.11%, 23.53%, 24.94%, and 24.19% reductions in the average access time, respectively.

Keywords wide area network file systems, multiple workloads, metadata prefetching, correlation-directed prefetching, directory-directed prefetching, self-tuning prefetching

Citation Wei B, Xiao L M, Song Y, et al. A self-tuning client-side metadata prefetching scheme for wide area network file systems. Sci China Inf Sci, 2022, 65(3): 132101, https://doi.org/10.1007/s11432-019-2833-1

1 Introduction

In a wide area environment, heterogeneous storage resources owned by different organizations are geographically distributed, resulting in barriers between applications and data. Network-based file systems offer promising solutions to this problem. In network-based file systems (such as Onedata [1] and GFFS [2]), the client and server are decoupled and interact with each other through network communications. Several network-based file systems use client-side metadata caching to reduce the number of network communications and achieve better access performance [1–5]. The client caches a certain amount of metadata and periodically refreshes the cached metadata [1].

As cache hit ratios are crucial for the performance of network-based file systems [6], several prefetching schemes [3–5, 7–11] have been proposed to improve cache hit ratios. These approaches can generally be classified into two categories: directory-directed and correlation-directed prefetching. Directory-directed prefetching is commonly used to alleviate access latency in several network-based storage systems [3–5]. It prefetches all file metadata in the same directory with a single network communication.
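The directory-directed strategy described above can be illustrated with a minimal sketch. This is not the authors' implementation: the class `DirectoryPrefetchCache`, the `list_dir` callback (standing in for one client-server round trip), and the toy `server` dictionary are all hypothetical, and cache eviction and periodic refresh are deliberately left out.

```python
import posixpath
from typing import Callable, Dict

# Hypothetical callback: returns {file name: metadata} for a directory,
# standing in for a single client-server network communication.
ListDir = Callable[[str], Dict[str, dict]]


class DirectoryPrefetchCache:
    """Toy client-side metadata cache with directory-directed prefetching."""

    def __init__(self, list_dir: ListDir) -> None:
        self._list_dir = list_dir
        self._cache: Dict[str, dict] = {}
        self.hits = 0
        self.misses = 0

    def stat(self, path: str) -> dict:
        if path in self._cache:
            self.hits += 1
            return self._cache[path]
        # Miss: one round trip fetches the metadata of every file
        # in the parent directory, not only the requested file.
        self.misses += 1
        parent = posixpath.dirname(path)
        for name, meta in self._list_dir(parent).items():
            self._cache[posixpath.join(parent, name)] = meta
        return self._cache[path]  # raises KeyError if the file does not exist


# Toy "server" namespace (assumed data, for illustration only).
server = {
    "/proj": {"a.c": {"size": 10}, "b.c": {"size": 20}},
    "/data": {"x.bin": {"size": 99}},
}
cache = DirectoryPrefetchCache(lambda directory: server[directory])
cache.stat("/proj/a.c")    # miss: also prefetches /proj/b.c
cache.stat("/proj/b.c")    # hit: served from the prefetched entry
cache.stat("/data/x.bin")  # miss: access crosses into another directory
```

In this sketch the second access is a hit only because it stays in the same directory; the third access crosses directories and misses, which is the access pattern that correlation-directed prefetching targets.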
* Corresponding author (email: xiaolm@buaa.edu.cn)

This type of approach can be used to prefetch metadata without knowing the semantic correlations between files [11]. Directory-directed prefetching is effective because it can capture the natural organization imposed by