Data-Driven Oracle Bone Rejoining: A Dataset and Practical Self-Supervised Learning Scheme

Chongsheng Zhang, Henan University, Kaifeng, China, cszhang@ieee.org
Bin Wang, Henan University, Kaifeng, China, bin.wang@henu.edu.cn
Ke Chen, South China University of Technology, Guangzhou, China, chenk@scut.edu.cn
Ruixing Zong, Henan University, Kaifeng, China, rxzong@henu.edu.cn
Bo-feng Mo, Capital Normal University, Beijing, China, mbf2001@163.com
Yi Men, Henan University, Kaifeng, China, yi.men@henu.edu.cn
George Almpanidis, Henan University, Kaifeng, China, almpanidis@acm.org
Shanxiong Chen, Southwest University, Chongqing, China, csxpml@163.com
Xiangliang Zhang, University of Notre Dame, Notre Dame, USA, xzhang33@nd.edu

ABSTRACT
Oracle Bone Inscriptions (OBI) is one of the oldest scripts in the world. The rejoining of Oracle Bone (OB) fragments is of vital importance to the research of ancient scripts and history. Although significant progress has been achieved in the past decades, the rejoining work still relies heavily on domain knowledge and manual effort, and thus remains an inefficient and time-consuming process. Therefore, an automatic and practical algorithm/system for OB rejoining is of great value to the OBI community. To this end, we collect a real-world dataset for rejoining Oracle Bone fragments, namely OB-Rejoin, which consists of 998 OB rubbing images that suffer from low image quality, due to intrinsic underground erosion over time and extrinsic imaging conditions in the past. Moreover, a practical Self-Supervised Splicing Network, S³-Net, is proposed to rejoin the OB fragments based on the shape similarity of their borderlines. Specifically, we first transform the manually annotated borderline strokes of OB images into time-series-style shape representations, which are fed as input to a Generative Adversarial Network that augments positive pairs of rejoinable OBs for each OB fragment that does not have rejoinable counterparts.
A Siamese network is trained on such augmented data in a contrastive learning manner to retrieve the matching OB fragments of an unseen query from an OB fragment gallery. Experiments on the OB-Rejoin benchmark show that our data-driven approach outperforms two recent methods for time-series analysis. In order to demonstrate its practical potential, we deploy the proposed S³-Net method in real tests and ultimately discover dozens of new rejoinings missed by domain experts for decades.

Indicates corresponding author. Also affiliated with Peng Cheng Laboratory.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
KDD ’22, August 14–18, 2022, Washington, DC, USA.
© 2022 Association for Computing Machinery.
ACM ISBN 978-1-4503-9385-0/22/08...$15.00
https://doi.org/10.1145/3534678.3539050

CCS CONCEPTS
• Information systems → Similarity measures; • Computing methodologies → Shape representations; Matching; Neural networks.

KEYWORDS
Oracle Bone Rejoining, Data Augmentation, Contrastive Learning

ACM Reference Format:
Chongsheng Zhang, Bin Wang, Ke Chen, Ruixing Zong, Bo-feng Mo, Yi Men, George Almpanidis, Shanxiong Chen, and Xiangliang Zhang. 2022. Data-Driven Oracle Bone Rejoining: A Dataset and Practical Self-Supervised Learning Scheme. In Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD ’22), August 14–18, 2022, Washington, DC, USA. ACM, New York, NY, USA, 11 pages. https://doi.org/10.1145/3534678.3539050

1 INTRODUCTION
Written language has been the main carrier of human history and civilization for thousands of years. Oracle Bone Inscriptions (OBI), which was used in the Shang dynasty more than 3,600 years ago, is one of the oldest writing systems in the world. It was first discovered in 1899. To date, over 160,000 pieces of Oracle Bones (OB) have been unearthed, and new Oracle Bones are continuously being excavated. OBs were used by the religious specialists (shamans) of that time to practice a specific form of divination, foretelling the future from the cracks that appeared in animal bones and turtle shells (carved with OBIs) after they were burned. OBI research is very important for both history and literature. Due to historical reasons and traditions in OBI research, the main presentation form of OBI materials is the rubbing: colored OB images are rare, expensive, and largely unavailable, because Oracle Bones are protected as antiquities in different museums and organizations all over the world.