© 2022 IJRAR November 2022, Volume 9, Issue 4 www.ijrar.org (E-ISSN 2348-1269, P-ISSN 2349-5138)
IJRAR22D3221 International Journal of Research and Analytical Reviews (IJRAR) www.ijrar.org

Graph Neural Networks for Entity Resolution in Cloud-Native Data Platforms

Srikanth Jonnakuti
Staff Software Engineer, Move Inc., operator of Realtor.com, Newscorp

ABSTRACT - In business landscapes, customer and product information tends to be scattered across disparate distributed data stores, which leads to redundant, inconsistent, and ambiguous records. This paper introduces a new method for real-time entity resolution and deduplication that employs Graph Neural Networks (GNNs) within a Kubernetes-based microservices architecture. The approach exploits the representational capability of GNNs to capture intricate relational patterns between records and uses learned embeddings to detect, link, and deduplicate entities across siloed datasets. The system is dynamically updatable and horizontally scalable, which makes it well suited to cloud-native use cases. The GNN is trained on structured and semi-structured attributes through supervised learning on labeled duplicate and unique record pairs. Once trained, the model infers link probabilities between incoming records and existing nodes in the graph and merges those identified as duplicates. For real-time resolution, the architecture relies on stateless microservices orchestrated by Kubernetes, permitting elastic and resilient operation. Kafka streams provide low-latency data ingestion, while Redis and PostgreSQL support high-speed graph lookups and data persistence. Large-scale experiments show that our system achieves high precision and recall compared with conventional rule-based and ML-only deduplication techniques. Additionally, the modular microservices architecture integrates smoothly with existing enterprise workflows with minimal disruption.
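The resolution step summarized above (learned embeddings, link probabilities between records, and merging of duplicates) can be illustrated with a minimal, dependency-free Python sketch. It is not the paper's model. Character-trigram features, a single untrained mean-aggregation layer with a fixed mixing weight `alpha`, the hypothetical logistic parameters `scale` and `bias`, and union-find merging are all illustrative assumptions. In the described system, the layer weights and the link calibration would be learned from labeled duplicate and unique pairs, and records would arrive through Kafka rather than from an in-memory dict.

```python
import math
from collections import Counter, defaultdict


def trigram_features(text):
    """Illustrative featurizer: counts of character trigrams in a padded, lowercased string."""
    s = f"#{text.lower()}#"
    return Counter(s[i:i + 3] for i in range(len(s) - 2))


def normalize(vec):
    """Scale a sparse vector (dict of key -> weight) to unit L2 norm."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {k: w / norm for k, w in vec.items()} if norm else dict(vec)


def dot(a, b):
    """Dot product of two sparse vectors; iterates over the smaller one."""
    if len(a) > len(b):
        a, b = b, a
    return sum(w * b.get(k, 0.0) for k, w in a.items())


def message_pass(features, edges, alpha=0.25):
    """One round of mean-neighbour aggregation on unit-norm inputs:
    h_v = normalize(x_v + alpha * mean of x_u over neighbours u of v).
    The fixed alpha is a stand-in (assumption) for the learned weights of a trained GNN layer."""
    x = {node: normalize(f) for node, f in features.items()}
    neighbours = defaultdict(list)
    for u, v in edges:
        neighbours[u].append(v)
        neighbours[v].append(u)
    embeddings = {}
    for node, x_v in x.items():
        combined = dict(x_v)
        nbrs = neighbours[node]
        for n in nbrs:
            for k, w in x[n].items():
                combined[k] = combined.get(k, 0.0) + alpha * w / len(nbrs)
        embeddings[node] = normalize(combined)
    return embeddings


def link_probability(h_u, h_v, scale=10.0, bias=-6.0):
    """Logistic score on cosine similarity (embeddings are unit-norm).
    scale and bias are hypothetical; in practice they would be fit on labeled pairs."""
    return 1.0 / (1.0 + math.exp(-(scale * dot(h_u, h_v) + bias)))


class UnionFind:
    """Disjoint-set structure used to merge records linked as duplicates."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def resolve(records, edges, threshold=0.5):
    """Embed all records, score every pair, and return clusters of merged record ids."""
    emb = message_pass({rid: trigram_features(t) for rid, t in records.items()}, edges)
    uf = UnionFind()
    ids = sorted(records)
    for i, u in enumerate(ids):
        uf.find(u)  # register singletons
        for v in ids[i + 1:]:
            if link_probability(emb[u], emb[v]) >= threshold:
                uf.union(u, v)
    clusters = defaultdict(list)
    for rid in ids:
        clusters[uf.find(rid)].append(rid)
    return sorted(clusters.values())


# Hypothetical example: two spellings of one person, both linked to the same company record.
records = {
    "a": "Jon Smith, 12 Oak St",
    "b": "John Smith, 12 Oak Street",
    "c": "Acme Corp, 99 Pine Ave",
}
edges = [("a", "c"), ("b", "c")]
clusters = resolve(records, edges)
print(clusters)  # [['a', 'b'], ['c']]
```

Because both Smith records share the neighbour `c`, aggregation nudges their embeddings closer together than their raw trigram similarity alone. The exhaustive pairwise loop is quadratic, so a deployment at scale would need blocking or an approximate nearest-neighbour lookup. This sketch does not model that lookup, nor the Redis/PostgreSQL layer.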
The methodology is readily extensible to other domains, such as finance, retail, and healthcare, where precise, real-time data integrity is essential for downstream analytics and decision-making. Our findings highlight the potential of GNNs for understanding entity relationships and resolving records with minimal human involvement.

Keywords: Graph Neural Networks (GNNs), Entity Resolution, Deduplication, Microservices, Kubernetes, Data Integration, Real-time Processing, Distributed Data Stores, Record Linking, Scalable AI.

I. INTRODUCTION

In the information age, organizations face the overwhelming challenge of maintaining consistent, up-to-date customer and product information that resides in siloed systems. Such data silos typically arise from heterogeneous data ingestion pipelines, legacy systems, and non-standard formats, which complicate processes such as deduplication, linkage, and identity resolution. As more firms adopt cloud-native infrastructures built on Kubernetes-based microservices, advanced AI models such as Graph Neural Networks (GNNs) offer a robust solution for real-time entity resolution. GNNs have proven highly effective at learning from structured relations and feature-rich datasets, which makes them well suited to modeling complex relations among decomposed data entities [3] [5] [15] [16] [17]. These capabilities let systems programmatically detect similar or duplicate records from multiple sources, enabling accurate linkage and avoiding operational inefficiencies [1] [10]. Kubernetes-based microservice environments offer greater scalability, availability, and agility, allowing AI-powered data resolution processes to be deployed and managed efficiently [2] [6] [19] [21] [23]. By incorporating