© 2022 IJRAR November 2022, Volume 9, Issue 4 www.ijrar.org (E-ISSN 2348-1269, P- ISSN 2349-5138)
IJRAR22D3221 International Journal of Research and Analytical Reviews (IJRAR) www.ijrar.org 44
Graph Neural Networks for Entity Resolution in
Cloud-Native Data Platforms
Srikanth Jonnakuti
Staff Software Engineer, Move Inc., operator of Realtor.com, Newscorp
ABSTRACT - In today's business landscape, customer and
product information tends to be scattered across disparate
distributed data stores, leading to redundant, inconsistent, and
ambiguous records. This paper introduces a new method of
real-time entity resolution and deduplication employing
Graph Neural Networks (GNNs) as part of a Kubernetes-
based microservices architecture. The approach takes
advantage of the representational capability of GNNs to
capture the intricate relational patterns between records and
employs learned embeddings to detect, connect, and
deduplicate entities in siloed datasets. The system is
dynamically updatable and horizontally scalable, making it
well suited to cloud-native use cases. The GNN is trained on
structured and semi-structured attributes and employs
supervised learning with labeled duplicate and unique record
pairs. Once trained, the model infers link probabilities
between incoming records and nodes in the current graph
and merges those recognized as duplicates. For real-time
resolution, the architecture uses stateless microservices
running on Kubernetes, permitting elastic and robust
operations. Kafka streams provide low-latency ingestion of
data, while Redis and PostgreSQL support high-speed graph
lookups and data persistence. Large-scale experiments
show that our system achieves higher precision and recall
than conventional rule-based and ML-only
deduplication techniques. Additionally, the modular
microservices architecture enables smooth integration with
existing enterprise workflows with minimal disruption. The
methodology extends readily to other domains,
such as finance, retail, and healthcare, where accurate,
real-time data integrity is essential for downstream
analytics and decision-making. Our findings highlight the
potential of GNNs for understanding entity relationships and
resolving records with minimal human involvement.
Keywords: Graph Neural Networks (GNNs), Entity
Resolution, Deduplication, Microservices, Kubernetes, Data
Integration, Real-time Processing, Distributed Data Stores,
Record Linking, Scalable AI.
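The resolution step summarized in the abstract (score the link between an incoming record and existing nodes, then merge those judged to be duplicates) can be illustrated with a minimal sketch. It is not the system's implementation: the embeddings below stand in for the output of a trained GNN, and the logistic scorer, its `scale` and `bias` values, the 0.5 threshold, the `EntityGraph` class, and the record identifiers are all assumptions introduced for illustration. The linear scan over stored records is also a simplification of the high-speed graph lookups described above.

```python
import math


def link_probability(u, v, scale=10.0, bias=-8.0):
    """Hypothetical scorer: logistic function of the cosine similarity
    between two record embeddings (scale and bias are assumed values)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    cos = dot / norm if norm else 0.0
    return 1.0 / (1.0 + math.exp(-(scale * cos + bias)))


class EntityGraph:
    """Stores record embeddings and merges duplicates with union-find."""

    def __init__(self, threshold=0.5):
        self.threshold = threshold  # assumed decision threshold
        self.embeddings = {}        # record id -> embedding vector
        self.parent = {}            # record id -> union-find parent

    def find(self, rid):
        """Return the canonical record id of the entity containing rid."""
        while self.parent[rid] != rid:
            self.parent[rid] = self.parent[self.parent[rid]]  # path halving
            rid = self.parent[rid]
        return rid

    def resolve(self, rid, embedding):
        """Add an incoming record, merge it with every stored record whose
        link probability meets the threshold, and return its entity id."""
        self.embeddings[rid] = embedding
        self.parent[rid] = rid
        for other, emb in self.embeddings.items():
            if other == rid:
                continue
            if link_probability(embedding, emb) >= self.threshold:
                self.parent[self.find(rid)] = self.find(other)
        return self.find(rid)


if __name__ == "__main__":
    graph = EntityGraph()
    # Toy embeddings; in practice these would come from the trained model.
    print(graph.resolve("crm-1", [1.0, 0.0, 0.1]))
    print(graph.resolve("web-7", [0.98, 0.05, 0.12]))
    print(graph.resolve("erp-3", [0.0, 1.0, 0.0]))
```

Union-find is used here so that merges stay transitive: if a new record links two previously separate entities, all of their records end up under one canonical identifier.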
I. INTRODUCTION
In the information age, organizations grapple with the
challenge of maintaining consistent and up-to-date
customer and product information that resides in
siloed systems. Such data silos typically arise from
heterogeneous data ingestion pipelines, legacy systems, and
non-standard formats, which complicate processes such as
deduplication, linkage, and identity resolution. As
more firms adopt cloud-native
infrastructures built on Kubernetes-based microservices,
advanced AI models such as Graph Neural
Networks (GNNs) offer a robust solution for real-time
entity resolution. GNNs have proven highly effective at learning
from relational structure and feature-rich datasets and are
therefore well suited to modeling complex relations
between fragmented data entities [3] [5] [15] [16] [17].
These capabilities enable systems to detect similar or duplicate
records across multiple sources programmatically, improving
linkage accuracy and avoiding operational inefficiencies [1]
[10]. Microservice environments based on Kubernetes offer
increased scalability, availability, and agility, allowing AI-
powered data resolution processes to be deployed and
managed efficiently [2] [6] [19] [21] [23]. By incorporating