International Journal of Electrical and Computer Engineering (IJECE)
Vol. 13, No. 1, February 2023, pp. 1008~1014
ISSN: 2088-8708, DOI: 10.11591/ijece.v13i1.pp1008-1014
Journal homepage: http://ijece.iaescore.com

Matching data detection for the integration system

Merieme El Abassi 1, Mohamed Amnai 1, Ali Choukri 1, Youssef Fakhri 1, Noreddine Gherabi 2
1 Laboratory of Computer Sciences Research, Faculty of Sciences, Ibn Tofail University Kenitra, Kenitra, Morocco
2 National School of Applied Sciences, Sultan Moulay Slimane University, Khouribga, Morocco

Article history: Received Nov 9, 2021; Revised Sep 22, 2022; Accepted Oct 5, 2022

ABSTRACT
The purpose of data integration is to combine the multiple sources of heterogeneous data available on the internet, such as text, images, and video. After this stage, the data volume becomes large, so the data must be analyzed to support efficient query execution. However, entity resolution problems arise, and different techniques are needed to analyze and verify data quality in order to achieve good data management. When only a single database is involved, this mechanism is called deduplication. To solve these problems, this article proposes a method to compute the similarity between potentially duplicate records. The solution is based on graph technology to narrow the search space for similar features. A composite mechanism is then used to locate the most similar records in the database, improving data quality so that good decisions can be made from heterogeneous sources.

Keywords: Data integration; Data matching; Data quality; Entity resolution

This is an open access article under the CC BY-SA license.

Corresponding Author: Merieme El Abassi, Laboratory of Computer Sciences Research, Faculty of Sciences, Ibn Tofail University Kenitra, Kenitra, Morocco. Email: merieme.elabassi@uit.ac.ma
1. INTRODUCTION

Big data is like small data, but in much larger amounts and with a higher level of complexity, which makes it very difficult to manage with conventional database management tools [1]. Big data is characterized by a set of properties: volume, veracity, variety, and velocity. Volume represents the size of the data, which can extend to terabytes or more. Velocity denotes how fast the data arrives. Variety captures the fact that data can come in structured, semi-structured, and unstructured formats.

Today, data has become the wealth of companies and management departments and contributes to their development. Decisions based on low-quality data can be very costly, hurting businesses, partners, and customers. Furthermore, management departments and companies need to improve their relationships through data governance. Hence, good data quality is very important for companies, especially when they interact with other organizations or make big decisions. The suggested methods concentrate on the structure of the data to be cleaned or integrated, defining metrics and procedures to solve data quality issues. Thus, to obtain useful data, we need to analyze it within the range of its usage [2].

Integration projects may require support to improve data quality, because few companies execute data quality management procedures in the databases or data warehouses they have created. Currently, entity analysis is an active research area within the field of data quality [3]–[5]. Just as online mining of relationships and entities has established extensive public knowledge bases, companies, governments, and researchers can also exploit the true value of this data, which can only be realized when multiple data sources are integrated. Entity resolution refers to the task of identifying records from the
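To make the entity resolution task concrete, the sketch below flags potentially duplicate records by comparing all record pairs with a token-based Jaccard similarity. This is a minimal illustration, not the paper's proposed method (which combines graph-based search-space reduction with a composite matching mechanism); the record strings and the 0.5 threshold are illustrative assumptions.

```python
def tokens(s: str) -> set:
    """Lowercase a string and split it into a set of word tokens."""
    return set(s.lower().split())

def jaccard(a: str, b: str) -> float:
    """Jaccard similarity of two strings' token sets, in [0.0, 1.0]."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)

def find_duplicates(records, threshold=0.5):
    """Return index pairs of records whose similarity meets the threshold."""
    pairs = []
    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            if jaccard(records[i], records[j]) >= threshold:
                pairs.append((i, j))
    return pairs

# Hypothetical records: the first two describe the same entity.
records = [
    "Merieme El Abassi Ibn Tofail University Kenitra",
    "M. El Abassi Ibn Tofail University Kenitra Morocco",
    "Noreddine Gherabi Sultan Moulay Slimane University",
]
print(find_duplicates(records))  # -> [(0, 1)]
```

Note that this naive all-pairs comparison is quadratic in the number of records, which is precisely why blocking techniques, such as the graph-based narrowing discussed in this article, are needed at scale.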