Partition Aware Duplicate Records Detection (PADRD) Methodology in Big Data Decision Support Systems
Anusuya Kirubakaran¹(✉) and Aramudhan Murugaiyan²

¹ Mother Teresa Women's University, Kodaikanal, India
Anusuya.kirubakaran@outlook.com
² PKIET, Karaikal, Puducherry, India
Aranagai@yahoo.co.in
Abstract. Today, big data analytics and business intelligence (BI) decision support systems (DSS) are vital pillars of leadership ability, translating raw data into intelligence to make the 'right decision at the right time' and share the 'right decision with the right people'. DSS are often challenged to process massive volumes of data (terabytes, petabytes, exabytes, zettabytes, etc.) and to overcome issues such as data quality, scalability, storage, and query performance. Failure in DSS was one of the causes clearly highlighted by the United States Senate report on the 2008 American economic collapse. With these issues in mind, this work explores a preventive methodology for the "Data Quality - Duplicates" dimension with optimized query performance in the big data era. In detail, a BI team periodically (daily, weekly, monthly, quarterly, or half-yearly) extracts historical operational structured data (data feeds) from multiple sources and loads it into its repository for analytics and reporting. During this load, unpremeditated duplicate data feed insertion occurs due to lack of expertise, lack of history, or missing integrity constraints, which raises the error ratio of intelligence reporting and impairs leadership ability. Hence the need arises to prevent unintentional injection of data quality issues. Overall, this paper proposes a methodology to "Improve Data Accuracy" through detection of duplicate records between the big data repository and the data feed before the data load, with "Optimized Query Performance" through partition-aware search query generation and "Faster Data Block Address Search" through braided B+ tree indexing.
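The partition-aware pre-load check summarized above can be sketched as follows. This is a minimal illustration only: the record layout, the dictionary-based partition map, and all function names are assumptions for exposition, not the paper's actual implementation.

```python
# Sketch: before loading a feed, check each incoming record only against
# the repository partition it maps to (e.g., its feed date), instead of
# scanning the entire repository. Names and data layout are hypothetical.

def detect_duplicates(feed_records, repository, partition_key, record_key):
    """Return feed records whose key already exists in the repository.

    `repository` maps a partition value (e.g., a date) to the set of
    record keys already loaded into that partition, so only the touched
    partition is searched for each record.
    """
    duplicates = []
    for rec in feed_records:
        # Partition-aware step: look up just this record's partition.
        existing = repository.get(rec[partition_key], set())
        if rec[record_key] in existing:
            duplicates.append(rec)
    return duplicates

# Example: a daily feed checked against a date-partitioned repository.
repo = {"2017-01-01": {"A1", "A2"}, "2017-01-02": {"B1"}}
feed = [
    {"date": "2017-01-02", "id": "B1"},  # already loaded -> duplicate
    {"date": "2017-01-02", "id": "B2"},  # new record
]
dups = detect_duplicates(feed, repo, "date", "id")
```

Rejecting (or flagging) `dups` before the load prevents the duplicate insertion at the source, rather than cleansing it after the fact.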
Keywords: Duplicate record detection · Braided tree indexing · Decision support system
1 Introduction
Today, most firms, governments, and regulators across geographies have understood and learned from the 2008 financial crisis that data quality must be in place for effective and healthy macro-economic behavior. Hence, regulators are insisting that financial sectors manage the quality of data used for identifying risks across portfolios, taking proactive measures toward business stabilization and challenges,
© Springer Nature Singapore Pte Ltd. 2018
Shriram R and M. Sharma (Eds.): DaSAA 2017, CCIS 804, pp. 86–98, 2018.
https://doi.org/10.1007/978-981-10-8603-8_8