Overview

A survey on mining multiple data sources

T. Ramkumar,1∗ S. Hariharan2 and S. Selvamuthukumaran1

Advancements in computer and communication technologies demand new perceptions of distributed computing environments and the development of distributed data sources for storing voluminous amounts of data. In such circumstances, mining multiple data sources to extract useful, significant patterns is considered a challenging task within the data mining community. Multi-database mining (MDM) is regarded as a promising research area, as evidenced by numerous research attempts in the recent past. The existing methods for discovering knowledge from multiple data sources fall into two broad categories, namely (1) mono-database mining and (2) local pattern analysis. The main intent of this survey is to explain the idea behind those approaches and to consolidate the research contributions along with their significance and limitations. © 2012 Wiley Periodicals, Inc.

How to cite this article: WIREs Data Mining Knowl Discov 2013, 3: 1–11 doi: 10.1002/widm.1077

INTRODUCTION

Rapid strides made in communication technology over wired and wireless networks have resulted in the development of various distributed applications. A distributed application might have data sources scattered over various geographical locations to handle huge volumes of data. This scenario leads organizations to promote multi-database applications to fulfill their operational needs. Thus, many organizations need to mine the multi-databases distributed across their branches for the purpose of decision-making. Consider a retailer, Reliance India Ltd, which has launched a retail revolution in India, going from no stores to 1500 outlets in just six months. Each of these outlets produces a huge number of transactions on a daily basis.
Developing an effective data mining technique to discover patterns from multiple branches thus becomes crucial for these types of applications. Multi-database mining (MDM) is gaining significant attention because of (1) the increasing use of automatic data collection tools and the flood of data generated in the operational processes of an organization; (2) the changing nature of distributed repositories with different data sources and formats; (3) organizations' imperative need to analyze the contents and trends of branch databases; and (4) the need to enhance the effectiveness of the decision-making process by incorporating quality knowledge extracted from multi-databases.

∗ Correspondence to: ramooad@yahoo.com
1 Department of Computer Applications, A.V.C. College of Engineering, Tamil Nadu, India
2 Department of Computer Science and Engineering, TRP Engineering College, Tamil Nadu, India

The success of an MDM application largely depends on the data available in the multiple databases. In real-world applications, data stored in multiple places are often inconsistent and conflict with each other. Bright et al.1 discussed the following data representation issues in a multi-database environment. (1) Name differences: Databases may have different conventions for the naming of objects, leading to problems with synonyms and homonyms. A synonym means that the same data item has a different name in different databases. The global system must recognize the semantic equivalence of the items and map the differing local names to a single global name. A homonym means that different data items have the same name in different databases. The global system must recognize the semantic difference between the items and map the common names to different global names. (2) Format differences: Format differences include differences in the data type, domain, scale, precision, and item combinations.
Volume 3, January/February 2013 © 2012 John Wiley & Sons, Inc.
As an example, we can cite the case where a part number is defined as an integer