Probing Knowledge in Distributed Data Mining

Yike Guo and Janjao Sutiwaraphun

Department of Computing, Imperial College
180 Queen’s Gate, London SW7 2BZ, UK
{Y.Guo,J.Sutiwaraphun}@doc.ic.ac.uk

Abstract. In this paper, we propose a new approach to applying the meta-learning concept to distributed data mining. We name this approach Knowledge Probing; in it, a supervised learning process is organised into two learning stages. In the first learning phase, a set of base classifiers is learned in parallel from a distributed data set. In the second learning phase, meta-learning is applied to induce the relationship between an attribute vector and the class predictions from all the base classifiers. When this approach is applied to an environment where base classifiers are produced from distributed data sources, the output of the Knowledge Probing process can be viewed as the assimilated knowledge of that distributed learning system. Some initial experimental results on the quality of the assimilated knowledge are presented. We believe that integrating the Knowledge Probing technique with the available data mining algorithms can provide a practical framework for distributed data mining applications.

Keywords: Distributed data mining, Committee Learning, Classification data mining.

1 Introduction

The vast quantities of commercial and scientific data currently being stored worldwide are increasingly seen as a source of hidden knowledge. In the past decade, a significant amount of research in the field of data mining has been done, resulting in a variety of algorithms and techniques for automatically extracting this hidden information from data.
However, there are some important challenges in applying data mining technologies to real-world applications:

– data can be large: the execution time of the learning processes can be prohibitive when the algorithms are applied to the volumes of data generated in real-world applications
– data can be distributed: data can be physically distributed at remote sites

Distributed data mining provides a promising solution to these challenges. The idea is to use data mining algorithms to extract knowledge from several (normally disjoint) distributed data sets and then use the knowledge from these individual learned models to create a unified body of knowledge that well

N. Zhong and L. Zhou (Eds.): PAKDD’99, LNAI 1574, pp. 443–452, 1999.
© Springer-Verlag Berlin Heidelberg 1999