Distributed Data Mining and Agents

Josenildo C. da Silva, Chris Giannella, Ruchita Bhargava, Hillol Kargupta, and Matthias Klusch

Department of Computer Science and Electrical Engineering, University of Maryland Baltimore County, Baltimore, MD 21250 USA
{cgiannel,hillol}@cs.umbc.edu

German Research Center for Artificial Intelligence, Stuhlsatzenhausweg 3, 66121 Saarbruecken, Germany
{jcsilva,klusch}@dfki.de

Microsoft Corporation, One Microsoft Way, Redmond, WA 98052 USA

AGNIK LLC, 8840 Stanford Blvd. Suite 1300, Columbia, Maryland 21045 USA

Abstract. Multi-Agent Systems (MAS) offer an architecture for distributed problem solving. Distributed Data Mining (DDM) algorithms focus on one class of such distributed problem-solving tasks: the analysis and modeling of distributed data. This paper offers a perspective on DDM algorithms in the context of multi-agent systems. It broadly discusses the connection between DDM and MAS. It provides a high-level survey of DDM, then focuses on distributed clustering algorithms and some potential applications in multi-agent-based problem-solving scenarios. It reviews algorithms for distributed clustering, including privacy-preserving ones. It describes challenges for clustering in sensor-network environments, potential shortcomings of the current algorithms, and directions for future work. It also discusses confidentiality (privacy preservation) and presents a new algorithm for privacy-preserving density-based clustering.

Keywords: multi-agent systems, distributed data mining, clustering, privacy, sensor networks

1 Introduction

Multi-agent systems (MAS) often deal with complex applications that require distributed problem solving. In many applications, the individual and collective behavior of the agents depends on the observed data from distributed sources. In a typical distributed environment, analyzing distributed data is a non-trivial problem because of many constraints such as limited bandwidth (e.g.
wireless networks), privacy-sensitive data, and distributed compute nodes, to mention only a few. The field of Distributed Data Mining (DDM) deals with these challenges in analyzing distributed data and offers many algorithmic solutions for performing different data analysis and mining operations in a fundamentally distributed manner that pays careful attention to the resource constraints.