CONCURRENCY AND COMPUTATION: PRACTICE AND EXPERIENCE
Concurrency Computat.: Pract. Exper. 2003; 15:101–116 (DOI: 10.1002/cpe.711)

Clustering revealed in high-resolution simulations and visualization of multi-resolution features in fluid–particle models

Krzysztof Boryczko¹, Witold Dzwinel¹ and David A. Yuen²,*,†

¹ AGH Institute of Computer Science, al. Mickiewicza 30, 30-059, Kraków, Poland
² Minnesota Supercomputing Institute, University of Minnesota, Minneapolis, MN 55455-1227, U.S.A.

SUMMARY

Simulating natural phenomena at greater accuracy results in an explosive growth of data. Large-scale simulations with particles currently involve ensembles consisting of between 10⁶ and 10⁹ particles, which cover 10⁵–10⁶ time steps. Thus, the data files produced in a single run can range from tens of gigabytes to hundreds of terabytes. This data bank allows one to reconstruct the spatio-temporal evolution of both the particle system as a whole and each particle separately. Realistically, examining a large data set at full resolution at all times is not possible and, in fact, not necessary. We have developed an agglomerative clustering technique, based on the concept of a mutual nearest neighbor (MNN). This procedure can be easily adapted for efficient visualization of extremely large data sets from simulations with particles at various resolution levels. We present the parallel algorithm for MNN clustering and its timings on the IBM SP and SGI/Origin 3800 multiprocessor systems for up to 16 million fluid particles. The high efficiency obtained is mainly due to the similarity in the algorithmic structure of MNN clustering and particle methods. We show various examples drawn from MNN applications in visualization and analysis of the order of a few hundred gigabytes of data from discrete particle simulations, using dissipative particle dynamics and fluid particle models.
Because data clustering is the first step in this concept extraction procedure, we may apply this clustering procedure to many other fields such as data mining, earthquake events and stellar populations in nebula clusters. Copyright © 2003 John Wiley & Sons, Ltd.

KEY WORDS: large-scale data sets; visualization; feature extraction; parallel clustering; dissipative particle dynamics; fluid particle model

* Correspondence to: David A. Yuen, Minnesota Supercomputing Institute, University of Minnesota, Minneapolis, MN 55455-1227, U.S.A.
† E-mail: davey@krissy.msi.umn.edu

Contract/grant sponsor: Complex Fluids Program of the U.S. Department of Energy
Contract/grant sponsor: Energy Research Laboratory Technology Research Program of the Office of Energy Research, U.S. Department of Energy under subcontract from the Pacific Northwest National Laboratory
Contract/grant sponsor: KBN (Polish Committee of Scientific Research); contract/grant number: 4 T11F 02022

Received 29 January 2002
Revised 14 June 2002