Clustering Large Datasets of Mixed Units

Simona Korenjak-Černe, Vladimir Batagelj

Institute of Mathematics, Physics and Mechanics, Dept. of TCS, and
University of Ljubljana, Faculty of Mathematics and Physics
Jadranska 19, 1000 Ljubljana, Slovenia
e-mail: simona.korenjak@fmf.uni-lj.si
e-mail: vladimir.batagelj@uni-lj.si

Summary: In this paper we propose an approach for clustering large datasets of mixed units, where the variables (properties) of the units are measured on different scales (e.g. interval, ordinal, nominal). A uniform representation of the units is obtained by partitioning the range of each variable. The description of a cluster consists, for each variable, of the frequencies of the variable's values over the partition of its range, and as such it is an extension of the uniform representation of the units. The proposed representation can also be used for clustering symbolic data. On the basis of this representation, adapted versions of the leaders method and of the adding clustering method were implemented. The proposed approach was successfully applied to several large datasets.

Keywords: large datasets, clustering, mixed units, hierarchical clustering, cluster description compatible with merging of clusters, leaders method, adding method.

1. Introduction

One way to extract information from a large dataset is to look for clusters in it. Most of the known hierarchical clustering methods, however, are appropriate only for datasets of moderate size (a few hundred units). Nonhierarchical methods, on the other hand, are mostly implemented for datasets whose variables are all measured on the same type of scale (only numerical, only nominal, or only binary). Because of these limitations we are looking for new clustering methods, or at least trying to adapt known methods, so that they are appropriate for clustering large datasets of mixed units, where the variables (properties) of the units are measured on different scales.
Let $E$ be a finite set of units. A nonempty subset $C \subseteq E$ is called a cluster. A set of clusters $\mathbf{C} = \{C_i\}$ forms a clustering. In this paper we shall require that every clustering $\mathbf{C}$ is a partition of $E$.

The clustering problem can be formulated as an optimization problem: Determine the clustering $\mathbf{C}^* \in \Phi$ for which

$$P(\mathbf{C}^*) = \min_{\mathbf{C} \in \Phi} P(\mathbf{C}),$$

where $\Phi$ is a set of feasible clusterings and $P : \Phi \to \mathbb{R}$ is a criterion function.
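To make the optimization formulation concrete, the following minimal sketch solves the clustering problem by exhaustive search on a tiny set of units. Everything in it is an illustrative assumption rather than part of the proposed approach: the units are plain numbers, the feasible clusterings are taken to be all partitions into exactly k clusters, and the criterion function is the sum of squared deviations from the cluster means. The function names (partitions, criterion, best_clustering) are hypothetical.

```python
def partitions(units):
    """Yield every partition of the list `units` into nonempty clusters."""
    if not units:
        yield []
        return
    first, rest = units[0], units[1:]
    for smaller in partitions(rest):
        # Place `first` into each existing cluster in turn ...
        for i, cluster in enumerate(smaller):
            yield smaller[:i] + [[first] + cluster] + smaller[i + 1:]
        # ... or let it open a new cluster of its own.
        yield [[first]] + smaller


def criterion(clustering):
    """Assumed criterion: sum of squared deviations from each cluster mean."""
    total = 0.0
    for cluster in clustering:
        mean = sum(cluster) / len(cluster)
        total += sum((x - mean) ** 2 for x in cluster)
    return total


def best_clustering(units, k):
    """Return a feasible clustering (exactly k clusters) minimizing the criterion."""
    feasible = (c for c in partitions(units) if len(c) == k)
    return min(feasible, key=criterion)


units = [1.0, 2.0, 9.0, 10.0]
best = best_clustering(units, 2)
print(sorted(sorted(c) for c in best), criterion(best))
```

The number of partitions grows as the Bell numbers (15 for four units, more than 100,000 for ten), so exhaustive search is feasible only for toy examples; this is why heuristic methods are needed for large datasets.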