Extended Star Clustering Algorithm

Reynaldo J. Gil-García¹, José M. Badía-Contelles², and Aurora Pons-Porrata¹

¹ Universidad de Oriente, Santiago de Cuba, Cuba
{gil,aurora}@app.uo.edu.cu
² Universitat Jaume I, Castellón, Spain
badia@icc.uji.es

Abstract. In this paper we propose the extended star clustering algorithm and compare it with the original star clustering algorithm. We introduce a new concept of star and, as a consequence, obtain different star-shaped clusters. Evaluation experiments on TREC data show that the proposed algorithm outperforms the original algorithm. Our algorithm is independent of the data order and obtains a smaller number of clusters.

1 Introduction

Clustering algorithms are widely used for document classification, clustering of genes and proteins with similar functions, event detection and tracking on a stream of news, image segmentation, and so on. For a good overview see [1,2]. Given a collection of n objects characterized by m features, clustering algorithms try to construct partitions or covers of this collection. The similarity among objects in the same cluster should be maximal, whereas the similarity among objects in different clusters should be minimal.

One of the most important problems in recent years is the enormous increase in the amount of unorganized data. Consider, for example, the web or the flow of news in newspapers. We need methods for organizing information in order to highlight the topic content of a collection, detect new topics, and track them.

The star clustering algorithm [3] was proposed for these tasks, and three scalable extensions of this algorithm are presented in [4]. As shown in [3], the star method outperforms existing clustering algorithms such as single link [5], average link [6], and k-means [7] in the information organization task. However, the clusters obtained by this algorithm depend on the data order, and it can produce "illogical" clusters.
In this paper we propose a new clustering method that solves some of the drawbacks of the star clustering algorithm. We define a new concept of star and, as a consequence, obtain different star-shaped clusters. Both algorithms were compared using TREC data, and the experiments show that our algorithm outperforms the original star clustering algorithm.

The rest of the paper is organized as follows. Section 2 describes the star clustering algorithm and shows its drawbacks. Section 3 describes the proposed algorithm, and the experimental results are shown in Section 4. Finally, conclusions are presented in Section 5.

A. Sanfeliu and J. Ruiz-Shulcloper (Eds.): CIARP 2003, LNCS 2905, pp. 480–487, 2003. © Springer-Verlag Berlin Heidelberg 2003
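
The clustering objective stated in the introduction (high similarity within clusters, low similarity between them, over a partition or a cover of n objects with m features) can be illustrated with a small sketch. This is not the star algorithm or the proposed extension. The cosine similarity measure, the function names, and the toy data are all assumptions made for illustration only.

```python
import math


def cosine(u, v):
    """Cosine similarity between two feature vectors (assumed measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def is_cover(n, clusters):
    """A cover places every object in at least one cluster."""
    return all(any(i in c for c in clusters) for i in range(n))


def is_partition(n, clusters):
    """A partition places every object in exactly one cluster."""
    return all(sum(i in c for c in clusters) == 1 for i in range(n))


def mean_similarities(objects, clusters):
    """Return (mean intra-cluster, mean inter-cluster) pairwise similarity.

    A pair counts as intra-cluster if some cluster contains both objects.
    """
    intra, inter = [], []
    n = len(objects)
    for i in range(n):
        for j in range(i + 1, n):
            s = cosine(objects[i], objects[j])
            shared = any(i in c and j in c for c in clusters)
            (intra if shared else inter).append(s)

    def mean(xs):
        return sum(xs) / len(xs) if xs else 0.0

    return mean(intra), mean(inter)


# Hypothetical toy collection: n = 4 objects, m = 2 features.
objects = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
partition = [{0, 1}, {2, 3}]      # disjoint clusters
overlapping = [{0, 1, 2}, {2, 3}]  # object 2 belongs to two clusters

intra, inter = mean_similarities(objects, partition)
print(is_partition(4, partition), is_cover(4, overlapping))
print(intra > inter)
```

In this toy example the disjoint clustering is both a partition and a cover, while the overlapping one is only a cover, which is the distinction the introduction draws between the two kinds of output.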