Partitional Clustering Experiments with News Documents

Arantza Casillas 1, Mayte González de Lena 2, and Raquel Martínez 2

1 Dpt. de Electricidad y Electrónica, Facultad de Ciencias, Universidad del País Vasco
arantza@we.lc.ehu.es
2 Escuela Superior de CC. Experimentales y Tecnología, Universidad Rey Juan Carlos
{mt.gonzalez,r.martinez}@escet.urjc.es

Abstract. We have carried out clustering experiments on a news corpus. In these experiments we used two partitional methods and varied two parameters of the clustering tool. In addition, we worked both with the whole document (news item) and with representative parts of the document. We obtained good results working with a representative part of the document. The experiments were carried out with news in Spanish and Basque in order to compare the results in both languages.

1 Introduction

Document clustering deals with the problem of identifying sets of thematically related documents. It has been investigated for use in a number of different areas, such as information retrieval and browsing collections of documents, and a number of techniques have been used [3]. We are investigating the use of clustering techniques for linking news documents, and we are working in two languages: Spanish and Basque. We have employed partitional methods in our experiments. With partitional methods, the clusters generated contain objects that agree with a strong pattern. For example, their contents include some shared words or terms; in each cluster there are objects (news items) that share a subset of the dimension space. In this paper we present the results of the experiments that we have carried out with two different news corpora, one in Spanish and the other in Basque.
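The idea that each cluster groups news items sharing words or terms can be illustrated with a minimal sketch. It is not the clustering tool used in the experiments. It assumes spherical k-means over term-frequency vectors as a stand-in partitional method, and the four English toy documents, the function names, and the choice of seed documents are all hypothetical.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Return a unit-length term-frequency vector as a sparse dict."""
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {term: c / norm for term, c in counts.items()}


def cosine(u, v):
    """Dot product of two sparse vectors (the cosine when both are unit length)."""
    return sum(w * v.get(t, 0.0) for t, w in u.items())


def centroid(vectors):
    """Sum of sparse vectors, rescaled to unit length."""
    total = Counter()
    for vec in vectors:
        for t, w in vec.items():
            total[t] += w
    norm = math.sqrt(sum(w * w for w in total.values()))
    return {t: w / norm for t, w in total.items()}


def partition(docs, seeds, iterations=10):
    """Assign every document to exactly one of len(seeds) clusters."""
    vectors = [bag_of_words(d) for d in docs]
    centers = [vectors[i] for i in seeds]  # seed documents act as initial centers
    labels = []
    for _ in range(iterations):
        # Assignment step: each document goes to its most similar center.
        labels = [max(range(len(centers)), key=lambda k: cosine(v, centers[k]))
                  for v in vectors]
        # Update step: recompute each center from its current members.
        for k in range(len(centers)):
            members = [v for v, lab in zip(vectors, labels) if lab == k]
            if members:  # keep the old center if a cluster becomes empty
                centers[k] = centroid(members)
    return labels


# Hypothetical toy news items (not taken from the corpus described in the paper).
docs = [
    "real madrid wins football league match",
    "football match ends in draw for madrid",
    "cyclist wins mountain stage of the tour",
    "tour stage cancelled after cyclist crash",
]
labels = partition(docs, seeds=[0, 2])
print(labels)
```

With these toy inputs, the two football items end up in one cluster and the two cycling items in the other, because each pair shares several terms while the pairs share almost none with each other.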
In the next section we briefly describe the documents; Section 3 describes the clustering tool used, the types of parameters, and the experiments; in Section 4 we present the results; finally, Section 5 summarizes the conclusions drawn from the work carried out.

2 Documents Description

In the project in which we are involved [4], we are working with a corpus of categorized news. The categories are the Industry Standard IPTC Subject Codes [2]. For the experiments we have selected the sport category in order to test the clustering