1. Introduction

Clustering is a powerful tool for identifying patterns and groupings in datasets, based on the similarity between elements (Murty et al., 1999). It is considered an unsupervised process (Charu and Chandan, 2013), since the data has no predefined structure. Clustering is applicable in many domains, ranging from biology and medicine to finance and marketing. It is used in fields such as data mining, pattern recognition, information retrieval, image analysis, market analysis and statistical data analysis.

This paper presents the design, implementation and evaluation of a cluster analysis expert system, called EasyClustering, developed to assess the performance of different compression based clustering approaches and to automatically compute the quality of the solutions. The system has two main integrated components:

1. A clustering component (Cernian et al., 2011), with 3 compression algorithms (ZIP, bzip2 and GZIP), 4 distance metrics (NCD, Jaro, Jaccard and Levenshtein) and 3 clustering algorithms (UPGMA, MQTC and k-means).
2. A cluster analysis expert system, which automatically evaluates the quality of the clustering results using one of the most representative quality measures, the FScore (van Rijsbergen, 1976).

The research conducted with the EasyClustering platform has the following objectives:

1. To establish the most appropriate clustering context for using the compression based approach.
2. To facilitate a comparative analysis of the clustering results produced by various combinations of compression algorithms, distance metrics and clustering algorithms.
3. To evaluate the benefits of the compression based clustering approach.
4. To provide an expert system component that automatically assesses the quality of the clustering solutions.
5. To investigate whether traditional clustering methods perform better when the input is compressed.

The rest of the paper is structured as follows: Section 2 presents the theoretical background and some related work, Section 3 describes the EasyClustering platform and the methodology for using it, Section 4 presents experimental results that validate the capabilities of this integrated system, and Section 5 draws the conclusions of this work.

2. Cluster Validity State of the Art

At present, there are several clustering platforms available, such as:

Studies in Informatics and Control, Vol. 24, No. 2, June 2015 http://www.sic.ici.ro

An Integrated Cluster Analysis and Validity Test Platform for the Compression based Clustering Approach

Alexandra CERNIAN*, Dorin CARSTOIU, Adriana OLTEANU, Valentin SGARCIU
University Politehnica of Bucharest, 313 Splaiul Independentei, Bucharest, Romania.
Alexandra.cernian@aii.pub.ro
* corresponding author

Abstract: This paper focuses on compression based clustering. It aims to determine the most suitable combinations of algorithms for different clustering contexts (text, heterogeneous data, Web pages, metadata and so on) and to establish whether using compression with traditional clustering methods leads to better performance. In this context, we propose an integrated cluster analysis test platform, called EasyClustering, which incorporates two subsystems: a clustering component and a cluster validity expert system, which automatically determines the quality of a clustering solution by computing the FScore value. The experimental results focus on two main directions: determining the best approach for compression based clustering in terms of context, compression algorithms and clustering algorithms, and validating the functionality of the cluster analysis expert system for determining the quality of the clustering solutions.
After conducting a set of 324 clustering tests, we concluded that compressing the input when using traditional clustering methods increases the quality of the clustering solutions, leading to results comparable to those obtained with the NCD. The cluster analysis expert system has been 100% accurate so far, so we estimate that any deviation that might occur will be minimal.

Keywords: clustering, compression, cluster analysis, FScore, expert system.
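
As a rough illustration of the two ingredients named in the abstract, the sketch below computes the NCD between two byte strings and an FScore for a labelled clustering. It is a minimal Python sketch under stated assumptions, not the EasyClustering implementation. The function names `ncd` and `fscore` are hypothetical, and `zlib.compress` stands in for the ZIP/GZIP family (`bz2.compress` can be passed instead). The NCD follows its standard definition, NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C is the compressed length. The FScore is assumed to be the class-weighted best-match F-measure commonly used for cluster validity, which may differ from the exact variant computed by the platform.

```python
import bz2
import zlib
from collections import Counter


def ncd(x: bytes, y: bytes, compress=zlib.compress) -> float:
    """Normalized Compression Distance between two byte strings.

    `compress` is any function mapping bytes to compressed bytes,
    for example zlib.compress or bz2.compress.
    """
    cx = len(compress(x))
    cy = len(compress(y))
    cxy = len(compress(x + y))
    return (cxy - min(cx, cy)) / max(cx, cy)


def fscore(labels, clusters) -> float:
    """Class-weighted best-match F-measure (assumed FScore variant).

    labels[k] is the reference class of item k and clusters[k] is the
    cluster assigned to it. For every class, the best F value over all
    clusters is taken and weighted by the relative size of the class.
    """
    n = len(labels)
    class_sizes = Counter(labels)
    cluster_sizes = Counter(clusters)
    overlap = Counter(zip(labels, clusters))

    total = 0.0
    for cls, n_i in class_sizes.items():
        best = 0.0
        for clu, n_j in cluster_sizes.items():
            n_ij = overlap[(cls, clu)]
            if n_ij == 0:
                continue
            precision = n_ij / n_j
            recall = n_ij / n_i
            best = max(best, 2 * precision * recall / (precision + recall))
        total += (n_i / n) * best
    return total


if __name__ == "__main__":
    text = b"the quick brown fox jumps over the lazy dog " * 20
    other = bytes(range(256)) * 4

    # Similar inputs should give a smaller distance than dissimilar ones.
    print("NCD(text, text), zlib: ", ncd(text, text))
    print("NCD(text, other), zlib:", ncd(text, other))
    print("NCD(text, other), bz2: ", ncd(text, other, compress=bz2.compress))

    # A clustering that reproduces the reference classes exactly scores 1.0.
    print("FScore:", fscore(["a", "a", "b", "b"], [0, 0, 1, 1]))
```

An NCD close to 0 indicates that the two inputs share most of their information, while values close to 1 indicate unrelated inputs. Real compressors add header overhead, so very short inputs can yield distorted values.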