Cluster Based Term Weighting Model for Web Document Clustering

B. R. Prakash, M. Hanumanthappa and M. Mamatha

Abstract The weight of a term is based on the frequency with which the term appears in a document. A term weighting scheme measures the importance of a term with respect to a document and to the collection; a term with a higher weight is more important than a term with a lower weight. A document ranking model uses these term weights to rank the documents in a collection. We propose a cluster-based term weighting model based on the TF-IDF model. The model updates inter-cluster and intra-cluster frequency components and uses the generated clusters as a reference for improving the retrieval of relevant documents. These inter-cluster and intra-cluster frequency components weight the importance of a term in addition to the term frequency and document frequency components.

Keywords Term weighting scheme · Document clustering · Information retrieval · Data mining

1 Introduction

A document clustering algorithm helps find groups of documents that share a common pattern [15]. It is an unsupervised technique: it finds clusters in a collection automatically, without any user supervision. The main goal of clustering is to find meaningful groups, so that analyzing the documents cluster by cluster is much easier than viewing them as one whole collection.

The Vector Space Model (VSM) represents a document as a T-dimensional vector, with one dimension for each of the T unique terms in the collection. Each term in a vector is associated

B. R. Prakash (corresponding author), M. Hanumanthappa
Department of Computer Science and Applications, Bangalore University, Bangalore, India
e-mail: India.brp.tmk@gmail.com

M. Mamatha
Department of Computer Science, Sri Siddaganga College for Women, Tumkur, India

M. Pant et al. (eds.), Proceedings of the Third International Conference on Soft Computing for Problem Solving, Advances in Intelligent Systems and Computing 259, DOI: 10.1007/978-81-322-1768-8_70, © Springer India 2014
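The abstract builds on TF-IDF weighting in the Vector Space Model and adds intra-cluster and inter-cluster frequency components. The following is a minimal sketch under stated assumptions, not the authors' model. It uses the textbook weight tf(t, d) · log(N / df(t)) with raw term counts and the natural logarithm. The helper `cluster_frequencies` is hypothetical, and so are its definitions: intra-cluster frequency is taken as the number of documents in a cluster that contain the term, and inter-cluster frequency as the number of clusters that contain the term. This section does not say how the paper combines these components with TF-IDF, so the sketch only computes them.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return the vocabulary and one TF-IDF vector per document.

    Each document is a list of tokens. The sorted vocabulary (the T unique
    terms of the collection) fixes the T dimensions of every vector (VSM).
    Assumed weighting: raw tf times natural-log idf, log(N / df).
    """
    n_docs = len(docs)
    vocab = sorted({term for doc in docs for term in doc})
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)  # raw term frequency within this document
        # Absent terms have tf 0, so their weight is 0.0.
        vectors.append([tf[t] * math.log(n_docs / df[t]) for t in vocab])
    return vocab, vectors


def cluster_frequencies(docs, labels):
    """Hypothetical helper: simple cluster-level term statistics.

    intra[c][t]: number of documents in cluster c that contain term t.
    inter[t]:    number of distinct clusters with a document containing t.
    These definitions are illustrative assumptions, not the paper's.
    """
    intra = {}
    for doc, c in zip(docs, labels):
        intra.setdefault(c, Counter()).update(set(doc))
    inter = Counter()
    for counts in intra.values():
        inter.update(counts.keys())  # each term counted once per cluster
    return intra, inter


# Toy collection of three tokenized documents and their cluster labels.
docs = [
    ["web", "document", "clustering"],
    ["web", "search", "ranking"],
    ["document", "clustering", "model"],
]
labels = [0, 1, 0]

vocab, vectors = tfidf_vectors(docs)
intra, inter = cluster_frequencies(docs, labels)
```

In this toy collection "web" occurs in two of the three documents, so its weight in the first document is 1 · log(3/2), roughly 0.405, while a term that appeared in every document would get weight 0. The cluster statistics are kept separate from the TF-IDF vectors so that any combination rule can be layered on top without changing the base weighting.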