IJSRSET18495 | Received: 01 July 2018 | Accepted: 10 July 2018 | July-August 2018 [4 (9): 51-58]
© 2018 IJSRSET | Volume 4 | Issue 9 | Print ISSN: 2395-1990 | Online ISSN: 2394-4099
Themed Section: Engineering and Technology

Document Categorization by using Weighted J48 Classifier

Sonali Suskar, Dr. S. D. Babar
Department of Computer Engineering, SIT College of Engineering, Lonavala, Maharashtra, India

ABSTRACT

Text categorization is currently a key research area in information retrieval. It assigns to a document one or more entries from a set of predefined categories. Learning in a high-dimensional data space is a challenge for text categorization: a large number of features can cause heavy computational overhead and can degrade classifier performance, because many of the features are irrelevant or redundant. To mitigate the "curse of dimensionality" and to speed up classifier training, feature reduction is needed to shrink the feature set. This paper presents a Bayesian classification approach and a WeightedJ48 classifier for automatic text categorization using class-specific features. For text classification, the proposed method selects a specific feature subset for each class. The system reconstructs the PDF in the raw data space from the class-specific PDFs in the low-dimensional feature space and builds a Bayes classification rule using Baggenstoss's PDF Projection Theorem. A notable advantage of this methodology is that it can work with many feature selection criteria. The WeightedJ48 classifier saves time and memory. The proposed system also uses term weighting for pre-processing. Together, these methods improve classification accuracy, the feature selection process, and overall system performance.
Keywords: Text categorization, class-specific features, feature selection, PDF projection and estimation, dimension reduction, WeightedJ48, term weighting.

I. INTRODUCTION

As the volume of data on the Internet and within companies grows, there is a strong need for methods that can filter and manage this information. The main task is to sort free-text documents into predefined categories. Applications include categorizing emails and files into folder trees, topic labelling, specific processing operations, structured search, and browsing or searching for documents that match long-term interests or more dynamic, task-based interests. In many contexts, professionals are employed to classify new items, but this process is very time-consuming and costly, which limits its applicability. Consequently, there is growing interest in the research and development of methods for automatic text categorization. Various classification and machine-learning techniques have been developed for text categorization, such as rule learning algorithms, nearest-neighbour classifiers, Support Vector Machines, and decision trees.

Text categorization (TC), also known as text classification, is the task of automatically assigning documents to a predefined set of categories. It is used in many systems, including automated indexing of scientific articles based on predefined thesauri of technical terms, filing patents into patent directories, selective dissemination of information to information consumers, hierarchical catalogues for automated