Asian Journal of Computer Science And Information Technology 5:11 (2015) 62 – 66.
Contents lists available at www.innovativejournal.in
Journal Homepage: http://innovativejournal.in/ajcsit/index.php/ajcsit

AN ENUMERATIVE FRAMEWORK FOR EXTRACTION OF BAG-OF-WORDS FROM LEGAL DOCUMENTS

Basaveswar Rao. B, B.V.Rama Krishna, Gangadhara Rao. K, Chandan. K
Dept. of Computer Science & Engineering, Acharya Nagarjuna University-522510, India

Corresponding Author: B.V.Rama Krishna
Key Words: Stop-Words, Stemming, Porter Stemmer, Bag-of-Words, Judgments, Word-Frequency.
DOI: http://dx.doi.org/10.15520/ajcsit.v5i11.35

ABSTRACT
In this paper an enumerative framework is developed for extracting a Bag-of-Words from legal documents. For this purpose, 100 judgments of the Supreme Court of India related to dowry cases are considered. The case notes of these judgments are taken as text input, and a set of Bag-of-Words is extracted from them. A novel algorithm is presented and implemented for this purpose. To filter insignificant words out of the Bag-of-Words, a threshold is applied to word frequencies. The resulting Bag-of-Words may be used in Data Mining applications for knowledge discovery from judgments.
©2015, AJCSIT, All Rights Reserved.

1. INTRODUCTION
Text mining is an emerging research area because most information is now available in the form of electronic documents. These documents are found in digital libraries, online chats, e-mails and social media, and as fields in downloaded PDF/Word documents. When analyzed, they can have a significant influence on fields such as marketing, finance, medicine and law.
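The pipeline summarized in the abstract (case notes in, a frequency-thresholded Bag-of-Words out) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' algorithm: the stop-word list, the alphabetic-only tokenizer and the `min_freq` threshold are hypothetical choices, and the Porter stemming step named in the keywords is omitted to keep the example dependency-free.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (an assumption for this
# sketch); a real run would use a full English list.
STOP_WORDS = {
    "a", "an", "and", "the", "of", "to", "in", "on", "for", "by", "is",
    "was", "were", "be", "been", "that", "this", "with", "as", "at", "it",
    "he", "she", "his", "her", "from", "or", "not", "under",
}


def tokenize(text: str) -> list[str]:
    """Lower-case the text and keep runs of letters only (assumed tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def extract_bag_of_words(case_notes: list[str], min_freq: int = 2) -> dict[str, int]:
    """Count non-stop-words across all case notes, then keep only words whose
    total frequency reaches min_freq (the frequency threshold).
    The result is ordered from most to least frequent."""
    counts = Counter()
    for note in case_notes:
        counts.update(w for w in tokenize(note) if w not in STOP_WORDS)
    return {w: c for w, c in counts.most_common() if c >= min_freq}


if __name__ == "__main__":
    # Hypothetical one-line stand-ins for case notes.
    notes = [
        "The husband demanded dowry and harassed the wife.",
        "Dowry death: the wife was harassed for dowry.",
        "The accused husband was acquitted.",
    ]
    print(extract_bag_of_words(notes, min_freq=2))
    # {'dowry': 3, 'husband': 2, 'harassed': 2, 'wife': 2}
```

Raising `min_freq` shrinks the bag toward the most characteristic terms; setting it to 1 keeps every non-stop-word.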
Both the public and private sectors use these repositories as data sources for Data Mining and Text Mining techniques, and new knowledge discovery and Information Retrieval techniques are needed to make better use of them. Techniques such as Association Rule Mining, Generalization, Classification, Clustering and Outlier Analysis can be applied to text documents during the Text Mining process [19].

Natural Language Processing (NLP) plays a key role in text mining because it supports a wide range of services such as syntactic parsing, linguistic analysis, word stemming, multi-word phrase grouping, synonym normalization, part-of-speech tagging, word sense disambiguation, anaphora resolution and role determination. NLP techniques increase the effectiveness of text mining on natural language documents [1]. Machine Learning offers both supervised and unsupervised learning techniques [7], which have shown good performance and accuracy in text mining. Popular machine learning techniques used in text mining include Self-Organizing Maps (SOM), Support Vector Machines (SVM), Bayesian Networks, Boosting algorithms, Latent Variable models and Helmholtz Machines [2]. Machine Learning provides classification, clustering, filtering, extraction, retrieval and data mining services for text processing [18].

Information Retrieval Systems (IRS) employ artificial intelligence mechanisms not only to retrieve information but also to support decision making [14]. IRS draw on tautology, Boolean algebra and fuzzy logic to extract knowledge from information effectively. Information Retrieval from legal documents has been a key research area over the past decade, with classification, clustering and other data mining techniques applied to it; all of these studies rely on some form of document representation. Information Retrieval Systems used to retrieve information from legal documents are called ‘Legal IR Systems’ [9].
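The Boolean-algebra side of an IRS can be illustrated with the classic textbook technique of an inverted index that answers AND, OR and NOT queries through set intersection, union and difference. This is a hedged, generic sketch, not a description of any particular Legal IR System; the function names and the toy judgment snippets are hypothetical, and the fuzzy-logic component mentioned above is not modeled.

```python
import re


def build_inverted_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map each lower-cased word to the set of document ids that contain it."""
    index: dict[str, set[str]] = {}
    for doc_id, text in docs.items():
        for word in set(re.findall(r"[a-z]+", text.lower())):
            index.setdefault(word, set()).add(doc_id)
    return index


def boolean_and(index: dict[str, set[str]], terms: list[str]) -> set[str]:
    """AND query: ids of documents containing every term (set intersection)."""
    postings = [index.get(t.lower(), set()) for t in terms]
    return set.intersection(*postings) if postings else set()


def boolean_or(index: dict[str, set[str]], terms: list[str]) -> set[str]:
    """OR query: ids of documents containing at least one term (set union)."""
    return set().union(*(index.get(t.lower(), set()) for t in terms))


def boolean_not(index: dict[str, set[str]], all_ids, term: str) -> set[str]:
    """NOT query: ids of documents that do not contain the term (set difference)."""
    return set(all_ids) - index.get(term.lower(), set())


if __name__ == "__main__":
    # Hypothetical one-line stand-ins for judgment texts.
    docs = {
        "J1": "Dowry harassment by the husband",
        "J2": "Dowry death of the wife",
        "J3": "Acquittal of the husband",
    }
    index = build_inverted_index(docs)
    print(sorted(boolean_and(index, ["dowry", "husband"])))   # ['J1']
    print(sorted(boolean_or(index, ["death", "acquittal"])))  # ['J2', 'J3']
    # Compound query: dowry AND NOT death
    print(sorted(boolean_and(index, ["dowry"]) & boolean_not(index, docs, "death")))  # ['J1']
```

Because each operator returns a plain set of document ids, compound queries are just Python set expressions over the results.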
With the advent of knowledge engineering, Artificial Intelligence and Case-Based Reasoning have been tailored to design a new generation of Legal IRS. The Knowledge Discovery in Databases (KDD) approach to legal documents is a staged extraction of knowledge from data repositories. The essential stages in legal data mining are preprocessing, extraction, transformation, loading, rule mining, classification, clustering and visualization. New Data Mining techniques and innovative procedures are needed to make better use of these resources.

The ‘Bag of Words’ (BoW) representation is used to associate user queries with the documents retrieved, and it is also used in Machine Learning applications on text documents. Further, BoW reduces the time complexity of analysis and can also improve accuracy. Classifying and ranking judgments according to a user query is the goal of judicial search engines. In this process, BoW extraction from legal documents is an essential phase [10]; it provides a basis for applying Data Mining techniques to the resulting data structures. Little research has been done in this direction on Indian legal documents. The main goal of this paper is to provide a