MPGI National Multi Conference 2012 (MPGINMC-2012), 7-8 April, 2012, “Recent Trends in Computing”
Proceedings published by International Journal of Computer Applications® (IJCA), ISSN: 0975 - 8887

Information Extraction for Prediction: Application for web service for conference alerts

Vandana Korde
Department of Computer Engineering, Sardar Vallabhbhai National Institute of Technology, Surat

Gite Hanumant R.
Department of Computer Science & IT, Dr. B.A.M. University, Aurangabad

C. Namrata Mahender
Department of Computer Science & IT, Dr. B.A.M. University, Aurangabad

ABSTRACT
In the general framework of knowledge discovery, data mining techniques are usually dedicated to information extraction from structured databases. Text mining techniques, on the other hand, are dedicated to information extraction (IE) from unstructured textual data, and Natural Language Processing (NLP) can then be seen as a helpful tool for the text mining procedure. In this paper we discuss our work on IE and on the proper structuring of web news about conferences, covering fields such as the name of the conference, date, location and area of interest. We also emphasise the major issues that arise while extracting and correlating this information for further processing.

Keywords: Data mining, Text mining, Information Extraction

1. INTRODUCTION
Today’s world is full of information. An abundance of information is available even for a relatively small word search. Examples of such data include email, text, Web pages, newsgroup postings, news articles, call-centre text records, business reports, research papers, and so on. In its raw form, the data has limited value since we can do little with it beyond keyword search. Consequently, over the past two decades, significant efforts have focused on the problem of extracting structured information (e.g., researchers, publications, co-author and advising relationships, etc.) from such data. The extracted information is then exploited in search, browsing, querying, and mining.
In recent years, the explosion of unstructured data on the World-Wide Web has generated significant further interest in the above extraction problem, and has helped position it as a central research goal in the database, AI, data mining, IR, NLP, and Web communities [1]. An illustrative (but far from exhaustive) list of current projects that address this research goal includes: (1) entity matching and approximate joins at AT&T Research, MSR and Stanford, (2) answering structured queries over text at Columbia and UCLA, (3) intelligent email and personal information management (PIM) at CMU, Massachusetts, MIT and Washington, (4) extracting and querying semantic entities/relations at IIT Bombay, CMU, MSR and Washington, (5) data cleaning at MSR, (6) doing OLAP-style analysis using extracted information at IBM Almaden and Wisconsin, (7) standardization efforts at IBM Watson on interfaces for NLP extraction tools, (8) managing unstructured data in bioinformatics at Illinois and Michigan, and (9) Web-based community information management (CIM) at Illinois and Wisconsin.

In our work we concentrate on extracting, from conference alerts, the information that is relevant to an individual, because not every piece of information is needed by every individual. Such extraction can further be useful for personalised web services.

2. TEXT MINING AND IE
Text mining is a young interdisciplinary field which draws on information retrieval, data mining, machine learning, statistics and computational linguistics. As most information (over 80%) is stored as text, text mining is believed to have high commercial potential. Knowledge may be discovered from many sources of information; yet unstructured texts remain the largest readily available source of knowledge. The problem of Knowledge Discovery from Text (KDT) [2] is to extract explicit and implicit concepts, and the semantic relations between those concepts, using Natural Language Processing (NLP) techniques.
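The kind of extraction targeted here, pulling the conference name, date, location and area of interest out of a free-text alert, can be illustrated with a small Python sketch. It is not the authors' implementation: the sample alert, the `ConferenceRecord` type, the `extract_conference` function and all of the regular expressions are assumptions made purely for illustration, and simple pattern matching of this kind would miss many real-world phrasings.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical alert text, invented for this example only.
SAMPLE_ALERT = (
    "International Conference on Text Mining 2012 will be held on "
    "7-8 April, 2012 at Aurangabad, India. Topics: data mining, NLP."
)

MONTHS = (
    "January|February|March|April|May|June|July|"
    "August|September|October|November|December"
)

# Illustrative patterns (assumptions, not taken from the paper).
# Name: capitalised words around "Conference/Workshop/Symposium on ...".
NAME_RE = re.compile(
    r"((?:[A-Z]\w*\s+)*(?:Conference|Workshop|Symposium)\s+on(?:\s+[A-Z]\w*)+)"
)
# Date: a day or day range, a month name, and a four-digit year.
DATE_RE = re.compile(
    rf"\b(\d{{1,2}}(?:\s*-\s*\d{{1,2}})?\s+(?:{MONTHS}),?\s+\d{{4}})\b"
)
# Location: capitalised, comma-separated words after "at" or "in".
LOCATION_RE = re.compile(r"\b(?:at|in)\s+([A-Z][A-Za-z]+(?:,\s*[A-Z][A-Za-z]+)*)")
# Areas of interest: everything after "Topics:" up to the next full stop.
AREAS_RE = re.compile(r"Topics:\s*([^.]+)")


@dataclass
class ConferenceRecord:
    """One structured row built from an unstructured alert."""
    name: Optional[str] = None
    date: Optional[str] = None
    location: Optional[str] = None
    areas: List[str] = field(default_factory=list)


def _first_group(pattern, text):
    """Return the first captured group of the first match, or None."""
    match = pattern.search(text)
    return match.group(1).strip() if match else None


def extract_conference(text: str) -> ConferenceRecord:
    """Fill a ConferenceRecord from free text; missing fields stay empty."""
    areas_raw = _first_group(AREAS_RE, text)
    areas = [a.strip() for a in areas_raw.split(",") if a.strip()] if areas_raw else []
    return ConferenceRecord(
        name=_first_group(NAME_RE, text),
        date=_first_group(DATE_RE, text),
        location=_first_group(LOCATION_RE, text),
        areas=areas,
    )


if __name__ == "__main__":
    print(extract_conference(SAMPLE_ALERT))
```

Fields that cannot be found are left empty rather than guessed, so that a later cleaning or correlation step can tell missing values apart from extracted ones.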
This paper suggests a new framework for text mining based on the integration of Information Extraction (IE) and Knowledge Discovery from Databases (KDD), or data mining. Text prediction is one of the most widely used techniques for enhancing the communication rate in augmentative and alternative communication; similarly, a text mining approach has been used for the prediction of disease status from clinical discharge summaries [3], [4]. Prediction from text can be just as ambitious as prediction in numerical data mining. In statistical terms, prediction has a very specific characterization, and it need not deal only with topic assignment to documents. Prediction for text follows the classical lines of all numerical classification problems.

As Information Extraction is one of the important applications of NLP, using it during mining may improve performance, because traditional data mining assumes that the information to be “mined” is already in the form of a relational database. Unfortunately, for many applications, electronic information is only available in the form of free natural-language documents rather than structured databases. Since IE addresses the problem of transforming a corpus of textual documents into a more structured database, the database constructed by an IE module can be provided to the KDD module for further mining of knowledge. While constructing an IE system is a difficult task, there has been major recent progress in using machine learning methods to help automate the construction of IE systems [5, 6, 7, 8]. However, the accuracy of current IE systems is limited, and an automatically extracted database will therefore inevitably contain a significant number of errors. In our work, therefore, the key objective is to clean and structure the information and then to correlate it.

3. PROPOSED APPROACH
Our approach to text mining is motivated by practical applications.
However, considering the design and development of prediction methods and the need for text predictors, we are trying to develop a module that produces some prediction from a text document. There are many sources of news on the Web, often taken from newswire services such as Reuters or Associated Press, news of some event according to