International Journal on Recent and Innovation Trends in Computing and Communication
ISSN: 2321-8169
Volume: 5 Issue: 11 32 – 43
IJRITCC | November 2017, Available @ http://www.ijritcc.org

Structured and Unstructured Information Extraction Using Text Mining and Natural Language Processing Techniques

S. Nagarajan
Department of Computer Applications, School of Information Technology, Madurai Kamaraj University, Madurai, Tamilnadu, India
nagasethu2000@yahoo.com

Dr. K. Perumal
Department of Computer Applications, School of Information Technology, Madurai Kamaraj University, Madurai, Tamilnadu, India
perumalmkucs@gmail.com

Abstract: Information on the web is increasing ad infinitum. The web has thus become an unstructured global space in which information, even when available, cannot be used directly for the desired applications. Users are often faced with information overload and need some automated help. Information extraction (IE) is the task of automatically extracting structured information from unstructured and/or semi-structured machine-readable documents by means of Text Mining and Natural Language Processing (NLP) techniques. The extracted structured information can be used for a variety of enterprise-level or personal tasks of varying complexity. IE can also operate over a set of knowledge in order to answer user queries posed in natural language. The system is based on a fuzzy logic engine, whose flexibility is used to manage sets of accumulated knowledge. These sets may be organized into hierarchical levels using a tree structure. Information extraction derives structured data or knowledge from unstructured text by identifying references to named entities as well as stated relationships between such entities.
Data mining research assumes that the information to be “mined” is already in the form of a relational database. IE can serve as an important technology for text mining. If the knowledge to be discovered is expressed directly in the documents to be mined, then IE alone can serve as an effective approach to text mining. However, if the documents contain concrete data in unstructured form rather than abstract knowledge, it may be useful to first use IE to transform the unstructured data in the document corpus into a structured database, and then use traditional data mining tools to identify abstract patterns in the extracted data. We propose a novel method for text mining with natural language processing techniques that extracts information from a database efficiently; the extraction time and accuracy are measured and plotted in simulation. The method extracts the attributes of entities and the relationships between entities from structured and semi-structured information. Results are compared with conventional methods.

Keywords: Information Extraction (IE), Unstructured, Semi-structured, Data Mining, Natural Language Processing (NLP), Text Mining (TM)

I. Introduction

Users access the huge number of documents on the web (specifically, web pages) by searching through a search engine or by browsing through the hyperlinks within web pages. Users who have no specific target often choose to browse web pages to reach their final goal. However, many users have difficulty finding a starting page that will eventually lead to their goals. Hence many portal sites have emerged to provide such starting points. These sites often provide some sort of navigation structure, such as web directories or web hierarchies. Users can achieve thematic navigation through such structures.
However, these structures are generally constructed by hand by human experts, which leaves them lacking in coverage and hard to maintain.

Data stored in most text databases are semi-structured, in that they are neither completely unstructured nor completely structured. For example, a document may contain a few structured fields, such as title, authors, publication date, length, and category, but also some largely unstructured text components, such as the abstract and contents. There has been a great deal of work on the modeling and implementation of semi-structured data in recent database research [6]. Information Retrieval techniques, such as text indexing, have been developed to handle unstructured documents. However, traditional Information Retrieval techniques become inadequate for the increasingly vast amounts of text data. Typically, only a small fraction of the many available documents will be relevant to a given individual or user. Without knowing what could be in the documents, it is difficult to formulate effective queries for analyzing and extracting useful information from the data. Users need tools to compare different documents, rank the importance and relevance of the documents, or find patterns and trends across multiple documents. Thus, Text Mining has become an increasingly popular and essential theme in Data Mining.
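
To make the notions of a semi-structured document, text indexing, and relevance ranking concrete, the following is a minimal sketch; it is not the method proposed in this paper. It models a document with a few structured fields and one unstructured text field, builds a simple inverted index over the unstructured part, and ranks documents against a query with a TF-IDF score. The names `Document`, `tokenize`, `build_index`, and `rank`, the sample documents, and the particular smoothed IDF weighting are all assumptions introduced for illustration.

```python
import math
import re
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass
class Document:
    # Structured fields (a hypothetical subset of those named in the text).
    title: str
    authors: list
    category: str
    # Largely unstructured component.
    abstract: str


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_index(docs):
    """Build an inverted index: term -> {document position: term frequency}."""
    index = defaultdict(dict)
    for doc_id, doc in enumerate(docs):
        for term, tf in Counter(tokenize(doc.abstract)).items():
            index[term][doc_id] = tf
    return index


def rank(query, docs, index):
    """Return (doc_id, score) pairs sorted by descending TF-IDF score."""
    scores = defaultdict(float)
    for term in tokenize(query):
        postings = index.get(term, {})
        if not postings:
            continue  # term occurs in no document
        # Smoothed inverse document frequency (an assumed weighting).
        idf = math.log(len(docs) / len(postings)) + 1.0
        for doc_id, tf in postings.items():
            scores[doc_id] += tf * idf
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Made-up sample documents, for illustration only.
DOCS = [
    Document("Doc A", ["X"], "mining", "Text mining finds patterns in text documents."),
    Document("Doc B", ["Y"], "retrieval", "Indexing supports retrieval of documents."),
    Document("Doc C", ["Z"], "other", "Fuzzy logic handles uncertain knowledge."),
]
INDEX = build_index(DOCS)
```

With these sample documents, `rank("text mining", DOCS, INDEX)` returns a single entry for the first document, since it is the only one whose unstructured text contains the query terms. The structured fields are left unindexed here; in practice they could be queried directly, as in a relational database.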