International Journal of Computer Applications (0975 – 8887) Volume 182 – No. 28, November 2018

A Template-based Information Extraction System for Text Understanding

Dania Sagheer, PhD Student, Artificial Intelligence and Natural Language Dept, Aleppo University, Syria
Fadel Sukkar, Professor, Artificial Intelligence and Natural Language Dept, Aleppo University, Syria

ABSTRACT
This paper presents a template-based information extraction system for understanding Arabic descriptive text. The system relies on a knowledge base that contains facts and rules. The facts are derived from the AL Khalil lexicon, the Al Ramous lexicon, and a Stanford model. The rules represent the designed templates, which help to detect the meaning of the text. The inference engine uses hybrid chaining to fill the template slots from the text. A semantic criterion is added to the templates; it computes the frequency of each template in the text. The system is tested on Arabic texts from the oil production domain, taken from Arabic news websites such as Arabic CNN and Arabic BBC. The system responds well in identifying the goal of a descriptive text: text understanding is performed efficiently and high accuracy is obtained.

General Terms
Artificial Intelligence, Natural Language Processing.

Keywords
Text Understanding, Knowledge Base, Information Extraction, Template.

1. INTRODUCTION
Recently, text understanding has become a very important task in natural language processing. The amount of text on websites keeps growing and readers do not have time to read all of it, so the need for automatic text understanding systems has increased. A descriptive text is a text that presents a given problem; the reader needs to read the whole text to understand its goal. Information extraction is an important strategy in the text understanding task. Research in this area studies how knowledge is extracted from unstructured text.
Researchers have applied many approaches, such as classification and clustering, to extract information from huge amounts of textual data [1]. As noted in [2], information extraction becomes more challenging on social media because the text there does not follow grammatical rules. The work in [3] presents an information extraction system that links entities in the text with their corresponding entities in a knowledge base. Text can be represented in many ways, such as a bag of words or a graph; a query representation can link an entity in the text with a query in order to build an information retrieval system [4]. Information extraction can also be performed with a template-based approach [5], which has been used in many studies on text understanding [6]. A template is a set of slots, or variables, that describe an event in a specific domain [7] and that provide helpful information for identifying the goal of the text. The work in [8] presents a joint entity model that extracts templates and slots from raw text. Matching the entities in the raw text against the templates requires information-rich lexicons. Research teams working on the Arabic language have provided the AL Khalil lexicon [9], [10], the Al Ramous lexicon [11], and a Stanford model that supplies lexical vocabulary [12].

The paper is organized as follows. Section 2 describes the preparation of the texts, and Section 3 presents the designed templates. Section 4 presents the knowledge-based system, and Section 5 presents the semantic inference rules. Section 6 presents the working algorithm, Section 7 reports the results, and Section 8 gives the conclusion.

2. PREPARING OF TEXTS
Descriptive texts are collected from formal news websites: Arabic BBC, Arabic CNN, and the Syrian Arab News Agency (SANA). The collected texts describe the state of oil production and explain the opinions of oil organizations about oil production.
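Before any template matching, the collected texts are cut at punctuation marks and each resulting segment is split into words. A minimal Python sketch of that preprocessing is given here for illustration only. The function names (`segment_text`, `split_words`, `prepare`) are hypothetical, and the delimiter set (full stop, Latin comma, Arabic comma U+060C) is an assumption, since the paper names only "point and comma".

```python
import re

# Assumed delimiter set: full stop, Latin comma, and Arabic comma (U+060C).
# The paper only says "point and comma"; the exact set is not specified.
SEGMENT_DELIMITERS = re.compile(r"[.,\u060C]")


def segment_text(text):
    """Cut raw text into segments at full stops and commas."""
    parts = SEGMENT_DELIMITERS.split(text)
    # Drop empty pieces left by trailing or repeated punctuation.
    return [part.strip() for part in parts if part.strip()]


def split_words(segment):
    """Split one segment into whitespace-separated words."""
    return segment.split()


def prepare(text):
    """Return the text as a list of segments, each a list of words."""
    return [split_words(segment) for segment in segment_text(text)]


if __name__ == "__main__":
    sample = "ارتفع سعر النفط، وقررت المنظمة خفض الإنتاج."
    for words in prepare(sample):
        print(words)
```

Splitting on every full stop is a deliberate simplification: it would also break decimal numbers such as prices, so a real pipeline would likely need a more careful tokenizer.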
The texts are segmented into paragraphs at punctuation marks such as the full stop and the comma. The paragraphs are then split into words.

3. DESIGNED TEMPLATES
The templates are designed according to the morphological, syntactic, and vocabulary components of the text sentences. They are created to help identify the goal of an oil production text, which is the knowledge of whether the organizations agree to increase or decrease oil production. The texts are studied and analyzed to find the appropriate entities and to design the templates. The template entities are chosen so that they detect the goal of the descriptive text. The oil minister plays a role in decisions about oil production, and the oil price is observed to be the important factor in determining the amount of oil produced. Oil is measured in barrels: organizations measure their production by the number of barrels or by financial value, and each country or organization sets a price per barrel. The designed templates are: