Web-Ear, an Information Retrieval System That Uses Reported Speech Expressions in Turkish

C. B. Ozdemir, B. Diri
Yildiz Technical University, Department of Computer Engineering
Besiktas, Istanbul 34349 TURKEY
canberkozdemir@yahoo.com, banu@ce.yildiz.edu.tr

Abstract-Web-Ear is an information retrieval system in which anyone can search for "what person X said about topic Y" on the internet or in the system's own database. If the search is made on the internet, the system first submits a query to the Google search engine and retrieves a set of results. To isolate the speech portions of the retrieved data, an extraction process is performed according to pre-defined rules expressed as regular expressions; only then is the data ready to be presented to the user.

Keywords-information retrieval; information extraction; natural language processing; named-entity recognition; web-ear; reported speech

I. INTRODUCTION

A commonly encountered question in recent years is "Am I being eavesdropped on?". Let us change this question and look for the answer to "Who said what about a given subject?". The designed system collects data that answers this second question. Using this system, called Web-Ear, users can learn what a person said about a given subject.

Today the internet gives us access to a vast amount of information. The large volume of text available on the internet can be processed as natural language. Web-Ear accesses Google result pages and processes the sources as natural language text in order to answer searches of the form "What X said about Y".

The sources that the system processes may contain both direct and indirect speech. In terms of both Turkish and English grammatical structure, reported speech refers to direct speech, as indicated by Arnazarov [1]. In contrast to Arnazarov, we treat both direct and indirect speech forms as reported speech.
Sabine Bergler, Rene Witte and Ralf Krestel constructed the Fuzzy Believer system, which extracts reported speech from newspapers using standard NLP components [2]. Fuzzy Believer extracts and processes beliefs and opinions, including personal statements, from newspapers and other texts using Natural Language Processing methods. Several fuzzy set operators, including fuzzy belief revision, are applied to model different belief strategies. The first module of Fuzzy Believer is reported speech extraction: it finds reported speech via the reported verb finder, which marks the verbs within sentences. Other studies by the same researchers on reported speech concentrated on the reported verb finder and the reported speech finder mechanisms [3].

The present system focuses on the Information Extraction and Information Retrieval [4] disciplines of Natural Language Processing. Özkan Bayraktar used reported speech patterns to extract names from Turkish financial news in his graduate study entitled "Person Name Extraction From Turkish Financial News Text Using Local Grammar Based Approach" [5]. Named Entity Recognition, a subtask of Information Extraction, is used for finding people’s names; in this system, finding proper names is required in order to identify the regular expression patterns of reported speech.

The process consists of researching the patterns used to generate reported speech expressions [6] and identifying sentences with those patterns on various web sites. Through this study, regular expressions that match reported speech expressions were generated. The system combines these regular expressions with the name of the person being searched for. The first name and last name together, or the last name alone, are tagged by the system as proper names and searched together with the regular expressions in order to match natural language text.
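The combination of a person's name with reported speech patterns can be illustrated with a minimal Python sketch. The reporting verbs, the two pattern shapes, and the function names `build_patterns` and `extract_speech` are assumptions made for illustration only; the actual pattern set used by Web-Ear is not listed in this section and is likely broader (for example, it also covers indirect speech, which this sketch does not).

```python
import re

# Hypothetical list of Turkish reporting verbs, for illustration only.
REPORTING_VERBS = ["dedi", "söyledi", "belirtti", "açıkladı"]


def build_patterns(first_name, last_name):
    """Build regular expressions for quoted speech attributed to one person."""
    # Accept "First Last" or the last name alone.
    name = rf"(?:{re.escape(first_name)}\s+)?{re.escape(last_name)}"
    verbs = "|".join(re.escape(v) for v in REPORTING_VERBS)
    quote = r'"(?P<speech>[^"]+)"'
    return [
        # Shape 1: Name, "speech" verb
        re.compile(rf"{name}\s*,?\s*{quote}\s*(?:{verbs})\b"),
        # Shape 2: "speech" verb Name
        re.compile(rf"{quote}\s*(?:{verbs})\s+{name}"),
    ]


def extract_speech(text, first_name, last_name):
    """Return the quoted passages that the patterns attribute to the person."""
    results = []
    for pattern in build_patterns(first_name, last_name):
        results.extend(m.group("speech") for m in pattern.finditer(text))
    return results


if __name__ == "__main__":
    sample = (
        'Ahmet Yılmaz, "Ekonomi büyüyor" dedi. '
        'Başka bir haberde "Faiz düşecek" dedi Yılmaz. '
        'Mehmet Kaya, "Borsa yükseldi" dedi.'
    )
    print(extract_speech(sample, "Ahmet", "Yılmaz"))
```

In this sketch the name is escaped with `re.escape` before being embedded in the pattern, so that characters in a name are never interpreted as regular expression syntax, and the first name is optional so that a mention by last name alone still matches.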
Following the extraction of the reported speech expressions, the system applies the cosine similarity method to detect similar expressions. The final step is to present the answer to the question "What did X say about Y?" as a collection of sentences.

The remainder of this article is structured as follows. Section 2 gives an overview of the Web-Ear system, with detailed explanations of how its modules work and interact. A performance evaluation of our approach and its results are presented in Section 3. Section 4 discusses several aspects of related work, followed by the conclusions in Section 5.

II. DESIGN AND IMPLEMENTATION

The starting point of the system is the user input. The person's name and the topic are needed to retrieve relevant pages from the search engine and to find the person's statements by combining the name with regular expressions, which function as part of the Named Entity Recognition pattern [5]. After this step, extraction of the

370 978-1-61284-922-5/11/$26.00 ©2011 IEEE