Web-Ear, an Information Retrieval System That Uses
Reported Speech Expressions in Turkish
C. B. Ozdemir, B. Diri
Yildiz Technical University, Department of Computer Engineering
Besiktas, Istanbul 34349 TURKEY
canberkozdemir@yahoo.com, banu@ce.yildiz.edu.tr
Abstract— Web-Ear is an information retrieval system in which
anyone can search for "What did person X say about topic Y?" on
the internet or in the system's own database. When the search is
made on the internet, the system first submits a query to the
Google search engine and retrieves a set of pages. To isolate the
speech portions of the retrieved data, an extraction process
applies predefined rules written as regular expressions; only
after this step does the system have the data to present to the
user.
Keywords-information retrieval; information extraction;
natural language processing; named-entity recognition; web-ear;
reported speech
I. INTRODUCTION
A commonly asked question in recent years has been "Am I
being eavesdropped on?". Let us turn this question around and
look for the answer to "Who said what about a given subject?".
The system we designed collects data that answers this second
question: with it, users can learn what a person said about a
given subject. We call the system Web-Ear.
In the present decade, the internet gives access to a vast
amount of information, and the large bodies of text available on
it can be processed as natural language. To answer a query of the
form "What did X say about Y?", Web-Ear retrieves pages through
Google and processes them as natural language text. The sources
that the system processes may contain both direct and indirect
speech. As Arnazarov notes [1], in both Turkish and English
grammar the term reported speech refers to direct speech. In
contrast to Arnazarov, we treat both direct and indirect speech
as reported speech.
Sabine Bergler, Rene Witte and Ralf Krestel built the Fuzzy
Believer, a system that extracts reported speech from newspaper
articles using standard NLP components [2]. The Fuzzy Believer
uses Natural Language Processing methods to extract and process
beliefs and opinions, including personal statements, from
newspapers and other texts. It applies several fuzzy set
operators, including fuzzy belief revision, to model different
belief strategies. The first module of the Fuzzy Believer is
reported speech extraction: a reported verb finder marks the
reporting verbs within each sentence, and reported speech is
located from those marks. Other studies by the same researchers
on reported speech concentrated on the reported verb finder and
the reported speech finder mechanisms [3].
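To make the idea of a verb-marking finder concrete, the following is a minimal sketch under stated assumptions: it tags English reporting verbs from a small hand-picked lexicon and treats any sentence with a marked verb as a reported-speech candidate. The lexicon, the `<RV>` tag and the function names are hypothetical illustrations; this is not the Fuzzy Believer's implementation, which relies on standard NLP components [2].

```python
import re

# Hypothetical lexicon of English reporting verbs (an assumption for
# illustration; this is not the verb list used by the Fuzzy Believer).
REPORTING_VERBS = {
    "say", "says", "said", "tell", "tells", "told", "claim", "claims",
    "claimed", "announce", "announced", "state", "stated", "deny", "denied",
}

def mark_reporting_verbs(sentence):
    """Wrap each word that appears in REPORTING_VERBS in <RV>...</RV> tags."""
    def tag(match):
        word = match.group(0)
        return f"<RV>{word}</RV>" if word.lower() in REPORTING_VERBS else word
    return re.sub(r"[A-Za-z]+", tag, sentence)

def has_reported_speech(sentence):
    """Treat a sentence as a reported-speech candidate if any verb was marked."""
    return "<RV>" in mark_reporting_verbs(sentence)

print(mark_reporting_verbs("The minister said the plan would work."))
# → The minister <RV>said</RV> the plan would work.
```

A lexicon lookup cannot distinguish a reporting use of a verb from other uses; the cited work uses proper NLP components for that reason.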
The present system focuses on two disciplines of Natural
Language Processing: Information Extraction and Information
Retrieval [4]. In his graduate study "Person Name Extraction
From Turkish Financial News Text Using Local Grammar Based
Approach" [5], Özkan Bayraktar used reported speech patterns to
extract person names from Turkish financial news. Named Entity
Recognition, a subtask of Information Extraction, is used to find
people's names; in our system, finding proper names is required
to identify the regular expression patterns of reported speech.
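The link between reported speech and person names can be illustrated with a toy pattern: in a Turkish sentence such as "Ahmet Yılmaz, ... söyledi." ("Ahmet Yılmaz said that ..."), the capitalized words before the comma are a likely speaker name. The sketch below assumes that names are two or three capitalized words at the start of the sentence and that a small, hypothetical list of past-tense reporting verbs is enough; it illustrates the general idea only and is not Bayraktar's local grammar [5].

```python
import re

# Turkish upper- and lower-case letters for character classes.
UPPER = "A-ZÇĞİÖŞÜ"
LOWER = "a-zçğıöşü"
# A capitalized word such as "Yılmaz" (assumption: person names are capitalized).
CAP_WORD = f"[{UPPER}][{LOWER}]+"
# Hypothetical list of Turkish reporting verbs (past tense, 3rd person singular).
REPORTING_VERB = r"(?:söyledi|dedi|belirtti|açıkladı)"

# Two or three capitalized words at the start of a sentence, then a comma,
# then (later in the same sentence) a reporting verb.
SPEAKER_PATTERN = re.compile(
    rf"^(?P<name>{CAP_WORD}(?: {CAP_WORD}){{1,2}}),[^.]*?\b{REPORTING_VERB}\b"
)

def extract_speaker(sentence):
    """Return the candidate person name introducing the reported speech, or None."""
    match = SPEAKER_PATTERN.search(sentence)
    return match.group("name") if match else None

print(extract_speaker("Ahmet Yılmaz, projenin yıl sonunda biteceğini söyledi."))
# → Ahmet Yılmaz
```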
The process begins by researching the patterns used to form
reported speech expressions [6] and identifying sentences that
contain those patterns on various web sites. From this study,
regular expressions that match reported speech expressions are
generated. The system then combines these regular expressions
with the name of the person being searched for: the first and
last name, or only the last name, is tagged as a proper name and
searched together with the regular expressions to find matches in
natural language texts.
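A minimal sketch of this combination step follows, assuming two hypothetical Turkish reporting-speech templates: "<Name>, <statement> <verb>." and "<Name>'a göre <statement>." ("according to <Name>, ..."). The verb list, the templates and all function names are illustrative assumptions, not the paper's actual rule set. As described above, both the full name and the last name alone are accepted; the full name is listed first so that it takes precedence in the regex alternation.

```python
import re

# Hypothetical Turkish reporting verbs (an assumption, not the paper's rule set).
REPORTING_VERBS = ["söyledi", "dedi", "belirtti", "açıkladı", "ifade etti"]

def name_variants(first, last):
    """Full name first, then last name only, so the longer form wins in an alternation."""
    return [f"{first} {last}", last]

def build_patterns(first, last):
    names = "|".join(re.escape(n) for n in name_variants(first, last))
    verbs = "|".join(re.escape(v) for v in REPORTING_VERBS)
    return [
        # "<Name>, <statement> <verb>."   e.g. "Ahmet Yılmaz, ... söyledi."
        re.compile(rf"(?:{names}),\s*(?P<speech>[^.]+?)\s+(?:{verbs})\b"),
        # "<Name>'a göre <statement>."   ("according to <Name>"); the dative
        # suffix follows vowel harmony, and ' or ’ may serve as the apostrophe.
        re.compile(rf"(?:{names})['’](?:a|e|ya|ye)\s+göre,?\s*(?P<speech>[^.]+)"),
    ]

def extract_statements(text, first, last):
    """Collect the speech part of every match of every pattern."""
    statements = []
    for pattern in build_patterns(first, last):
        statements.extend(m.group("speech").strip() for m in pattern.finditer(text))
    return statements

text = ("Ahmet Yılmaz, projenin yıl sonunda biteceğini söyledi. "
        "Yılmaz'a göre enflasyon düşecek.")
print(extract_statements(text, "Ahmet", "Yılmaz"))
# → ['projenin yıl sonunda biteceğini', 'enflasyon düşecek']
```

Because Turkish attaches case suffixes to proper names after an apostrophe, the name alone is not a sufficient anchor; the sketch matches the suffix explicitly in the second template.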
After the reported speech expressions are extracted, the
system applies the cosine similarity method to identify similar
expressions.
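Cosine similarity compares two sentences as term-frequency vectors A and B: cos(A, B) = A·B / (|A| |B|), which is 1 for sentences with proportional word counts and 0 for sentences that share no words. The sketch below uses a plain bag-of-words representation; the paper does not state how similar expressions are handled once found, so collapsing near-duplicate statements (for example, the same quote reported by several sites) with a 0.8 threshold is an assumption made here for illustration.

```python
import math
import re
from collections import Counter

def bag_of_words(sentence):
    # Term-frequency vector. \w+ is Unicode-aware in Python 3, so Turkish letters
    # survive; str.lower() is not Turkish-aware, though ("I" -> "i", not "ı").
    return Counter(re.findall(r"\w+", sentence.lower()))

def cosine_similarity(a, b):
    """cos(A, B) = A·B / (|A| |B|) over bag-of-words vectors."""
    va, vb = bag_of_words(a), bag_of_words(b)
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def remove_near_duplicates(sentences, threshold=0.8):
    """Keep a sentence only if it is not too similar to one already kept.
    The 0.8 threshold is an illustrative assumption."""
    kept = []
    for s in sentences:
        if all(cosine_similarity(s, k) < threshold for k in kept):
            kept.append(s)
    return kept

statements = ["Proje yıl sonunda bitecek.", "proje yıl sonunda bitecek", "Enflasyon düşecek."]
print(remove_near_duplicates(statements))
# → ['Proje yıl sonunda bitecek.', 'Enflasyon düşecek.']
```

The first two statements share all four words with equal counts, so their similarity is 1.0 and the second is dropped; the third shares no words with the first and is kept.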
The final step presents the answer to the question "What did
X say about Y?": the system returns a collection of sentences.
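One simple way to turn the extracted statements of person X into an answer about topic Y is to keep only the sentences that mention Y. The sketch below assumes a word-prefix match as a crude substitute for Turkish stemming, since the topic word usually carries suffixes (e.g. "enflasyon" becomes "enflasyonun"); the paper does not specify its topic-matching method, and the function names are hypothetical.

```python
import re

def mentions_topic(sentence, topic):
    # Turkish is agglutinative: "enflasyon" also appears as "enflasyonun",
    # "enflasyona", ... A word-prefix match is a crude stand-in for stemming.
    pattern = rf"\b{re.escape(topic.lower())}"
    return re.search(pattern, sentence.lower()) is not None

def answer(statements, topic):
    """Answer "What did X say about Y?" as a list of X's sentences mentioning Y."""
    return [s for s in statements if mentions_topic(s, topic)]

statements = ["Enflasyonun düşeceğini söyledi.", "Projenin yıl sonunda biteceğini söyledi."]
print(answer(statements, "enflasyon"))
# → ['Enflasyonun düşeceğini söyledi.']
```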
The remainder of this article is structured as follows.
Section 2 gives an overview of the Web-Ear system, with detailed
explanations of its modules and how they interact. Section 3
presents the results of a performance evaluation of our approach.
Section 4 discusses several aspects of this study, and Section 5
presents our conclusions.
II. DESIGN AND IMPLEMENTATION
The starting point of the system is the user input. The
person's identity and the topic are the key inputs: they are used
to retrieve relevant pages from the search engine and to find the
person's expressions by combining the person's name with regular
expressions, which function as part of the Named Entity
Recognition pattern [5]. After this step, extraction of the
978-1-61284-922-5/11/$26.00 ©2011 IEEE