I.J. Intelligent Systems and Applications, 2015, 02, 56-64
Published Online January 2015 in MECS (http://www.mecs-press.org/)
DOI: 10.5815/ijisa.2015.02.08
Copyright © 2015 MECS
Hybrid Approach to Pronominal Anaphora
Resolution in English Newspaper Text
Kalyani P. Kamune
RKNEC, Department of Computer Science, Nagpur, 440013, India
Email: kalyanikamune24@gmail.com
Avinash Agrawal
RKNEC, Department of Computer Science, Nagpur, 440013, India
Email: avinashjagrawal@gmail.com
Abstract— One of the challenges in natural language
understanding is to determine which entities are referred to in
the discourse and how they relate to each other. Anaphora
resolution needs to be addressed in almost every application
dealing with natural language, such as language understanding
and processing, dialogue systems, machine translation,
discourse modeling, and information extraction. This paper
presents a system that combines constraint-based and
preference-based architectures, each of which uses a different
source of knowledge and proves effective on a computational and
theoretical basis, instead of using a monolithic architecture for
anaphora resolution. The system identifies both inter-sentential
and intra-sentential antecedents of third-person pronoun
anaphors and handles pleonastic “it”. It uses the Charniak Parser
(parser05Aug16) as an associated tool and relies on the
output generated by it. Salience measures derived from the parse
tree are used to select the correct antecedent from the
list of all potential antecedents. We have tested the system
extensively on the Reuters newspaper corpus, and the efficiency of
the system is found to be 81.9%.
Index Terms—Natural Language Processing, Anaphora
Resolution, Discourse, Pronominal Resolution, Co-reference,
Discourse Modeling, Artificial Intelligence
I. INTRODUCTION
Resolution of anaphoric reference is one of the most
challenging tasks in the field of natural language
processing. It is extremely difficult to give a complete,
plausible and computable description of the resolution
process, as we ourselves deal with it only subconsciously
and are largely unaware of its particularities. The task of
anaphora resolution is frequently even considered to be
AI-complete. Anaphora accounts for cohesion in text and
is an active area of study in formal and computational
linguistics alike. Correctly identifying anaphora plays a
vital role in natural language processing, and the automatic
resolution of anaphors is a crucial task in the understanding
of natural language by computers. Understanding natural
language is difficult for computers because natural
languages are inherently ambiguous. Human beings, on the
other hand, can easily pick out the intended meaning from
the set of possible interpretations, whereas computers
cannot, owing to their limited knowledge and their
inability to get their bearings in complex contextual
situations.
Ambiguity can be present at different levels. It can be
present at the lexical level, where one word may have more
than one meaning (e.g. bank, chair, files). It can also be
present at the syntactic level, when more than one
structural analysis is possible. Ambiguity can also be
present at the semantic or pragmatic level. The
automatic resolution of ambiguity requires a huge amount
of linguistic and extra-linguistic knowledge as well as
inferring and learning capabilities, and is therefore
realistic only in restricted domains.
A. Basic Notions and Terminologies
Cohesion occurs where the interpretation of some
element in the discourse is dependent on that of another.
It involves the use of abbreviated or alternative
linguistic forms which can be recognized and understood
by the hearer or the reader, and which refer to or replace
previously mentioned items in the spoken or written text.
e.g. “Sita is a teacher. Her dream is to visit Paris.”
In the above example, it is natural to observe that the
second sentence is related to the first sentence, and hence
we can say that cohesion is present. In the second
sentence, ‘her’ refers to ‘Sita’. If, in the above
example, we replace ‘her’ by ‘him’, or replace the whole
sentence by some other isolated sentence, cohesion no
longer occurs, as the interpretation of the second
sentence no longer depends on the first sentence.
This discourse features an example of anaphora, with the
possessive pronoun ‘her’ referring to the previously
mentioned noun phrase ‘Sita’. [1]
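The effect of replacing ‘her’ by ‘him’ in the example can be
sketched as a simple gender-agreement check. The following
Python fragment is only an illustration and not the system
described in this paper. The lexicons PRONOUN_GENDER and
NAME_GENDER and the functions agrees and find_antecedent are
hypothetical names introduced here, and the gender of ‘Sita’
is assumed to be supplied by hand.

```python
from typing import List, Optional

# Hypothetical, hand-written lexicons used only for this illustration.
PRONOUN_GENDER = {
    "she": "female", "her": "female",
    "he": "male", "him": "male", "his": "male",
}
NAME_GENDER = {"Sita": "female"}  # assumption: gender supplied by hand


def agrees(pronoun: str, candidate: str) -> bool:
    """Return True if the pronoun and the candidate share a known gender."""
    pronoun_gender = PRONOUN_GENDER.get(pronoun.lower())
    candidate_gender = NAME_GENDER.get(candidate)
    return pronoun_gender is not None and pronoun_gender == candidate_gender


def find_antecedent(pronoun: str, candidates: List[str]) -> Optional[str]:
    """Return the most recently mentioned candidate that agrees in gender.

    `candidates` lists the noun phrases in order of mention; None is
    returned when no candidate agrees with the pronoun.
    """
    for candidate in reversed(candidates):
        if agrees(pronoun, candidate):
            return candidate
    return None


# "Sita is a teacher. Her dream is to visit Paris."
print(find_antecedent("her", ["Sita"]))  # 'her' agrees with 'Sita'
print(find_antecedent("him", ["Sita"]))  # 'him' finds no antecedent
```

With ‘her’, the check links the pronoun to ‘Sita’; with ‘him’,
no candidate agrees, which mirrors the loss of cohesion noted
above.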
Anaphora is described as cohesion which points back
to some previous item. The word or phrase pointing back
is called an anaphor, and the entity to which it
refers or for which it stands is its antecedent. The
process of determining the antecedent of an anaphor is
called anaphora resolution. When the anaphor refers to
an antecedent and both have the same referent in the real
world, they are termed co-referential. The terms
introduced above (anaphor, antecedent, anaphora
resolution and co-referential) can be illustrated with
the help of the example below. [3]
e.g. “The King is not here yet but he is expected to
arrive in the next half an hour.”