KeaKAT – An Online Automatic Keyphrase Assignment Tool
Rabia Irfan, Sharifullah Khan, Irfan Ali Khan, Muhammad Asif Ali
School of Electrical Engineering & Computer Science (SEECS),
National University of Sciences and Technology (NUST), Islamabad, Pakistan
{09msitrirfan, sharifullah.khan, 08bitikhan, 08bitasifa}@seecs.edu.pk
Abstract—Kea++ is a well-known tool for assigning keyphrases
to documents. The keyphrase set it produces, however, contains
noise and irrelevant terms. An extended refinement methodology
was developed to fine-tune the results of Kea++ for multiple
domains. Yet using Kea++ and its refinement as a system
for assigning keyphrases to documents is not simple for users
outside the computing domain: the system must be installed
and configured, and it has no GUI to help users assign
keyphrases. KeaKAT is a web-based keyphrase assignment tool
that helps users assign relevant keyphrases to their documents
online. It not only saves users from installing and configuring
the system but also improves its usability.
Index Terms—Keyphrase assignment, information extraction,
usability, classification system
I. INTRODUCTION
Keyphrases are words or short phrases that give a concise summary
of a document or text [1], [5], [11]. They can be used in a
variety of applications that involve organizing and managing
large amounts of information. They can help in browsing
document collections [8], serve as metadata [16], index document
collections [16], [7], and assist in the classification and
clustering of document collections [10]. Because of these uses,
many tools have been developed to generate keyphrases
automatically. Two main approaches are used: keyphrase
extraction, which takes keyphrases from the document text itself,
and keyphrase assignment, which aligns the document with the
terms of a classification system or taxonomy. Kea++ [14] is a
well-known tool that can perform both keyphrase assignment and
extraction, depending on the given input. However, the output
produced by Kea++ contains noise and irrelevant terms. The work
in [4] proposed a refinement methodology that takes the
keyphrases assigned by Kea++ as input and generates refined
keyphrases. The methodology exploits the hierarchical structure
of a taxonomy [15], [6] as well as common heuristics to
fine-tune the results of Kea++. The refinement methodology was
extended in [9] to improve and generalize it for
multiple domains.
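To make the idea of exploiting a taxonomy's hierarchy concrete, the Python sketch below applies one plausible hierarchy-based rule to an assigned keyphrase set: when both a term and one of its broader terms are assigned, only the more specific term is kept. The rule, the toy taxonomy, and the names `TAXONOMY`, `ancestors`, and `refine` are hypothetical illustrations; they are not the actual heuristics of [4] or [9].

```python
from typing import Dict, List, Optional

# Hypothetical toy taxonomy (assumption): each term maps to its broader term,
# or to None for a top-level term.
TAXONOMY: Dict[str, Optional[str]] = {
    "Computing methodologies": None,
    "Artificial intelligence": "Computing methodologies",
    "Natural language processing": "Artificial intelligence",
    "Information extraction": "Natural language processing",
    "Machine learning": "Computing methodologies",
}


def ancestors(term: str, taxonomy: Dict[str, Optional[str]]) -> List[str]:
    """Return every broader term of `term`, nearest first."""
    result: List[str] = []
    parent = taxonomy.get(term)
    while parent is not None:
        result.append(parent)
        parent = taxonomy.get(parent)
    return result


def refine(assigned: List[str], taxonomy: Dict[str, Optional[str]]) -> List[str]:
    """Hypothetical rule: drop any assigned term that is an ancestor of another
    assigned term, so only the most specific terms remain (order preserved)."""
    unique = list(dict.fromkeys(assigned))  # remove duplicates, keep order
    covered = set()
    for term in unique:
        covered.update(ancestors(term, taxonomy))
    return [term for term in unique if term not in covered]
```

For example, refining `["Artificial intelligence", "Information extraction", "Machine learning"]` drops "Artificial intelligence", because it is a broader term of "Information extraction".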
Together, Kea++ and the extended refinement methodology produce
better results for assigning keyphrases to documents. However,
as a system, Kea++ and its refinements must be installed and
configured before they can assign keyphrases to documents
accurately, and no graphical user interface (GUI) is provided
to help users operate the system. Automatic keyphrase
assignment techniques are useful not only to computing experts;
they are equally important in other academic disciplines,
particularly library science. Users from fields other than
computing are usually not computer experts. They are often
uncomfortable with techniques that demand detailed computer
knowledge, and they are unwilling to learn the technical
details of such systems. Therefore, the available automatic
tools are not widely used in academia.
The objective of KeaKAT is to improve the usability of
automatic keyphrase tools. KeaKAT is a web-based automatic
keyphrase assignment tool that uses Kea++ and the extended
refinement methodology in the background. It lets users train
and test Kea++ online and applies the refinement in the
background to assign relevant keyphrases to their documents.
It saves users from installing and configuring the system, and
its GUI lets them get the job done easily, thereby improving
the usability of the system.
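The following minimal sketch shows, under stated assumptions, how a web front end might delegate a request to the two background stages. The names `KeyphrasePipeline` and `handle_assign_request` and the JSON request/response shape are hypothetical and are not KeaKAT's actual interface; the two stages are passed in as plain functions standing in for wrappers around Kea++ and the extended refinement methodology. Training is omitted for brevity.

```python
import json
from typing import Callable, List

# Hypothetical stage signatures: an assigner maps document text to raw
# keyphrases (a Kea++ stand-in); a refiner filters that list.
Assigner = Callable[[str], List[str]]
Refiner = Callable[[List[str]], List[str]]


class KeyphrasePipeline:
    """Sketch of the background work triggered by the web front end."""

    def __init__(self, assign: Assigner, refine: Refiner) -> None:
        self.assign = assign
        self.refine = refine

    def run(self, document_text: str) -> List[str]:
        raw = self.assign(document_text)  # Kea++-style assigned keyphrases
        return self.refine(raw)           # refined set returned to the user


def handle_assign_request(body: str, pipeline: KeyphrasePipeline) -> str:
    """Handle a JSON body {"text": ...}; reply {"keyphrases": [...]}
    or {"error": ...} when the document is empty (hypothetical format)."""
    payload = json.loads(body)
    text = payload.get("text", "")
    if not text.strip():
        return json.dumps({"error": "empty document"})
    return json.dumps({"keyphrases": pipeline.run(text)})
```

Passing the stages in as functions keeps the request handler independent of how Kea++ is invoked, so a stub can replace it when testing the web layer.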
The rest of the paper is organized as follows: Section II
discusses related work and existing systems. Section III
explains the architecture and working of the proposed tool and
presents a comparative analysis of the existing systems and the
proposed system. Section IV concludes the paper and discusses
future work.
II. RELATED WORK
This section discusses existing systems for automatic
keyphrase assignment. Kea [17] and its later version Kea++
[14] are well-known tools developed at the University of Waikato
for generating keyphrases automatically. Kea is a
machine-learning-based tool that works in two phases: training
and extraction. Kea was initially used to extract keyphrases
from documents; it was later extended to perform keyphrase
assignment, and this version is known as Kea++. Like Kea,
Kea++ works in two phases, training and extraction, and each
phase has two sub-steps: candidate identification and
filtering. During candidate identification, language-dependent
techniques such as input cleaning and stemming are applied to
form pseudo phrases.
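The candidate identification step can be sketched as follows. Assumptions: the tiny stopword list, the plural-stripping `crude_stem` (a stand-in, not Kea's actual stemmer), a maximum phrase length of three words, and the rule that a candidate may not start or end with a stopword. The pseudo-phrase normalization (stopword removal, stemming, alphabetical ordering of words) reflects our understanding of Kea++ rather than anything stated in this paper; its effect is that different surface forms of a phrase map to one pseudo phrase.

```python
import re
from typing import Dict, List

# Small illustrative stopword list (assumption; Kea uses a larger list).
STOPWORDS = {"a", "an", "and", "are", "for", "in", "is", "of", "on", "the", "to", "with"}


def crude_stem(word: str) -> str:
    """Toy plural stripper standing in for a real stemmer (NOT Kea's stemmer)."""
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


def clean(text: str) -> List[List[str]]:
    """Input cleaning: lowercase, split at punctuation (phrases never cross
    these boundaries), and keep alphabetic tokens only."""
    chunks = re.split(r"[.,;:!?()\n]+", text.lower())
    return [re.findall(r"[a-z]+", c) for c in chunks if c.strip()]


def pseudo_phrase(words: List[str]) -> str:
    """Normalize a phrase: drop stopwords, stem, and sort words alphabetically."""
    return " ".join(sorted(crude_stem(w) for w in words if w not in STOPWORDS))


def candidates(text: str, max_len: int = 3) -> Dict[str, List[str]]:
    """Map each pseudo phrase to the distinct surface phrases that produced it."""
    result: Dict[str, List[str]] = {}
    for tokens in clean(text):
        for n in range(1, max_len + 1):
            for i in range(len(tokens) - n + 1):
                gram = tokens[i : i + n]
                # Assumed rule: a candidate may not begin or end with a stopword.
                if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                    continue
                surfaces = result.setdefault(pseudo_phrase(gram), [])
                surface = " ".join(gram)
                if surface not in surfaces:
                    surfaces.append(surface)
    return result
```

For instance, `candidates("Retrieval of information. Information retrieval systems.")["information retrieval"]` returns `["retrieval of information", "information retrieval"]`: both surface forms collapse to the same pseudo phrase.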
During the filtering step, the candidates most suitable as
keyphrases are identified based on four features: Term
2012 10th International Conference on Frontiers of Information Technology
978-0-7695-4927-9/12 $26.00 © 2012 IEEE
DOI 10.1109/FIT.2012.14