© 2003 National Institute of Informatics
Proceedings of the Third NTCIR Workshop
NTT/NAIST's Text Summarization Systems for TSC-2

Tsutomu Hirao†, Kazuhiro Takeuchi‡, Hideki Isozaki†, Yutaka Sasaki†, Eisaku Maeda†

† NTT Communication Science Laboratories, NTT Corp.
2-4 Hikari-dai, Seika-cho, Soraku-gun, Kyoto, 619-0237, Japan
{hirao,isozaki,sasaki,maeda}@cslab.kecl.ntt.co.jp

‡ Nara Institute of Science and Technology
8916-5 Takayama-cho, Ikoma-city, Nara, 630-0101, Japan
kazuh-ta@is.aist-nara.ac.jp
Abstract

In this paper, we describe the following two approaches to summarization: (1) sentence extraction only, and (2) sentence extraction plus bunsetsu elimination. For both approaches, we use the machine learning algorithm called Support Vector Machines. We participated in both Task-A (the single-document summarization task) and Task-B (the multi-document summarization task) of TSC-2.

Keywords: Sentence extraction, Bunsetsu elimination, Support Vector Machines
1 Introduction

In this paper, we describe the following two approaches to summarization:

(1) sentence extraction only,
(2) sentence extraction plus bunsetsu elimination.

The first system is based on important sentence extraction using Support Vector Machines (SVMs). The second performs important sentence extraction followed by bunsetsu elimination, also using SVMs. The difference between the two systems (System (1) and System (2)) is illustrated in Figure 1.

We participated in both Task-A (the single-document summarization task) and Task-B (the multi-document summarization task) of TSC-2.
The remainder of this paper is organized as follows. Section 2 describes the machine learning algorithm, Support Vector Machines (SVMs), that we apply in our systems. In Section 3, we explain our sentence extraction method. Section 4 describes our bunsetsu elimination method. In Section 5, we give our evaluation results at TSC-2.
‡ Currently with Communication Research Laboratories.
[Figure 1. Difference between the two systems: the summary by System 1 consists of the extracted raw sentences, while the summary by System 2 passes extra extracted sentences through a revision phase that eliminates bunsetsus.]
2 Support Vector Machines
SVM is a supervised learning algorithm for two-
class problems [8].
Training data is given by

(x_1, y_1), ..., (x_u, y_u),  x_j ∈ R^n, y_j ∈ {+1, −1}.

Here, x_j is the feature vector of the j-th sample and y_j is its class label, positive (+1) or negative (−1). SVM separates positive and negative examples by a hyperplane given by

w · x + b = 0,  w ∈ R^n, b ∈ R.  (1)
In general, such a hyperplane is not unique. The SVM determines the optimal hyperplane by maximizing the margin, i.e., the distance between the two parallel hyperplanes w · x + b = 1 and w · x + b = −1 that bound the positive and negative examples. The examples lying on w · x + b = ±1 are called support vectors; they represent both the positive and the negative examples.
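The paper gives no code for this formulation, but the two-class linear SVM above can be sketched with a simple sub-gradient descent on the soft-margin hinge loss. This is only an illustration of the hyperplane w · x + b = 0 and the margin constraints, under assumed toy data and hyperparameters; it is not the solver used by the authors.

```python
# Minimal linear two-class SVM sketch: sub-gradient descent on the
# regularized hinge loss.  Illustrative only; hyperparameters (lam, lr,
# epochs) and the toy data are assumptions, not from the paper.
import numpy as np

def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """X: (u, n) feature vectors x_j; y: (u,) labels in {+1, -1}.
    Returns (w, b) defining the separating hyperplane w.x + b = 0."""
    u, n = X.shape
    w, b = np.zeros(n), 0.0
    for _ in range(epochs):
        for j in range(u):
            # Functional margin of sample j under the current hyperplane.
            if y[j] * (X[j] @ w + b) < 1:
                # Inside the margin: hinge-loss sub-gradient step.
                w = (1 - lr * lam) * w + lr * y[j] * X[j]
                b += lr * y[j]
            else:
                # Correct with margin >= 1: only the regularizer shrinks w.
                w = (1 - lr * lam) * w
    return w, b

# Linearly separable toy data (two positive, two negative examples).
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -3.0]])
y = np.array([1, 1, -1, -1])
w, b = train_linear_svm(X, y)
pred = np.sign(X @ w + b)
print(pred)
```

After training, all four toy examples fall on the correct side of the learned hyperplane; the examples closest to it play the role of the support vectors described above.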
Here, the hyperplane must satisfy the following
constraints: