© 2003 National Institute of Informatics Proceedings of the Third NTCIR Workshop

NTT/NAIST's Text Summarization Systems for TSC-2

Tsutomu Hirao, Kazuhiro Takeuchi, Hideki Isozaki, Yutaka Sasaki, Eisaku Maeda

NTT Communication Science Laboratories, NTT Corp.
2-4 Hikari-dai, Seika-cho, Soraku-gun, Kyoto, 619-0237, Japan
{hirao,isozaki,sasaki,maeda}@cslab.kecl.ntt.co.jp

Nara Institute of Science and Technology
8916-5 Takayama-cho, Ikoma-city, Nara, 630-0101, Japan
kazuh-ta@is.aist-nara.ac.jp

Abstract

In this paper, we describe the following two approaches to summarization: (1) sentence extraction only, and (2) sentence extraction plus bunsetsu elimination. Both approaches use the machine learning algorithm called Support Vector Machines. We participated in both Task-A (the single-document summarization task) and Task-B (the multi-document summarization task) of TSC-2.

Keywords: Sentence extraction, Bunsetsu elimination, Support Vector Machines

1 Introduction

In this paper, we describe the following two approaches to summarization: (1) sentence extraction only, and (2) sentence extraction plus bunsetsu elimination. The first system is based on important sentence extraction using Support Vector Machines (SVMs). The second performs important sentence extraction followed by bunsetsu elimination, also using SVMs. The difference between the two systems (System (1) and System (2)) is illustrated in Figure 1. We participated in both Task-A (the single-document summarization task) and Task-B (the multi-document summarization task) of TSC-2.

The remainder of this paper is organized as follows. Section 2 describes the machine learning algorithm, Support Vector Machines (SVMs), that we apply in our systems. Section 3 explains our sentence extraction method. Section 4 describes our bunsetsu elimination method. Section 5 gives our evaluation results at TSC-2.

Currently with Communication Research Laboratories.
[Figure 1. Difference between two systems: System 1 produces its summary directly from the extracted raw sentences; System 2 adds a revision phase that eliminates bunsetsus from the extracted sentences.]

2 Support Vector Machines

SVM is a supervised learning algorithm for two-class problems [8]. Training data are given as (x_1, y_1), ..., (x_u, y_u), where x_j ∈ R^n and y_j ∈ {+1, −1}. Here, x_j is the feature vector of the j-th sample and y_j is its class label, positive (+1) or negative (−1). The SVM separates positive and negative examples by a hyperplane given by

    w · x + b = 0,  w ∈ R^n, b ∈ R.  (1)

In general, such a hyperplane is not unique. The SVM determines the optimal hyperplane by maximizing the margin. The margin is the distance between the negative examples and the positive examples, i.e., the distance between the hyperplanes w · x + b = 1 and w · x + b = −1. The examples lying on w · x + b = ±1 are called the Support Vectors, and they include both positive and negative examples. Here, the hyperplane must satisfy the following constraints:
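The paper does not give an implementation of this learner. As a rough, self-contained illustration of two-class learning with a separating hyperplane w · x + b = 0, the sketch below trains a linear classifier by Pegasos-style subgradient descent on the hinge loss; this is a simplification of the quadratic program a full SVM solves, and the function names and toy feature vectors are ours, not from the paper.

```python
import random

def train_linear_svm(data, lam=0.01, epochs=300, seed=0):
    """Pegasos-style subgradient descent on the regularized hinge loss.

    data: list of (x, y) pairs, x a list of floats, y in {+1, -1}.
    Returns the weight vector w and bias b of the hyperplane w.x + b = 0.
    (Illustrative sketch only, not the solver used in the paper.)
    """
    rng = random.Random(seed)
    data = list(data)          # avoid reordering the caller's list
    n = len(data[0][0])
    w = [0.0] * n
    b = 0.0
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            t += 1
            eta = 1.0 / (lam * t)            # decreasing step size
            f = sum(wi * xi for wi, xi in zip(w, x)) + b
            # Regularization shrinks w toward zero at every step.
            w = [(1.0 - eta * lam) * wi for wi in w]
            # If the example is inside the margin (y*f < 1), push the
            # hyperplane away from it along y*x.
            if y * f < 1.0:
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
                b += eta * y
    return w, b

def predict(w, b, x):
    """Classify x by the side of the hyperplane it falls on."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0.0 else -1
```

In the paper's setting, each x would be the feature vector of a sentence (or bunsetsu) and y = +1 would mark the items to keep in the summary; at test time, `predict` (or the raw decision value w · x + b) scores each candidate.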