Effectiveness of Keyword and Semantic Relation Extraction for Knowledge Map Generation

Virach Sornlertlamvanich¹ and Canasai Kruengkrai²

¹ Sirindhorn International Institute of Technology, Thammasat University, Thailand
virach@siit.tu.ac.th
² Graduate School of Information Sciences, Tohoku University, Japan
canasai@ecei.tohoku.ac.jp

Abstract. We explore named entity (NE) recognition and semantic relation extraction on a Thai cultural database. Because the domain is limited and the database is well structured, our proposed method achieves acceptably high accuracy in generating semantic relation tuples that express the essence of each record as an infobox and a knowledge map. In this paper, we propose a semantic relation extraction approach based on simple relation templates that determine relation types and their arguments. We attempt to reduce semantic drift of the arguments by using named entity models as semantic constraints. Experimental results indicate that our approach is very promising. We successfully apply our approach to a cultural database and discover more than 18,000 relation instances with expected high accuracy.

Keywords: named entity extraction, semantic relation extraction, cultural database, infobox, knowledge map

1 Introduction

Targeting user-generated content (UGC) such as the Thai Cultural Information Center website,¹ we are interested in relating document units semantically to generate a network that can be expressed as a knowledge map. In our approach, we focus on keyword and semantic relation extraction. Some language-dependent problems must be solved, especially for Thai, which has no word delimiters or punctuation marks. We apply general tools for word segmentation and POS tagging, then extract keywords using a model trained on a named entity (NE) tagged corpus.
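The keyword extraction step described above can be illustrated with a minimal sketch. It assumes that the NE model labels each segmented word with a BIO tag (B-X opens an entity of type X, I-X continues it, O is outside any entity). The BIO scheme, the LOC label, and the function name extract_keywords are assumptions made for illustration; they are not taken from the paper, and the tags are hard-coded here rather than produced by a trained model.

```python
from typing import List, Tuple


def extract_keywords(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Collect (surface form, entity type) pairs from BIO-tagged tokens."""
    keywords: List[Tuple[str, str]] = []
    current: List[str] = []
    current_type = None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A new entity starts; flush the previous one if any.
            if current:
                keywords.append(("".join(current), current_type))
            current, current_type = [token], tag[2:]
        elif tag.startswith("I-") and current and tag[2:] == current_type:
            # Continue the open entity of the same type.
            current.append(token)
        else:
            # "O" or an inconsistent I- tag closes the open entity.
            if current:
                keywords.append(("".join(current), current_type))
            current, current_type = [], None
    if current:
        keywords.append(("".join(current), current_type))
    return keywords


# Hypothetical output of word segmentation plus NE tagging.
tokens = ["วัด", "พระแก้ว", "ตั้งอยู่", "ที่", "กรุงเทพ"]
tags = ["B-LOC", "I-LOC", "O", "O", "B-LOC"]
keywords = extract_keywords(tokens, tags)
```

Tokens are joined without a separator because Thai text has no word delimiter, so the keyword is restored to its original surface form.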
The size of this cultural database has gradually increased to around 100,000 records (from November 2010 to December 2014). Each record contains a number of fields describing a specific cultural object. The content includes four main components: (1) cover image and thumbnails, (2) title, (3) description and (4) domain. We need to extract facts (hereafter referred to as relation instances) from the description.

¹ http://www.m-culture.in.th/
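As a rough illustration of the record layout and of what a relation instance might look like, the sketch below models a record with the four components listed above and applies one relation template to its description. All names here (CulturalRecord, RelationInstance, RelationTemplate, the LOCATED_AT label, the regular expression, and the small location set standing in for an NE model) are hypothetical and chosen only for this example; English text is used in place of Thai for readability.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class CulturalRecord:
    # The four main components of a record.
    title: str
    description: str
    domain: str
    cover_image: Optional[str] = None
    thumbnails: List[str] = field(default_factory=list)


@dataclass(frozen=True)
class RelationInstance:
    relation: str
    subject: str
    obj: str


@dataclass
class RelationTemplate:
    relation: str
    pattern: str                    # regex with one capture group for the argument
    accepts: Callable[[str], bool]  # stand-in for an NE model used as a constraint


def extract_relations(record: CulturalRecord,
                      templates: List[RelationTemplate]) -> List[RelationInstance]:
    """Apply each template to the description; keep arguments the constraint accepts."""
    instances = []
    for template in templates:
        for match in re.finditer(template.pattern, record.description):
            argument = match.group(1).strip()
            if template.accepts(argument):
                instances.append(
                    RelationInstance(template.relation, record.title, argument))
    return instances


# Hypothetical location list standing in for a trained NE model.
KNOWN_LOCATIONS = {"Bangkok Yai"}
templates = [RelationTemplate("LOCATED_AT", r"is located at ([^.]+)\.",
                              lambda s: s in KNOWN_LOCATIONS)]
record = CulturalRecord(
    title="Wat Arun",
    description=("Wat Arun is located at Bangkok Yai. "
                 "The festival is located at the end of the year."),
    domain="Cultural attraction",
)
instances = extract_relations(record, templates)
```

The second sentence matches the surface pattern but its argument is not a location, so the constraint rejects it. This is the kind of argument drift that an NE constraint is meant to limit.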