Abstract—Semantic entities are entities whose concepts are available in a knowledgebase. This paper introduces a new system for extracting semantic entities from texts. To this end, a new disambiguation method is proposed that matches each ambiguous entity with one of the semantic entities in the knowledgebase. The YAGO ontology is used as the knowledgebase in this method, since it is the state of the art in this field. Because entities in YAGO are meaningful, the entities obtained by this method are semantic entities. A comparison with results in the literature shows that the results of this new approach can be sufficiently reliable.

Index Terms—Disambiguation, Information Extraction, Semantic Entity Extraction, YAGO Ontology.

I. INTRODUCTION

Information Extraction refers to the automatic extraction of structured information, such as entities, relationships between entities, and attributes describing entities, from unstructured sources such as texts. Many systems extract entities from a text. Each system extracts the kind of entity it requires, for example Stanford named entities [1], [2], biomedical named entities [3], and terms in the financial domain [4]. However, none of them extracts semantic entities, so they cannot be used in applications that need to know the semantics of entities, such as computing semantic relatedness, semantic search, and other tasks that require semantic context.

To solve this problem, this paper introduces a new system for extracting semantic entities from texts. Semantic entities are entities whose concepts are available in a knowledgebase. Thus, by extracting semantic entities from texts, an unstructured text space is converted into a structured semantic space. The extraction is performed by a new disambiguation method that uses the YAGO ontology [5], which is a semantic space, as its knowledgebase. Disambiguation is a method by which the intended sense of an ambiguous word in a text is obtained.
Disambiguation can serve various purposes. In this paper, it is used to extract semantic entities from a text by means of a new disambiguation method.

Manuscript received March 27, 2011. This work was supported in part by the Islamic Azad University, Roudsar and Amlash Branch. Farhad Abedini is with the Electrical and Computer Engineering Department, and is a member of the Young Researchers Club, Islamic Azad University, Roudsar and Amlash Branch, Roudsar, Iran. Phone: +98-01426215051, e-mail: abedini.ac@gmail.com. Fariborz Mahmoudi is with the Electrical and Computer Engineering Department, Islamic Azad University, Qazvin Branch, Qazvin, Iran. Phone: +9802813665275-3665276, e-mail: mahmoudi@qiau.ac.ir. Amir Hossein Jadidinejad is with the Electrical and Computer Engineering Department, Islamic Azad University, Qazvin Branch, Qazvin, Iran. Phone: +9802813665275-3665276, e-mail: amir@jadidi.info.

A knowledgebase can be an ontology, so entities extracted by means of an ontology are semantic. Medelyan et al. [9] claim that the most appropriate work in this field is the YAGO ontology. However, ontologies extract entities only from structured texts such as infoboxes. In this paper, a new system is introduced to extract semantic entities from unstructured texts using YAGO as its knowledgebase.

Each previous disambiguation method disambiguates ambiguous words using a resource in which the meanings of those words and related knowledge are available. This resource is called “background knowledge”. Bunescu et al. [6] used encyclopedic knowledge as background knowledge. Mihalcea [7] and Sinha et al. [8] used Wikipedia as background knowledge. However, Medelyan et al. [9] claim that the most appropriate work in this field is the YAGO ontology. For this reason, YAGO is used as the background knowledge of the new disambiguation method. Since the YAGO ontology contains many semantic entities, it can serve as a knowledgebase that helps to extract semantic entities from texts.
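To make the role of background knowledge concrete, the following minimal sketch shows one way an ambiguous mention could be matched against candidate entities in a small knowledgebase. It is an illustration only and not the method proposed in this paper: the facts, the entity and relation names, the `MEANS` mapping, and the overlap-based scoring are all hypothetical assumptions and are not taken from actual YAGO data.

```python
from collections import defaultdict

# Hypothetical toy facts (subject, relation, object); not real YAGO data.
FACTS = [
    ("Java_(island)", "locatedIn", "Indonesia"),
    ("Java_(island)", "type", "island"),
    ("Java_(programming_language)", "type", "language"),
    ("Java_(programming_language)", "developedBy", "Sun"),
]

# Hypothetical mapping from a surface form to its candidate entities.
MEANS = {"java": ["Java_(island)", "Java_(programming_language)"]}


def related_terms(facts):
    """Map each entity to the lowercased terms it is linked to by some fact."""
    related = defaultdict(set)
    for subj, _rel, obj in facts:
        related[subj].add(obj.lower())
        related[obj].add(subj.lower())
    return related


def disambiguate(mention, context_words, facts=FACTS, means=MEANS):
    """Return the candidate sharing the most terms with the context, or None.

    Assumed scoring: size of the overlap between the context words and the
    terms linked to the candidate. Ties go to the first listed candidate.
    """
    candidates = means.get(mention.lower(), [])
    if not candidates:
        return None  # the mention has no entry in the knowledgebase
    related = related_terms(facts)
    context = {w.lower() for w in context_words}
    return max(candidates, key=lambda e: len(related[e] & context))
```

In this sketch, `disambiguate("Java", ["Sun", "released", "the", "language"])` selects `Java_(programming_language)`, while a context mentioning "island" and "Indonesia" selects `Java_(island)`. A mention with no entry in `MEANS` yields `None`, which corresponds to an entity that is not semantic in the sense used here.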
From Text to Knowledge: Semantic Entity Extraction using YAGO Ontology
Farhad Abedini, Fariborz Mahmoudi, and Amir Hossein Jadidinejad
International Journal of Machine Learning and Computing, Vol. 1, No. 2, June 2011

In previous works, Wikipedia was the best background knowledge resource for disambiguation. Alongside its advantages, using Wikipedia as the background knowledge resource has two major problems. First, Wikipedia is not completely reliable. Second, its information is textual and unstructured, and semantic information cannot easily be extracted from unstructured resources. The present work proposes a way to address these problems: using the YAGO ontology instead of Wikipedia as the background knowledge resource. Since the YAGO ontology is derived from Wikipedia, it retains Wikipedia's advantages. In addition, because the YAGO ontology uses WordNet to verify the accuracy of its facts, it can be relied on. Moreover, the YAGO ontology is a structured knowledgebase consisting of a set of facts, which makes it easier to extract the semantics of entities. Each fact in the ontology is a triple consisting of two entities and a relation between them. These triples can be used to extract entities from a text and to obtain the semantics of those entities.

The contributions of this paper are as follows:
• Introducing a new method called semantic entity extraction. A new method is introduced to extract semantic entities from an unstructured text.
• Introducing a new disambiguation method. To extract semantic entities, a new disambiguation method is introduced that uses new background knowledge, and it will be shown that this background knowledge is the state of the art for the purpose of this paper.
• Creating a new application for the YAGO ontology. In this paper, the use of YAGO as background knowledge is proposed, and it will be shown that this ontology is