Extending the Coverage of DBpedia Properties using Distant Supervision over Wikipedia

Alessio Palmero Aprosio¹, Claudio Giuliano², and Alberto Lavelli²

¹ Università degli Studi di Milano, Via Comelico 39/41, 20135 Milano, Italy
alessio.palmero@unimi.it
² Fondazione Bruno Kessler, Via Sommarive 18, 38123 Trento, Italy
{giuliano,lavelli}@fbk.eu

Abstract. DBpedia is a Semantic Web project aiming to extract structured data from Wikipedia articles. Due to the increasing number of resources linked to it, DBpedia plays a central role in the Linked Open Data community. Currently, the information contained in DBpedia is mainly collected from Wikipedia infoboxes, sets of subject-attribute-value triples that summarize a Wikipedia page. These infoboxes are manually compiled by Wikipedia contributors, and more than 50% of Wikipedia articles have no infobox. In this article, we use the distant supervision paradigm to extract the missing information directly from the text of the Wikipedia article, using a Relation Extraction tool trained on the information already present in DBpedia. We evaluate our system on a data set consisting of seven DBpedia properties, demonstrating the suitability of the approach for extending the coverage of DBpedia.

1 Introduction

Wikipedia is one of the most popular web sites in the world and the most widely used encyclopedia. In addition, Wikipedia is steadily maintained by a community of thousands of active contributors, so its content is a good approximation of what people need and wish to know. Finally, Wikipedia is free and can be downloaded in its entirety thanks to periodic dumps made available by the Wikipedia community. For these reasons, in recent years several large-scale knowledge bases (KBs) have been created by exploiting Wikipedia. DBpedia [1], Yago [22] and FreeBase [3] are relevant examples of such resources. In this work, we are particularly interested in DBpedia.
Created in 2006, DBpedia (http://www.dbpedia.org/) has grown in size and popularity, becoming one of the central interlinking hubs of the emerging Web of Data. DBpedia is built in two steps. First, the DBpedia project develops and maintains an ontology, available for download in OWL format. Then, this ontology is populated using a rule-based semi-automatic approach that relies on Wikipedia infoboxes, sets of subject-attribute-value triples that summarize some unifying aspect shared by a group of Wikipedia articles. For example, biographical articles typically have a specific infobox (Persondata in the English Wikipedia) containing information such as name, date of birth, nationality,