Concept Map Mining: A definition and a framework for its evaluation

Jorge J. Villalon
School of Elec. & Inf. Engineering
University of Sydney
villalon@ee.usyd.edu.au

Rafael A. Calvo
School of Elec. & Inf. Engineering
University of Sydney
rafa@ee.usyd.edu.au

Abstract

Concept maps are visual representations of knowledge, widely used in educational contexts. We use the term “Concept Map Mining” (CMM) to refer to the automatic extraction of concept maps from documents such as essays. The principles behind CMM have been proposed for applications such as information extraction in specific knowledge domains, the measurement of student understanding and misconceptions based on written essays, and as a preliminary step in creating domain ontologies.

Previous work on the automatic extraction of concept maps presents two problems: 1) overly simplistic and varying definitions of concept maps, and 2) the lack of an evaluation framework that can be used to measure the quality of the generated maps. In this paper, we propose a formal definition of the term CMM, with a focus on educational applications. We also propose an evaluation framework that will give other researchers a common ground for evaluating the performance of CMM methods.

1 Introduction

Concept Maps (CMs) have been widely used in education as a way to represent students’ knowledge [11]. They comprise concepts and their relationships, arranged hierarchically according to the importance of the concepts described. Several studies have shown that concept maps are a valid and reliable medium for representing students’ understanding [5, 10], making them a valuable pedagogical tool. CMs are generally used in educational scenarios in two ways: 1) students build a CM in response to a focus question, or 2) students analyze a previously built CM (usually built by an expert). Both approaches have been found to improve student learning outcomes.
Another way to allow students to demonstrate their understanding of a topic is to ask them to write a document or academic essay. Essays are considered among the most representative sources of student understanding [4]. Educational researchers have shown that writing is a task in which higher cognitive functions, such as analysis and synthesis, can be fully developed [4]. CMs have been used to support the writing process by providing students with a tool to organize their knowledge before writing the essay.

The automatic extraction of CMs from text, which we call “Concept Map Mining” (CMM), could provide new ways of using the student knowledge elicited by essays. CMs are semi-structured representations, which allows computers to process them in many ways (for example, by calculating distances between CMs, or by identifying sets of related concepts and/or propositions). In this way, teachers could use CMM results to search for common propositions (concept, relationship, concept triples) and misconceptions among students, or to more easily group students with similar levels of knowledge.

The idea of automatically extracting concept maps has been proposed before, but previous studies reveal challenges at two levels: an inconsistent definition of concept maps, and the lack of an evaluation framework. The first problem occurs when authors claim to be creating CMs but are actually creating what might be better described as semantic networks, which have their own characteristics and are not always suitable for educational purposes. The second problem is that each study uses a different method to evaluate its contribution, making it very hard to compare studies. Despite these issues, the automatic construction of CMs is, with some caveats, becoming possible. Given the importance of CMs in education, the impact of such results could be considerable.
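Because propositions are concept, relationship, concept triples, a concept map can be processed as a simple data structure. The sketch below is a minimal illustration, assuming a CM can be reduced to a set of such triples (it ignores the hierarchical layout of real CMs). The names `Proposition`, `jaccard_distance` and `common_propositions` are hypothetical, and the Jaccard distance over propositions is only one possible way to measure the distance between CMs, not a measure proposed in this paper.

```python
from collections import Counter
from typing import Dict, FrozenSet, NamedTuple, Set


class Proposition(NamedTuple):
    """A concept, relationship, concept triple (hypothetical representation)."""
    source: str
    relation: str
    target: str


# Assumption: a concept map is modelled as an unordered set of propositions.
ConceptMap = FrozenSet[Proposition]


def concepts(cm: ConceptMap) -> Set[str]:
    """Return every concept that appears in at least one proposition."""
    return {p.source for p in cm} | {p.target for p in cm}


def jaccard_distance(a: ConceptMap, b: ConceptMap) -> float:
    """Assumed distance: 1 - |A intersect B| / |A union B| over propositions."""
    union = a | b
    if not union:
        return 0.0  # two empty maps are treated as identical
    return 1.0 - len(a & b) / len(union)


def common_propositions(maps: Dict[str, ConceptMap], min_students: int) -> Dict[Proposition, int]:
    """Map each proposition shared by at least `min_students` maps to its count.

    Each map is a set, so a proposition is counted at most once per student.
    """
    counts = Counter(p for cm in maps.values() for p in cm)
    return {p: n for p, n in counts.items() if n >= min_students}


# Illustrative (made-up) maps for two students.
alice = frozenset({
    Proposition("plants", "produce", "oxygen"),
    Proposition("photosynthesis", "requires", "sunlight"),
})
bob = frozenset({
    Proposition("plants", "produce", "oxygen"),
    Proposition("photosynthesis", "produces", "carbon dioxide"),  # an example misconception
})

print(jaccard_distance(alice, bob))
print(common_propositions({"alice": alice, "bob": bob}, min_students=2))
```

Treating maps as sets makes shared propositions a plain intersection. A measure that also credited partial matches (for example, the same two concepts linked by a different relationship) would need a more lenient comparison than exact triple equality.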
In Sections 2 and 3 we discuss previous studies related to CMM, which have informed our proposed formal definition of CMM. In Section 4 we present an evaluation framework and the design of a tool to support the creation of gold standards for CMM, and in Section 4.1 we propose a technical implementation. Section 5 concludes.