Towards an ontology-based retrieval of UML Class Diagrams
Karina Robles, Anabel Fraga, Jorge Morato, Juan Llorens
Carlos III University of Madrid, Av. Universidad, 30, Leganes, Madrid 28911, Spain
Article info
Article history: Received 10 September 2010; received in revised form 22 July 2011; accepted 22 July 2011; available online 29 July 2011
Keywords: Information retrieval; Ontologies; Software Reuse; Software Engineering; UML Class Diagrams
Abstract
Context: Software Reuse has always been an important area for software companies seeking to increase their productivity and the quality of their products, but code reuse is not the only answer. Current reuse proposals also cover software designs and even software specifications. This research therefore focuses on software design, specifically on UML Class Diagrams, and applies a semantic technology to facilitate their retrieval for effective reuse.
Objective: This research proposes an ontology-based retrieval technique based on semantic similarity to support the effective retrieval of UML Class Diagrams. Since UML Class Diagrams are a de facto standard in the design stages of a Software Development Process, a good technique is needed to reuse them, i.e. to reuse during the design stage instead of only during the coding stage.
Method: An application ontology modeled on the UML specifications was designed to compare UML Class Diagram element types. To measure their similarity, a survey was conducted amongst UML experts. Query expansion was improved by a domain ontology supporting the retrieval phase. Minimal distances in the ontologies were computed using a shortest path algorithm.
Results: The case study shows the importance of the domain ontology in the UML Class Diagram retrieval process, as well as the importance of an element type expansion method such as an application ontology.
Analysis of the results revealed a correlation between query complexity and the retrieved elements. Finally, a positive Return on Investment (ROI) was estimated using Poulin's model.
Conclusion: Because Software Reuse need not be limited to the coding stage, approaches to reuse at the design stage must be developed, such as the reuse of UML Class Diagrams. This approach proposes a technique for UML Class Diagram retrieval, which is an important step towards reuse. Combining semantic technology with information retrieval improves the retrieval results.
© 2011 Elsevier B.V. All rights reserved.

1. Introduction
Software companies have always sought optimal software development methods that reduce costs and development times. However, software products grow more complex over time, which hinders the rapid production of quality software and keeps companies from achieving these goals. Software Reuse has emerged as an answer to this need [54,36,35]. Nevertheless, authors in this area agree that, despite these efforts, the expected results have not been achieved [50]. Reuse covers not only components developed for reuse [28], but any asset generated during a Software Development Process [1]. These assets represent knowledge that should not be lost and, furthermore, must be made available to other software projects [7,68,2,62].
This research focuses on the design stage because it has an enormous influence on the development stage, and companies should take advantage of this fact by reusing the experience gained in past projects [3,7,68,2,59]. Reusing people's experience avoids repeating work already done and, ultimately, extra costs. Therefore, if companies could store and retrieve their knowledge effectively from the beginning of the software lifecycle, it could be possible to improve the Software Development Process (SDP) [50].
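The abstract's Method summary states that query expansion relies on a domain ontology and that minimal distances in the ontologies are computed with a shortest path algorithm. The sketch below illustrates that idea under explicit assumptions: the ontology is a small hypothetical graph of domain concepts with invented edge weights, Dijkstra's algorithm stands in for the unnamed shortest path algorithm, and both the mapping similarity = 1/(1 + distance) and the `max_distance` threshold are illustrative choices rather than the paper's formulas. The helper names (`add_edge`, `shortest_distances`, `expand_query`) are likewise hypothetical.

```python
import heapq
from typing import Dict, List, Tuple

# Hypothetical ontology fragment: concept -> list of (neighbour, edge weight).
# Concepts and weights are invented for illustration, not taken from the paper.
Graph = Dict[str, List[Tuple[str, float]]]


def add_edge(graph: Graph, a: str, b: str, w: float) -> None:
    """Add an undirected weighted edge between two ontology concepts."""
    graph.setdefault(a, []).append((b, w))
    graph.setdefault(b, []).append((a, w))


def shortest_distances(graph: Graph, source: str) -> Dict[str, float]:
    """Dijkstra's algorithm: minimal distance from `source` to every reachable concept."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry; a shorter path was already found
        for neighbour, w in graph.get(node, []):
            nd = d + w
            if nd < dist.get(neighbour, float("inf")):
                dist[neighbour] = nd
                heapq.heappush(heap, (nd, neighbour))
    return dist


def similarity(distance: float) -> float:
    """Assumed distance-to-similarity mapping: 1.0 at distance 0, decreasing towards 0."""
    return 1.0 / (1.0 + distance)


def expand_query(graph: Graph, term: str, max_distance: float) -> List[Tuple[str, float]]:
    """Expand a query term with concepts within `max_distance`, ranked by similarity."""
    dist = shortest_distances(graph, term)
    hits = [(c, similarity(d)) for c, d in dist.items() if d <= max_distance]
    return sorted(hits, key=lambda x: (-x[1], x[0]))


if __name__ == "__main__":
    onto: Graph = {}
    add_edge(onto, "Customer", "Client", 0.1)   # near-synonyms: small distance
    add_edge(onto, "Customer", "Person", 1.0)   # generalisation
    add_edge(onto, "Person", "Employee", 1.0)
    add_edge(onto, "Customer", "Order", 2.0)    # association: larger distance
    for concept, sim in expand_query(onto, "Customer", max_distance=1.5):
        print(f"{concept}: {sim:.2f}")
```

With these invented weights, the script prints the query term itself (1.00), then Client (0.91) and Person (0.50); Employee and Order lie at distance 2.0 and fall outside the threshold. A priority-queue Dijkstra fits this setting because ontology edge weights are non-negative.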
Nevertheless, most of the widespread indexing and search tools on the market are quite generic and rely only on code search (e.g. Google Code Search) or component search, usually based on keywords. There is a lack of specialized tools for retrieval during the design phase, because the knowledge produced in this phase is difficult to abstract and represent. However, this knowledge is known to be represented by models in software projects [7]. In particular, UML Class Diagrams¹ are an important static representation of many software projects.
¹ Class Diagrams are structure diagrams that show the classes, attributes and relationships of the system to be built.
0950-5849/$ - see front matter. doi:10.1016/j.infsof.2011.07.003
Corresponding author (A. Fraga). Tel.: +34 916249115. E-mail address: afraga@inf.uc3m.es
Information and Software Technology 54 (2012) 72–86. Contents lists available at ScienceDirect. Journal homepage: www.elsevier.com/locate/infsof