IRML – INFORMATION RETRIEVAL MODELING LANGUAGE

João Ferreira 1, Alberto Silva 2, and José Delgado 3
1 ISEL, 2 INESC-ID, 2,3 Instituto Superior Técnico
1 jferreira@deetc.isel.ipl.pt, 2 alberto.silva@acm.org, 3 Jose.Delgado@tagus.ist.utl.pt

Keywords: Modeling Language, Information Retrieval.

Abstract: We propose a domain-specific language, created for the IR area, that provides a common notation and common concepts for the design of IR systems. The language is based on UML extension mechanisms, with specific stereotypes for IR. From this language (a UML Profile) we define standard libraries of models and code templates that can be used in the development of IR systems. The main goal is to provide a novel approach that guides the design of IR systems, using a common notation and common concepts in a modular environment.

1 INTRODUCTION

Over the last four decades, Information Retrieval (IR) has developed mainly around algorithms and methods. Personalization of Web retrieval applications is mandatory due to the following requirements: (1) user-specific information needs; (2) communication (mainly the rise of wireless devices); (3) geographic location. Personalization means that new IR applications are built from previous ones, or even by optimizing them. There is also a need for a common language that provides a common notation, uniform concepts, a baseline for formulating problems in the IR community, and support for the reuse of IR software.

To fill this gap, we propose in this paper: (1) IRML, a language based on UML extension mechanisms and focused on the IR area; and (2) a set of IR-Models, derived from the IR-Language, which provide standard libraries for this kind of system. These two points are part of an ambitious project concerned with the automatic generation of IR systems from models and appropriate templates (Figure 1). The benefits of this process are significant, namely: (1) it facilitates the IR-System building process (e.g.
increasing customization possibilities); (2) this standard, if well accepted, promotes a collaborative environment within the IR community, giving rise to and supporting faster development. This approach also provides tools that facilitate changes to IR systems and the simultaneous development of IR systems by different teams. Each team creates models and templates for code generation based on the common IRML language, and users or researchers define IR systems based on models, which are later transformed to specific programming languages and/or platforms.

Figure 1: New approach to building IR systems. (Diagram elements: Development Team, Templates, Models, IRML, IR users / IR investigators, Code Generation, «IR-System» IR-Personalized System.)

2 IRML OVERVIEW

OMG proposes a new software development paradigm known as Model Driven Architecture (MDA) [1]. MDA is strongly based on UML2, which rests on the following main principles: modularity, layer division and extensibility. This new version is in line with the main objective of an IR-Language. This subject is explored further in [2, 3, 4]. Based on UML,