M. Bouzeghoub et al. (Eds.): NLDB 2000, LNCS 1959, pp. 276-287, 2001. Springer-Verlag Berlin Heidelberg 2001

On the Automatization of Database Conceptual Modelling through Linguistic Engineering

Paloma Martínez¹ and Ana García-Serrano²

1 University Carlos III of Madrid, Department of Computer Science, Avda. Universidad 30, 28911 Leganés, Madrid, Spain
pmf@inf.uc3m.es
2 Technical University of Madrid, Department of Artificial Intelligence, ISYS Group, Campus de Montegancedo S/N, 28660, Madrid, Spain
agarcia@dia.fi.upm.es

Abstract. This paper presents an approach to database (DB) conceptual modelling that exploits lexical knowledge (morphological, syntactic and semantic) to (semi-)automatically interpret a textual description of a Universe of Discourse (UoD) and to propose a feasible conceptual data schema consistent with that natural language (NL) description. The main contributions of this work are: the definition of several linguistic perspectives, based on syntactic and semantic clues, that help to acquire Extended Entity Relationship (EER) conceptual schemata from textual specifications; the specification of a grammar for the EER conceptual model; and a set of correspondence rules between linguistic concepts and the EER model constructors.

1 Objectives and Motivation

This work is part of a research framework¹ devoted to DB conceptual modelling that integrates various knowledge sources and technologies to help novice DB analysts and students in specific DB development tasks using methodological guides (for example, definition of EER conceptual schemata, transformation of conceptual schemata into relational schemata, automatic generation of SQL-92 code and other functionalities).
One aim of this project is to address features that current CASE tools lack for covering the whole DB life cycle, especially in the requirements analysis phase, as well as the absence of methodological assistants that show which steps to follow in DB development. In practice, requirements elicitation and collection is mainly done in NL. It is therefore reasonable to look for methods for the systematic treatment of such specifications.

¹ This work is part of the CICYT project PANDORA (CASE Platform for Database development and learning via Internet), TIC99-0215, and the CAM project MESIA, 07T/0017/1998.

A conceptual schema, independently of the data formalism used, plays two main roles in the