A. Banks Pidduck et al. (Eds.): CAISE 2002, LNCS 2348, pp. 782-786, 2002. Springer-Verlag Berlin Heidelberg 2002

On the Logical Modeling of ETL Processes

Panos Vassiliadis, Alkis Simitsis, and Spiros Skiadopoulos

National Technical University of Athens, Dept. of Electrical and Computer Eng., Computer Science Division, Iroon Polytechniou 9, 157 73, Athens, Greece
{pvassil,asimi,spiros}@dbnet.ece.ntua.gr

1 Introduction

Extraction-Transformation-Loading (ETL) tools are pieces of software responsible for the extraction of data from several sources, their cleansing, customization and insertion into a data warehouse. Research has only recently dealt with the above problem and provided few models, tools and techniques to address the issues around the ETL environment [1,2,3,5]. In this paper, we present a logical model for ETL processes. The proposed model is characterized by several templates, representing frequently used ETL activities along with their semantics and their interconnection. In the full version of the paper [4] we present more details on the aforementioned issues and complement them with results on the characterization of the content of the involved data stores after the execution of an ETL scenario and impact-analysis results in the presence of changes.

2 Logical Model

Our logical model abstracts from the technicalities of monitoring, scheduling and logging while it concentrates (a) on the flow of data from the sources towards the data warehouse and (b) on the composition of the activities and the derived semantics.

[Figure: labels shown are the layers Metamodel layer, Template layer, and Schema & Scenario; the entities Elementary Activity, RecordSet, Not Null, Selection, Aggregate, myNot Null, mySelection, Supplier, LineItem, PartSupp, and myScenario; and the relationship labels ISA and IN.]

Fig. 1 The metamodel for the logical entities of the ETL environment

Activities are the backbone of the structure of any information system. In our framework, activities are logical abstractions representing parts, or full modules of