I-Y. Song, W.Rowen, C. Medsker, E. Ewen 6-1 An Analysis of Many-to-Many Relationships Between Fact and Dimension Tables in Dimensional Modeling Abstract Star schema, which maintains one-to-many relationships between dimensions and a fact table, is widely accepted as the most viable data representation for dimensional analysis. Real- world DW schema, however, frequently includes many-to-many relationships between a dimension and a fact table. Having those relationships in a dimensional model causes several difficult issues, such as losing the simplicity of the star schema structure, increasing complexity in forming queries, and degrading query performance by adding more joins. Therefore, it is desirable to represent the many-to-many relationships with correct semantics while still keeping the structure of the star schema. In this paper, we analyze many-to-many relationships between a dimension table and a fact table in dimensional modeling. We illustrate six different approaches and show the advantages and disadvantages of each. We propose two ad-hoc methods that maintain a star schema structure by denormalizing the dimensions to avoid many-to- many relationships. This method allows quick query processing by using a concatenated attribute with minimal overhead. Other issues addressed are data redundancy, weighting factors, storage requirements, and performance concerns. 1. Introduction The data warehouse (DW) is an integrated repository of data, generated and used by an entire organization. The data warehouse employs a suite of tools that transforms raw data into meaningful business information. This information depicts a view of a distinct business process to identify trends and patterns and serves as a foundation for decision-making. The dimensional model is a logical representation of a business process whose significant features are user understandability, query performance, and resilience to change. Dimensional modeling is widely accepted as the viable technique for delivering data to end users in a data warehouse [KRRT98, AM97, AV98, AS97, DSHB98, MC98]. The main components of a dimensional model are fact tables and dimension tables. A fact table contains measurements of the business or records events. A dimension table contains attributes used to constrain, group, or browse the fact data. There are two primary advantages of using a dimensional model in data warehouse environments. First, a dimensional model provides a multidimensional analysis space in relational database environments; we are analyzing factual data using dimensions. Second, a typical denormalized dimensional model has a simple schema structure, which simplifies end- user query processing and improves performance. The dimension tables contain a large number of attributes, reflecting the details of the business processes. Browsing is a user activity that explores the relationships between attributes in a dimension table. The attributes will serve as row headers and constraints for these views. It is common to have more than one hundred attributes in a real world application. Dimension tables are considered wide for this reason. Denormalization of dimension tables is an acceptable practice in data warehousing. A dimensional model with highly normalized dimension structure is called William Rowen College of Information Science and Technology Drexel University Philadelphia, PA 19104 msis@drexel.edu Il -Yeol Song College of Information Science and Technology Drexel University Philadelphia, PA 19104 songiy@drexel.edu Carl Medsker Arynth, Inc. Cinnaminson, NJ 08077 cmedsker@arynth.com Edward Ewen, M.D Christiana Care Health System Wilmington, DE 19899 eewen@christianacare.org The copyright of this paper belongs to the paper’s authors. Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage. Proceedings of the International Workshop on Design and Management of Data Warehouses (DMDW'2001) Interlaken, Switzerland, June 4, 2001 (D. Theodoratos, J. Hammer, M. Jeusfeld, M. Staudt, eds.) http://sunsite.informatik.rwth-aachen.de/Publications/CEUR-WS/Vol-39/