SMCA03-05-0120

Abstract—Many enterprise areas such as marketing, variant design, group technology, and cellular manufacturing require their wide variety of products to be organized into families, which are clusters of similar products. In this paper, we propose a similarity metric for finding the distance between existing products based on bills of materials (BOMs), a class of unordered trees. We show that existing editing operations for unordered trees are not consistent for BOMs, and present a similarity metric based on the symmetric difference. We also provide a polynomial-time algorithm for finding the minimum weighted symmetric difference between a pair of unordered trees. The results of the pairwise comparisons are used as a distance metric for a clustering algorithm that groups the BOM trees into product families.

Index Terms—Bills of materials, similarity measure, symmetric difference, unordered trees.

I. INTRODUCTION

Recent manufacturing paradigms like agile manufacturing and globalization have resulted in product proliferation, and mass customization is the order of the day. Consequently, the number of products and part numbers has increased exponentially. At the same time, product development lead times have to be reduced; companies are therefore keenly interested in exploiting similarities among the variants and in benefiting as much as possible from previously done work.

The historical approach to classification (or grouping) of individual parts into families is the well-known concept of Group Technology (GT) [1]-[5]. The practical acceptance of GT has remained limited because of the enormous effort involved in developing a "coding system" that summarizes key design, manufacturing, and other attributes, and in translating the legacy part database into this code. This classification and coding process has largely remained manual, although some efforts towards automation have also been made [6].
Today, data mining, a growing field, provides a credible approach to sifting through terabytes of data records to identify meaningful, machine learning-based patterns and relationships between attributes. This paper focuses on new tree mining methods that are applicable to industrial product databases.

Manuscript received May 30, 2003. This work was supported by the Engineering Research Program of the Office of Basic Energy Sciences at the Department of Energy and the National Science Foundation under career grant DMI-9624309. Carol J. Romanowski is with the Department of Industrial Engineering, University at Buffalo, Buffalo, NY 14260 USA (e-mail: cfr@buffalo.edu). Rakesh Nagi is with the Department of Industrial Engineering, University at Buffalo, Buffalo, NY 14260 USA (phone: 716-645-2357 ext. 2103; e-mail: nagi@buffalo.edu).

While products from different domains such as mechanical, electrical, electronic, and civil/infrastructure engineering differ in their key design and manufacturing attributes, a common data type is the bill of materials (BOM). A BOM (also called a recipe, formulation, or specification in other engineering disciplines) is the hierarchical, structured representation of a product, containing critical information such as components, raw materials, quantities, instructions for manufacture, and consumable items [7]. BOMs capture the make-up, content, and structure of complex products from these engineering domains. The major purpose of a BOM is to define the recursive parent-child relationships between the end item, its components or subassemblies, and the raw (or purchased) materials they contain. These relationships provide the data needed to efficiently schedule end items for manufacture and to ensure sufficient inventory levels to support their production. BOMs can be depicted as rooted, unordered trees.
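To make the tree view concrete, the following minimal Python sketch models a BOM as a rooted tree whose children carry no order. The class `BomNode`, the helper `same_unordered_tree`, and the part names are hypothetical illustrations and are not taken from this paper. The check shown is exact equality up to sibling order only; it is not the similarity measure proposed here.

```python
from dataclasses import dataclass, field


@dataclass
class BomNode:
    """One item in a bill of materials (hypothetical minimal structure)."""
    label: str
    children: list["BomNode"] = field(default_factory=list)

    def is_leaf(self) -> bool:
        # An item with no children has no recorded sub-structure.
        return not self.children

    def canonical(self) -> tuple:
        # Sorting the children's canonical forms removes any dependence
        # on the order in which siblings happen to be stored.
        return (self.label, tuple(sorted(c.canonical() for c in self.children)))


def same_unordered_tree(a: BomNode, b: BomNode) -> bool:
    """True if the two trees are identical once sibling order is ignored."""
    return a.canonical() == b.canonical()


# Hypothetical example data: the same structure with siblings listed differently.
chair_a = BomNode("Chair", [
    BomNode("Seat", [BomNode("Cushion"), BomNode("Seat frame")]),
    BomNode("Wheel"),
])
chair_b = BomNode("Chair", [
    BomNode("Wheel"),
    BomNode("Seat", [BomNode("Seat frame"), BomNode("Cushion")]),
])
# A flat, parts-list-like BOM for the same components.
chair_flat = BomNode("Chair", [
    BomNode("Cushion"), BomNode("Seat frame"), BomNode("Wheel"),
])

print(same_unordered_tree(chair_a, chair_b))     # True
print(same_unordered_tree(chair_a, chair_flat))  # False
```

The sorted canonical form is one simple design choice for ignoring sibling order; it only detects identical trees and says nothing about how far apart two different trees are.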
The end item, or finished product, is the root of the tree; manufactured or assembled components are the nodes; and purchased parts or raw materials are the leaves. Fig. 1 shows an office chair BOM structure as a tree.

[Fig. 1: Office chair bill of materials. Node labels as extracted: A Office Chair; E Seat; B Under frame; G Upholstery; I Back frame; F Upholstery; G Seat frame; D Wheel; C Standard; J Elbow rest; H Back.]

On comparing bills of materials: A similarity/distance measure for unordered trees
Carol J. Romanowski and Rakesh Nagi

A. Types of differences in BOMs

Different engineers may build completely identical end items with very different BOM structures. Since there is no common rule or template to follow, the engineer develops the BOM based on her understanding of how the product is manufactured or assembled. Thus, trees representing otherwise identical end items can have very different topologies, from relatively flat trees (little different from mere parts lists) to highly structured, multi-level trees. BOM trees may differ in three ways:

1. Structural differences such as the number of intermediate parts, parts at different levels, and parts with different parents.
2. Differences in component labels.
3. Differences in both components and structure.

For example, Fig. 2 shows an office chair (A) and a variant