Data Mining in Maintenance of Electronic Component Libraries

Esa Alhoniemi*, Timo Knuutila*, Mika Johnsson, Juha Röyhkiö and Olli S. Nevalainen*

* University of Turku, Department of Information Technology, Lemminkäisenkatu 14 A, FI-20520 Turku, Finland
Valor Computerized Systems (Finland) Oy, Ruukinkatu 2, FI-20540 Turku, Finland

Abstract: In this study¹, adding data of new components to an existing electronic component library is considered. The suggested approach uses a particular data mining algorithm to support interactive input of the data. The basic idea is to compute association rules between the attributes of the existing components in the library. The rules can then be used to ease the input of the attributes of a new component. The scheme is general in the sense that the same approach can easily be used in other similar applications as well. We first introduce the necessary basic concepts of association rules and then illustrate the application of the suggested approach using a fraction of a real component library.

I. INTRODUCTION

Successful operation of a printed circuit board (PCB) assembly robot requires three things: a numerical control (NC) program, an electronic component library, and the configuration data of the machine. In the assembly of a new product using a PCB robot, generation of a new NC program is usually quite straightforward. The machine configuration data seldom needs to be changed and is therefore rarely a problem. The laborious task in the assembly of a new product is the maintenance of the electronic component library, which is the subject of this article.

In the library, each component is characterized by dozens or even hundreds of attributes, such as the dimensions of the component, nozzles, vision data, handling speeds, polarity, and feeders. There exist both machine independent and machine dependent attributes.
The machine independent attributes can be obtained directly from an external source such as a CAD library or the Valor parts library². The machine dependent data contains several attributes whose values depend on the type of the machine, and some whose values depend on a specific individual machine. The reason is that the values of some attributes may depend, for example, on the physical environment, such as the lighting conditions of the machine. Generation of the machine-specific data turns out to be the most laborious task when the assembly of a new product is initiated on a certain machine. Traditionally, this has required the experience of human operators, manual browsing through specification documents, and trial-and-error testing on the machine. The novel approach suggested in this paper, which utilizes the information in the existing component libraries, does not eliminate all the manual work, but it provides a faster semiautomated procedure for the generation of the data.

¹ This work was partially supported by the Academy of Finland, Grant 104795.
² The library contains data on about 30–40 million components; see www.valor.com for more details.

Even though the component library data has a very complex logical structure, the details of that structure can be ignored without loss of generality. From now on, the data of the component library is seen as a table whose rows correspond to the components and whose columns correspond to the different component attributes. Each time a previously unused component type is included in the assembly by a machine, its attributes have to be fed into the library, which requires a large amount of manual checking. This means adding new rows to the data table, which may amount to as many as 100 rows.
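The table view described above, together with the association rules between attribute values mentioned in the abstract, can be illustrated with a minimal sketch. It is not the implementation used in this study: the attribute names (`package`, `nozzle`, `speed`), their values, and the helper functions are hypothetical, and the counting is brute force rather than a proper frequent-itemset algorithm. Each row becomes a set of (attribute, value) items, and the support and confidence of a rule are estimated by counting matching rows.

```python
from collections import Counter

# Hypothetical miniature component library: one dict per row (component),
# one key per column (attribute). All names and values are invented.
LIBRARY = [
    {"package": "0603", "nozzle": "N1", "speed": "fast"},
    {"package": "0603", "nozzle": "N1", "speed": "fast"},
    {"package": "0603", "nozzle": "N1", "speed": "slow"},
    {"package": "SOIC8", "nozzle": "N3", "speed": "slow"},
    {"package": "SOIC8", "nozzle": "N3", "speed": "slow"},
]


def to_items(row):
    """Turn a table row into a set of (attribute, value) items."""
    return frozenset(row.items())


def support(itemset, rows):
    """Fraction of rows that contain every item of `itemset`."""
    itemset = frozenset(itemset)
    hits = sum(1 for row in rows if itemset <= to_items(row))
    return hits / len(rows)


def confidence(antecedent, consequent, rows):
    """Estimate P(consequent | antecedent); 0.0 if the antecedent never occurs."""
    base = support(antecedent, rows)
    if base == 0:
        return 0.0
    return support(set(antecedent) | set(consequent), rows) / base


def suggest(partial_row, target, rows):
    """Suggest the most common value of `target` among rows matching `partial_row`.

    Assumes every row has a value for `target` (a complete table).
    Returns (value, confidence), or (None, 0.0) when no row matches.
    """
    known = to_items(partial_row)
    matches = [row[target] for row in rows if known <= to_items(row)]
    if not matches:
        return None, 0.0
    value, count = Counter(matches).most_common(1)[0]
    return value, count / len(matches)
```

On this toy data, `suggest({"package": "SOIC8"}, "nozzle", LIBRARY)` returns `("N3", 1.0)`, because every row with that package uses the same nozzle. A real library with hundreds of attributes would need a scalable itemset search instead of this exhaustive counting.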
The main contribution of this study is a novel data mining [1] approach that supports a human operator in filling in, or checking the correctness of, the attributes of a new component assigned to a component placement machine. The basic idea is to use the existing component libraries to construct a set of so-called association rules, which describe dependencies between the values of different attributes. The rules can then be used either to predict the value of an unknown attribute from the values recorded so far, or to detect potentially erroneous user input. This is possible because the component attributes are not completely independent of each other; the library contains much redundant information. The redundancy could in principle be removed by clever data structures and by adding dependencies to the library. However, these kinds of solutions do not work due to the highly dynamic nature and complex rules of the component attributes.

The goal of this study is to demonstrate the use of the suggested data mining approach and to give a preliminary evaluation of its feasibility using real data. Determination of the so-called large (or frequent) itemsets (see for example [1, pp. 429–433]), which is an essential part of the computation of the association rules, is carried out using the well-known data