Rank-Cluster-and-Prune: An Algorithm for Generating Clusters in Complex Set Partitioning Problems

Amy Cohn (1), Michael Magazine (2), George Polak (3)

(1) Department of Industrial and Operations Engineering, College of Engineering, University of Michigan, Ann Arbor, Michigan 48109-2117
(2) Quantitative Analysis and Operations Management, College of Business, University of Cincinnati, Cincinnati, Ohio 45221
(3) Department of Information Systems and Operations Management, Raj Soin College of Business, Wright State University, Dayton, Ohio 45435

Received 10 July 2007; revised 4 November 2008; accepted 22 November 2008
DOI 10.1002/nav.20343
Published online 24 February 2009 in Wiley InterScience (www.interscience.wiley.com).

Abstract: Clustering problems are often difficult to solve due to nonlinear cost functions and complicating constraints. Set partitioning formulations can help overcome these challenges, but at the cost of a very large number of variables. Therefore, techniques such as delayed column generation must be used to solve these large integer programs. The underlying pricing problem can suffer from the same challenges (nonlinear cost, complicating constraints) as the original problem, however, making a mathematical programming approach intractable. Motivated by a real-world problem in printed circuit board (PCB) manufacturing, we develop a search-based algorithm (Rank-Cluster-and-Prune) as an alternative, present computational results for the PCB problem to demonstrate the tractability of our approach, and identify a broader class of clustering problems for which this approach can be used. © 2009 Wiley Periodicals, Inc. Naval Research Logistics 56: 215–225, 2009

Keywords: set partitioning; branch-and-price; delayed column generation; branch-and-bound

1. INTRODUCTION

Clustering problems, in which a group of objects must be divided into nonoverlapping and exhaustive subsets, appear in a wide variety of applications, ranging from transportation (e.g.
[5]) to manufacturing (e.g. [29]) to scheduling MBA cohorts (e.g. [15]). When the cost function and/or the rules governing the feasibility of subsets are complex, a set partitioning model can often be formulated to avoid a nonlinear objective function and/or complicating constraints. Unfortunately, such formulations typically possess an exponential number of integer variables.

Very large integer programs can sometimes be solved with branch-and-price, an application-customized algorithm that uses delayed column generation as a way to solve the large-scale linear programs embedded within the branch-and-bound tree. Column generation, however, requires the repeated solving of a pricing problem to identify candidate variables with negative reduced cost. [These techniques are briefly summarized in the next section.] When a set partitioning formulation is used as a way to bypass complex constraints and objective functions, this complexity must instead be addressed in the pricing problem. Thus, mathematical programming (MP) approaches are often inadequate for solving this pricing problem. This was our experience in attempting to solve a real-world problem in integrated printed circuit board (PCB) planning.

Correspondence to: A. Cohn (amycohn@umich.edu)

Motivated by this application, we have developed an alternative approach to the pricing problem, which we call Rank-Cluster-and-Prune (RCP). RCP is a search-based technique that, like branch-and-bound, uses a tree structure to enumerate potential solutions. Rather than using linear programming to construct the nodes, however, we make inclusion decisions in an ordered way, allowing us to directly compute the objective function. This is very powerful, as it enables us to consider problems with a wide range of objective functions. They need not be linear or convex. In fact, it is not even necessary that we be able to write the objective function in closed form.
For example, we might compute it using Monte Carlo simulation or a look-up table. The only restriction is that it be nondecreasing in inclusion (i.e. when we add an element to a set, its cost does not go down). Pruning based on dual potentials prevents the exhaustive enumeration of the solution space