An aggregation heuristic for large scale p-median problem

Pasquale Avella a, Maurizio Boccia a, Saverio Salerno b, Igor Vasilyev c,*

a Dipartimento di Ingegneria, Università del Sannio, Viale Traiano, 82100 Benevento, Italy
b Dipartimento di Ingegneria dell'Informazione e Matematica Applicata, Università di Salerno, via Ponte don Melillo, 84084 Fisciano (SA), Italy
c Institute of System Dynamics and Control Theory, Siberian Branch of Russian Academy of Sciences, Lermontov Str., 134, 664033 Irkutsk, Russia

Article info

Available online 28 September 2011

Keywords: p-Median problem; Clustering analysis; Lagrangean relaxation; Core heuristic; Aggregation procedure

Abstract

The p-median problem (PMP) consists of locating p facilities (medians) in order to minimize the sum of distances from each client to the nearest facility. The interest in the large-scale PMP arises from applications in cluster analysis, where a set of patterns has to be partitioned into subsets (clusters) on the basis of similarity. In this paper we introduce a new heuristic for large-scale PMP instances, based on Lagrangean relaxation. It consists of three main components: subgradient column generation, which combines subgradient optimization with column generation; a "core" heuristic, which computes an upper bound by solving a reduced problem defined by a subset of the original variables chosen on the basis of Lagrangean reduced costs; and an aggregation procedure that defines reduced-size instances by aggregating clients with facilities. Computational results show that the proposed heuristic is able to compute good quality lower and upper bounds for instances with up to 90,000 clients and potential facilities.

© 2011 Elsevier Ltd. All rights reserved.

1. Introduction

We are given a set $I = \{1, \dots, m\}$ of potential locations for p facilities, a set $J = \{1, \dots, n\}$ of clients, and distances (transportation costs) $d_{ij}$, also written $d(i, j)$, between location i and client j.
The p-median problem (PMP) consists of locating p facilities (medians) at locations of $I$ in order to minimize the sum of distances from each client to the nearest facility. The combinatorial optimization formulation of PMP takes the following form:

\[
Z^{*} = \min_{T \subseteq I} \Bigl\{ \sum_{j \in J} \min_{t \in T} d_{tj} \;:\; |T| = p \Bigr\}.
\]

The problem was introduced by Hakimi [11,12] and is known to be NP-hard [21] (see [24] for a more general survey on discrete location problems).

In many applications we have $I = J$ and we can define the problem on the weighted directed graph $G(I, A)$, where $I$ is the vertex set, $A$ is the arc set, and weights $d_{ij}$ are associated with the arcs $ij \in A$. Let $y_i$ be a binary variable which is 1 if i is a median, 0 otherwise, and $x_{ij}$ a binary variable which is 1 if median i is the nearest median to vertex j, 0 otherwise. Let also $\delta^{-}(j)$ be the set of arcs entering vertex j. Then a mixed integer programming (MIP) model of PMP over the graph $G(I, A)$ is

\[
\begin{aligned}
Z^{*} = \min_{(x, y)} \; & \sum_{ij \in A} d_{ij} x_{ij}, && && (1) \\
& \sum_{i \in \delta^{-}(j)} x_{ij} + y_j = 1 && \forall j \in I, && (2) \\
& x_{ij} \le y_i && \forall ij \in A, && (3) \\
& \sum_{i \in I} y_i = p, && && (4) \\
& y_i \in \{0, 1\} && \forall i \in I, && (5) \\
& x_{ij} \in \{0, 1\} && \forall ij \in A. && (6)
\end{aligned}
\]

Constraints (2) ensure that each vertex j is either a median or assigned to a median. Variable upper bound (VUB) constraints (3) impose that a vertex can only be assigned to medians. Constraint (4) enforces the number of medians to be p. A feasible solution of the problem consists of p "stars", each centered at a median with arcs leaving toward its assigned vertices, as shown in Fig. 1.

An interesting application of large-scale PMP arises in cluster analysis [33,28,26,15,32,14]. Cluster analysis consists of partitioning a set of patterns into subsets (clusters) based on similarity, i.e. similar patterns have to be in the same cluster and dissimilar patterns have to be in different clusters.
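As a minimal sketch of the two formulations of PMP given above (and not of the heuristic proposed in this paper), the following Python code evaluates the combinatorial objective by brute-force enumeration on a tiny instance with I = J, then builds the corresponding (x, y) vector and checks constraints (2)-(4). The function names, the six-point instance on a line, and the assumption that every vertex has zero distance to itself are illustrative choices, not taken from the paper.

```python
from itertools import combinations


def pmp_cost(d, medians):
    """Combinatorial objective: each client j pays the distance to its nearest median."""
    n = len(d)
    return sum(min(d[t][j] for t in medians) for j in range(n))


def brute_force_pmp(d, p):
    """Enumerate every p-subset T of I. Only viable for tiny instances."""
    n = len(d)
    best = min(combinations(range(n), p), key=lambda T: pmp_cost(d, T))
    return pmp_cost(d, best), set(best)


def to_mip_solution(d, medians):
    """Build (x, y) for model (1)-(6) from a median set, assuming I = J."""
    n = len(d)
    y = [1 if i in medians else 0 for i in range(n)]
    x = {}  # sparse storage: only arcs (i, j) with x_ij = 1 are kept
    for j in range(n):
        if y[j] == 0:
            nearest = min(medians, key=lambda t: d[t][j])
            x[(nearest, j)] = 1
    return x, y


def is_feasible(x, y, p):
    """Check assignment (2), VUB (3) and cardinality (4) constraints."""
    n = len(y)
    assignment = all(
        sum(v for (_, head), v in x.items() if head == j) + y[j] == 1
        for j in range(n)
    )
    vub = all(v <= y[i] for (i, _), v in x.items())
    cardinality = sum(y) == p
    return assignment and vub and cardinality


if __name__ == "__main__":
    # Hypothetical instance: six points on a line, absolute-difference distances.
    pos = [0, 1, 2, 10, 11, 12]
    d = [[abs(a - b) for b in pos] for a in pos]
    cost, medians = brute_force_pmp(d, 2)
    x, y = to_mip_solution(d, medians)
    mip_cost = sum(d[i][j] * v for (i, j), v in x.items())
    print(cost, sorted(medians), mip_cost, is_feasible(x, y, 2))
```

On this instance the optimum places the medians at indices 1 and 4 with total cost 4, and objective (1) evaluated on the constructed x gives the same value. Enumeration grows combinatorially with the number of vertices, which is why large-scale instances call for heuristics and bounding schemes.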
Computers & Operations Research 39 (2012) 1625–1632. doi:10.1016/j.cor.2011.09.016
Journal homepage: www.elsevier.com/locate/caor
* Corresponding author. Tel.: +7 9148752836; fax: +7 3952511616.
E-mail addresses: avella@unisannio.it (P. Avella), maurizio.boccia@unisannio.it (M. Boccia), salerno@unisa.it (S. Salerno), vil@icc.ru (I. Vasilyev).

Each pattern is usually expressed by a multidimensional vector, called a "feature vector", and the dissimilarity between two patterns is measured as the distance between the two