Mathematical Programming 79 (1997) 163-190

Logical analysis of numerical data ¹

Endre Boros ᵃ,², Peter L. Hammer ᵃ,*, Toshihide Ibaraki ᵇ,³, Alexander Kogan ᵃ,ᶜ,⁴

ᵃ RUTCOR, Rutgers University, P.O. Box 5062, New Brunswick, NJ 08903, USA
ᵇ Department of Applied Mathematics and Physics, Graduate School of Engineering, Kyoto University, 606 Kyoto, Japan
ᶜ Department of Accounting and Information Systems, Faculty of Management, Rutgers University, Newark, NJ 07102, USA

Received 27 February 1997; accepted 21 April 1997

Abstract

"Logical analysis of data" (LAD) is a methodology developed since the late eighties, aimed at discovering hidden structural information in data sets. LAD was originally developed for analyzing binary data by using the theory of partially defined Boolean functions. An extension of LAD for the analysis of numerical data sets is achieved through the process of "binarization" consisting in the replacement of each numerical variable by binary "indicator" variables, each showing whether the value of the original variable is above or below a certain level. Binarization was successfully applied to the analysis of a variety of real life data sets. This paper develops the theoretical foundations of the binarization process studying the combinatorial optimization problems related to the minimization of the number of binary variables. To provide an algorithmic framework for the practical solution of such problems, we construct compact linear integer programming formulations of them. We develop polynomial time algorithms for some of these minimization problems, and prove NP-hardness of others. © 1997 The Mathematical Programming Society, Inc. Published by Elsevier Science B.V.

Keywords: Data analysis; Boolean functions; Machine learning; Binarization; Set covering; Monotonicity; Thresholdness; Computational complexity

* Corresponding author. Email: hammer@rutcor.rutgers.edu.
¹ The authors gratefully acknowledge the partial support by the Office of Naval Research (grants N00014-92-J-1375 and N00014-92-J-4083).
² Email: boros@rutcor.rutgers.edu.
³ Email: ibaraki@kuamp.kyoto-u.ac.jp.
⁴ Email: kogan@rutcor.rutgers.edu.

0025-5610/97/$17.00 © 1997 The Mathematical Programming Society, Inc. Published by Elsevier Science B.V.
PII S0025-5610(97)00050-6
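
The binarization step described in the abstract can be illustrated with a minimal sketch. It replaces one numerical variable by 0/1 indicator variables, each recording whether a value lies at or above a given level. The function names `candidate_cutpoints` and `binarize` are hypothetical. Placing the levels at midpoints between consecutive distinct values is an assumption made only for illustration, as is treating a value equal to a level as "above". Neither is taken from the abstract, and the sketch does not attempt the minimization of the number of binary variables that the paper studies.

```python
from typing import List, Sequence


def candidate_cutpoints(values: Sequence[float]) -> List[float]:
    """Return midpoints between consecutive distinct values.

    Assumption for illustration only: the abstract does not say how
    the levels are chosen.
    """
    distinct = sorted(set(values))
    return [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]


def binarize(values: Sequence[float], cutpoints: Sequence[float]) -> List[List[int]]:
    """Map each value to one 0/1 indicator per cut point.

    An indicator is 1 when the value is at or above the cut point, else 0.
    """
    return [[1 if v >= t else 0 for t in cutpoints] for v in values]


# Toy data (hypothetical): one numerical variable observed four times.
values = [1.0, 3.0, 3.0, 6.0]
cuts = candidate_cutpoints(values)
rows = binarize(values, cuts)

for v, row in zip(values, rows):
    print(v, row)
```

Each row of `rows` is the binary encoding of one observation. Using fewer cut points gives fewer binary variables, which is the quantity the minimization problems in the paper are concerned with.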