''Logical analysis of data'' (LAD) is a methodology developed since the
late eighties, aimed at discovering hidden structural information in
data sets. LAD was originally developed for analyzing binary data by
using the theory of partially defined Boolean functions. An extension
of LAD for the analysis of numerical data sets is achieved through the
process of ''binarization'', which replaces each numerical variable
with binary ''indicator'' variables, each showing whether the value of
the original variable is above or below a certain level. Binarization
has been successfully applied to the analysis of a variety of real-life
data sets. This paper develops the theoretical foundations
of the binarization process by studying the combinatorial optimization
problems related to the minimization of the number of binary variables.
To provide an algorithmic framework for the practical solution of such
problems, we construct compact linear integer programming formulations
of them. We develop polynomial time algorithms for some of these
minimization problems, and prove NP-hardness of others.
© 1997 The Mathematical Programming Society, Inc. Published by Elsevier Science B.V.
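
As a rough illustration of the binarization step and of the variable-minimization problem described above, the following Python sketch may help. It is not the paper's method. The choice of levels (midpoints between consecutive distinct values) is an assumption, and the function names `candidate_cut_points`, `binarize`, and `greedy_support` are hypothetical. The last function is a simple greedy heuristic that stands in for the integer programming formulations mentioned in the abstract, so it does not guarantee a minimum number of binary variables.

```python
from typing import List, Sequence


def candidate_cut_points(values: Sequence[float]) -> List[float]:
    """Assumed choice of levels: midpoints between consecutive distinct values."""
    distinct = sorted(set(values))
    return [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]


def binarize(values: Sequence[float], cut_points: Sequence[float]) -> List[List[int]]:
    """One indicator per level: 1 if the value is at or above the level, else 0."""
    return [[1 if v >= t else 0 for t in cut_points] for v in values]


def greedy_support(pos_rows: Sequence[Sequence[int]],
                   neg_rows: Sequence[Sequence[int]]) -> List[int]:
    """Greedy heuristic (not exact): pick indicator columns until every
    (positive, negative) pair of observations differs in a chosen column."""
    n_cols = len(pos_rows[0])
    pairs = {(i, j) for i in range(len(pos_rows)) for j in range(len(neg_rows))}
    chosen: List[int] = []
    while pairs:
        # Column that separates the largest number of still-unseparated pairs.
        best = max(
            range(n_cols),
            key=lambda c: sum(pos_rows[i][c] != neg_rows[j][c] for i, j in pairs),
        )
        covered = {(i, j) for i, j in pairs if pos_rows[i][best] != neg_rows[j][best]}
        if not covered:
            raise ValueError("a positive and a negative observation are identical")
        chosen.append(best)
        pairs -= covered
    return sorted(chosen)


if __name__ == "__main__":
    values = [1.0, 3.0, 3.0, 7.0]
    cuts = candidate_cut_points(values)
    print(cuts)
    print(binarize(values, cuts))
    print(greedy_support([[1, 0, 1]], [[0, 0, 1], [1, 0, 0]]))
```

Each selected column index corresponds to one retained binary indicator variable. An exact minimum would require solving the underlying covering problem, for example with an integer programming solver.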