Comparative genomic hybridization (CGH) is a laboratory method for
measuring gains and losses in the copy number of chromosomal regions in
tumor cells. It is hypothesized that certain DNA gains and losses are
related to cancer progression and that the patterns of these changes are
relevant to the clinical consequences of the cancer. It is therefore of
interest to develop models that predict the occurrence of these events, as
well as techniques for learning such models from CGH data. We continue the
study, begun in Desper et al. (1999), of the mathematical foundations for
inferring a model of tumor progression from a CGH data set. In that paper,
we proposed a class of probabilistic tree models and showed that an
algorithm based on maximum-weight branching in a graph correctly infers the
topology of the tree under plausible assumptions. In this paper, we extend
that work in the direction of the so-called distance-based trees, in which
events are leaves of the tree, in the style of models common in
phylogenetics. We then show how to reconstruct the distance-based trees
using tree-fitting algorithms developed by researchers in phylogenetics.
The main advantages of the distance-based models are that 1) they represent
information about co-occurrences of all pairs of events, instead of just
some pairs, 2) they allow quantitative predictions about which events occur
early in tumor progression, and 3) they bring into play the extensive
methodology and software developed in the context of phylogenetics. We
illustrate the distance-based tree method, and how it complements the
branching tree method, with a CGH data set for renal cancer.