Probability binning comparison: A metric for quantitating multivariate distribution differences

Citation
M. Roederer et al., Probability binning comparison: A metric for quantitating multivariate distribution differences, CYTOMETRY, 45(1), 2001, pp. 47-55
Number of citations
9
Subject Categories
Medical Research Diagnosis & Treatment
Journal title
CYTOMETRY
ISSN journal
0196-4763
Volume
45
Issue
1
Year of publication
2001
Pages
47 - 55
Database
ISI
SICI code
0196-4763(20010901)45:1<47:PBCAMF>2.0.ZU;2-N
Abstract
Background: While several algorithms for the comparison of univariate distributions arising from flow cytometric analyses have been developed and studied for many years, algorithms for comparing multivariate distributions remain elusive. Such algorithms could be useful for comparing differences between samples based on several independent measurements, rather than differences based on any single measurement. It is conceivable that distributions could be completely distinct in multivariate space, but unresolvable in any combination of univariate histograms. Multivariate comparisons could also be useful for providing feedback about instrument stability, when only subtle changes in measurements are occurring.
Methods: We apply a variant of Probability Binning, described in the accompanying article, to multidimensional data. In this approach, hyper-rectangles of n dimensions (where n is the number of measurements being compared) comprise the bins used for the chi-squared statistic. These hyper-dimensional bins are constructed such that the control sample has the same number of events in each bin; the bins are then applied to the test samples for chi-squared calculations.
Results: Using a Monte-Carlo simulation, we determined the distribution of chi-squared values obtained by comparing sets of events from the same distribution; this distribution of chi-squared values was identical to that for the univariate algorithm. Hence, the same formulae can be used to construct a metric, analogous to a t-score, that estimates the probability with which distributions are distinct. As for univariate comparisons, this metric scales with the difference between two distributions, and can be used to rank samples according to similarity to a control. We apply the algorithm to multivariate immunophenotyping data, and demonstrate that it can be used to discriminate distinct samples and to rank samples according to a biologically-meaningful difference.
Conclusion: Probability binning, as shown here, provides a useful metric for determining the probability with which two or more multivariate distributions represent distinct sets of data. The metric can be used to identify the similarity or dissimilarity of samples. Finally, as demonstrated in the accompanying paper, the algorithm can be used to gate on events in one sample that are different from a control sample, even if those events cannot be distinguished on the basis of any combination of univariate or bivariate displays. Cytometry 45:47-55, 2001. Published 2001 Wiley-Liss, Inc.
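
Illustrative sketch (not part of the original record): the binning step described under Methods might look roughly like the Python below. The abstract does not say how the hyper-rectangles are chosen, so the splitting rule here (cut at the median of the highest-variance dimension, recursively) is an assumption, as are the function names `build_bins`, `bin_counts`, and `binned_chi2`. The statistic is one simple two-sample chi-squared form on bin fractions; the paper's exact statistic and its t-score-like metric are not reproduced.

```python
import numpy as np

def build_bins(control, depth):
    """Recursively halve the control sample so each leaf bin holds an
    (approximately) equal number of control events.
    Assumed rule: split at the median of the highest-variance dimension."""
    if depth == 0:
        return None  # leaf: one hyper-rectangular bin
    dim = int(np.argmax(control.var(axis=0)))
    cut = float(np.median(control[:, dim]))
    low = control[control[:, dim] <= cut]
    high = control[control[:, dim] > cut]
    return (dim, cut, build_bins(low, depth - 1), build_bins(high, depth - 1))

def bin_counts(tree, events):
    """Count how many events fall into each leaf bin, in left-to-right order."""
    if tree is None:
        return [len(events)]
    dim, cut, low, high = tree
    mask = events[:, dim] <= cut
    return bin_counts(low, events[mask]) + bin_counts(high, events[~mask])

def binned_chi2(control, test, depth=4):
    """Chi-squared-style distance between bin fractions of control and test.
    Bins are built on the control sample only, then applied to the test sample."""
    tree = build_bins(control, depth)
    c = np.array(bin_counts(tree, control), dtype=float) / len(control)
    s = np.array(bin_counts(tree, test), dtype=float) / len(test)
    total = c + s
    used = total > 0  # skip bins that are empty in both samples
    return float(np.sum((c[used] - s[used]) ** 2 / total[used]))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    control = rng.normal(size=(1024, 3))           # synthetic 3-parameter data
    same = rng.normal(size=(1024, 3))              # drawn from the same distribution
    shifted = rng.normal(loc=2.0, size=(1024, 3))  # clearly different distribution
    print("same distribution:   ", binned_chi2(control, same))
    print("shifted distribution:", binned_chi2(control, shifted))
```

With a control sample whose size is a multiple of 2**depth and no tied values, every bin holds exactly the same number of control events; otherwise the counts differ by at most a few events. The sketch assumes the control sample is large enough that no bin is emptied during splitting.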