Clustering with exclusion zones: genomic applications

Citation
Segal, Mark R. et al., Clustering with exclusion zones: genomic applications, Biostatistics (Oxford. Print), 12(2), 2011, pp. 234-246
ISSN journal
1465-4644
Volume
12
Issue
2
Year of publication
2011
Pages
234 - 246
Database
ACNP
Abstract
Methods for formally evaluating the clustering of events in space or time, notably the scan statistic, have been richly developed and widely applied. In order to utilize the scan statistic and related approaches, it is necessary to know the extent of the spatial or temporal domains wherein the events arise. Implicit in their usage is that these domains have no 'holes' (hereafter 'exclusion zones'), that is, regions in which events a priori cannot occur. However, in many contexts, this requirement is not met. When the exclusion zones are known, it is straightforward to correct the scan statistic for their occurrence by simply adjusting the extent of the domain. Here, we tackle the more ambitious objective of formally evaluating clustering in the presence of 'unknown' exclusion zones. We develop an algorithm for estimating total exclusion zone extent, the quantity needed to correct scan statistic-based inference, using distributional properties of 'spacings', and show how bias correction for this estimator can be effected. Performance of the algorithm is assessed via simulation study. We showcase applications to genomic settings for differing marker (event) types (binding sites, housekeeping genes, and microRNAs) wherein exclusion zones can arise through a variety of mechanisms. In several instances, dramatic changes to unadjusted inference that does not accommodate exclusions are evidenced.
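
The known-exclusion-zone case mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm, which addresses unknown zones through spacings. It assumes events on a line segment of length `length`, a fixed window width, zones given as sorted, non-overlapping `(start, end)` pairs, and a Monte Carlo p-value under a uniform null. The function names `scan_statistic`, `collapse`, and `scan_p_value` are hypothetical, chosen for illustration only.

```python
import bisect
import random


def scan_statistic(points, window):
    """Largest number of points falling in any interval of length `window`."""
    pts = sorted(points)
    best, left = 0, 0
    for right, p in enumerate(pts):
        # Shrink the window from the left until it spans at most `window`.
        while p - pts[left] > window:
            left += 1
        best = max(best, right - left + 1)
    return best


def collapse(points, zones):
    """Re-express positions with the known exclusion zones cut out.

    Assumption: `zones` is sorted and non-overlapping, and no point
    lies inside a zone.
    """
    starts = [s for s, _ in zones]
    cum = [0.0]  # cum[k] = total length of the first k zones
    for s, e in zones:
        cum.append(cum[-1] + (e - s))
    return [p - cum[bisect.bisect_right(starts, p)] for p in points]


def scan_p_value(points, length, window, zones=(), n_sim=2000, seed=0):
    """Monte Carlo p-value with the domain extent reduced by the zones."""
    rng = random.Random(seed)
    zones = list(zones)
    eff_length = length - sum(e - s for s, e in zones)  # adjusted extent
    observed = scan_statistic(collapse(points, zones), window)
    hits = 0
    for _ in range(n_sim):
        # Null model (assumed): events uniform on the adjusted domain.
        sim = [rng.uniform(0, eff_length) for _ in points]
        if scan_statistic(sim, window) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_sim + 1)
```

Calling `scan_p_value(points, length, window)` with and without the `zones` argument shows how much the inference depends on the assumed extent of the domain, which is the sensitivity the abstract describes.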