Clustering with exclusion zones: genomic applications

Authors

Segal, Mark R.; Xiao, Yuanyuan; Huffer, Fred W.

Citation

Segal, Mark R. et al., Clustering with exclusion zones: genomic applications, Biostatistics (Oxford. Print), 12(2), 2011, pp. 234-246

Journal title

Biostatistics (Oxford. Print)

ISSN journal

14654644

Volume

12

Issue

2

Year of publication

2011

Pages

234 - 246

Database

ACNP

SICI code

Abstract

Methods for formally evaluating the clustering of events in space or time, notably the scan statistic, have been richly developed and widely applied. In order to utilize the scan statistic and related approaches, it is necessary to know the extent of the spatial or temporal domains wherein the events arise. Implicit in their usage is that these domains have no 'holes' (hereafter 'exclusion zones'), that is, regions in which events a priori cannot occur. However, in many contexts, this requirement is not met. When the exclusion zones are known, it is straightforward to correct the scan statistic for their occurrence by simply adjusting the extent of the domain. Here, we tackle the more ambitious objective of formally evaluating clustering in the presence of 'unknown' exclusion zones. We develop an algorithm for estimating total exclusion zone extent, the quantity needed to correct scan statistic-based inference, using distributional properties of 'spacings', and show how bias correction for this estimator can be effected. Performance of the algorithm is assessed via simulation study. We showcase applications to genomic settings for differing marker (event) types (binding sites, housekeeping genes, and microRNAs) wherein exclusion zones can arise through a variety of mechanisms. In several instances, dramatic changes to unadjusted inference that does not accommodate exclusions are evidenced.
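
The known-exclusion-zone case mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm (which addresses unknown zones via spacings); it assumes a one-dimensional domain, a fixed window length, and a uniform null distribution, and it approximates the p-value by Monte Carlo simulation rather than by any formula from the paper. The function names `scan_statistic`, `collapse`, and `mc_pvalue` are hypothetical and chosen only for illustration.

```python
import random


def scan_statistic(points, window):
    """Largest number of points falling in any interval of length `window`."""
    pts = sorted(points)
    best = 0
    start = 0
    for end, x in enumerate(pts):
        # Advance the left edge until the span fits inside the window.
        while x - pts[start] > window:
            start += 1
        best = max(best, end - start + 1)
    return best


def collapse(points, zones):
    """Map positions onto the domain with known exclusion zones cut out.

    Assumption: `zones` holds non-overlapping (start, end) pairs and no
    point lies strictly inside a zone.
    """
    out = []
    for x in points:
        removed = sum(e - s for s, e in zones if e <= x)
        out.append(x - removed)
    return out


def mc_pvalue(points, window, extent, n_sim=2000, seed=0):
    """Monte Carlo p-value of the scan statistic under a uniform null
    on a domain of length `extent`."""
    rng = random.Random(seed)
    observed = scan_statistic(points, window)
    n = len(points)
    hits = 0
    for _ in range(n_sim):
        sim = [rng.uniform(0, extent) for _ in range(n)]
        if scan_statistic(sim, window) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_sim + 1)


# Made-up event positions on a domain of length 100 with two known zones.
events = [3.0, 4.0, 5.5, 6.0, 41.0, 62.0, 63.5, 95.0]
zones = [(10.0, 40.0), (70.0, 90.0)]
effective_extent = 100.0 - sum(e - s for s, e in zones)

p_unadjusted = mc_pvalue(events, window=5.0, extent=100.0)
p_adjusted = mc_pvalue(collapse(events, zones), window=5.0, extent=effective_extent)
```

Shrinking the domain to its effective extent makes a given cluster less surprising under the null, so the adjusted p-value is typically larger than the unadjusted one. When the zones are unknown, the total excluded extent has to be estimated first, which is the problem the paper addresses.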