Sample size requirements for training high-dimensional risk predictors

Authors

Dobbin, Kevin K.; Song, Xiao

Citation

Dobbin, Kevin K. and Song, Xiao, Sample size requirements for training high-dimensional risk predictors, Biostatistics (Oxford. Print), 14(4), 2013, pp. 639-652

Journal title

Biostatistics (Oxford. Print)

ISSN journal

14654644

Volume

14

Issue

4

Year of publication

2013

Pages

639 - 652

Database

ACNP

Abstract

A common objective of biomarker studies is to develop a predictor of patient survival outcome. Determining the number of samples required to train a predictor from survival data is important for designing such studies. Existing sample size methods for training studies use parametric models for the high-dimensional data and cannot handle a right-censored dependent variable. We present a new training sample size method that is non-parametric with respect to the high-dimensional vectors, and is developed for a right-censored response. The method can be applied to any prediction algorithm that satisfies a set of conditions. The sample size is chosen so that the expected performance of the predictor is within a user-defined tolerance of optimal. The central method is based on a pilot dataset. To quantify uncertainty, a method to construct a confidence interval for the tolerance is developed. Adequacy of the size of the pilot dataset is discussed. An alternative model-based version of our method for estimating the tolerance when no adequate pilot dataset is available is presented. The model-based method requires a covariance matrix be specified, but we show that the identity covariance matrix provides adequate sample size when the user specifies three key quantities. Application of the sample size method to two microarray datasets is discussed.
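
The idea of choosing a training sample size so that expected performance falls within a user-defined tolerance of optimal can be illustrated with a minimal sketch. This is not the authors' method: it assumes, purely for illustration, that the gap between expected and optimal performance shrinks as an inverse power law, gap(n) = b * n^(-c), and that gap estimates at a few pilot subsample sizes are already available. The function names `fit_power_law_gap` and `required_sample_size`, and the parameters `b` and `c`, are hypothetical.

```python
import math


def fit_power_law_gap(sizes, gaps):
    """Fit gap(n) = b * n**(-c) by least squares on the log-log scale.

    ASSUMPTION: the inverse power-law form is an illustrative choice,
    not the model used in the paper. Returns the pair (b, c).
    """
    xs = [math.log(n) for n in sizes]
    ys = [math.log(g) for g in gaps]
    k = len(xs)
    mean_x = sum(xs) / k
    mean_y = sum(ys) / k
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                      # equals -c under the assumed model
    b = math.exp(mean_y - slope * mean_x)  # intercept on the log scale is log(b)
    return b, -slope


def required_sample_size(b, c, tolerance):
    """Smallest integer n with b * n**(-c) <= tolerance (requires b, c, tolerance > 0)."""
    n = max(1, math.ceil((b / tolerance) ** (1.0 / c)))
    # Correct for floating-point rounding in the closed-form solution.
    while n > 1 and b * (n - 1) ** (-c) <= tolerance:
        n -= 1
    while b * n ** (-c) > tolerance:
        n += 1
    return n


# Hypothetical pilot-based gap estimates at three subsample sizes.
pilot_sizes = [25, 50, 100]
pilot_gaps = [2.0 / math.sqrt(n) for n in pilot_sizes]

b_hat, c_hat = fit_power_law_gap(pilot_sizes, pilot_gaps)
n_needed = required_sample_size(b_hat, c_hat, tolerance=0.15)
```

Under these assumptions, a smaller tolerance or a slower decay rate `c` leads to a larger required training set. The sketch does not handle right censoring, and it does not quantify uncertainty in the tolerance, which the abstract addresses through a confidence interval.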