The term data augmentation refers to methods for constructing iterative optimization or sampling algorithms via the introduction of unobserved data or latent variables. For deterministic algorithms, the method was popularized in the general statistical community by the seminal article by Dempster, Laird, and Rubin on the EM algorithm for maximizing a likelihood function or, more generally, a posterior density. For stochastic algorithms, the method was popularized in the statistical literature by Tanner and Wong's Data Augmentation algorithm for posterior sampling and in the physics literature by Swendsen and Wang's algorithm for sampling from the Ising and Potts models and their generalizations; in the physics literature, the method of data augmentation is referred to as the method of auxiliary variables. Data augmentation schemes were used by Tanner and Wong to make simulation feasible and simple, while auxiliary variables were adopted by Swendsen and Wang to improve the speed of iterative simulation. In general, however, constructing data augmentation schemes that result in both simple and fast algorithms is a matter of art, in that successful strategies vary greatly with the (observed-data) models being considered. After an overview of data augmentation/auxiliary variables and some recent developments in methods for constructing such efficient data augmentation schemes, we introduce an effective search strategy that combines the ideas of marginal augmentation and conditional augmentation, together with a deterministic approximation method for selecting good augmentation schemes. We then apply this strategy to three common classes of models (specifically, multivariate t, probit regression, and mixed-effects models) to obtain efficient Markov chain Monte Carlo algorithms for posterior sampling. We provide theoretical and empirical evidence that the resulting algorithms, while requiring similar programming effort, can show dramatic improvement over the Gibbs samplers commonly used for these models in practice. A key feature of all these new algorithms is that they are positive recurrent subchains of nonpositive recurrent Markov chains constructed in larger spaces.
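To make the probit-regression case concrete, the standard data augmentation Gibbs sampler of Albert and Chib can be sketched as follows: the binary outcome y_i = 1{z_i > 0} is augmented with a latent normal variable z_i ~ N(x_i'β, 1), after which both conditional draws are routine. This is a minimal illustrative sketch of the baseline sampler the paper seeks to improve upon, not the improved algorithm itself; the simulated data and flat prior on β are assumptions for the demonstration.

```python
import numpy as np
from scipy.stats import truncnorm

rng = np.random.default_rng(0)

# Simulated probit data: y_i = 1{x_i' beta + e_i > 0}, e_i ~ N(0, 1).
n, beta_true = 500, np.array([0.5, 1.5])
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = (X @ beta_true + rng.normal(size=n) > 0).astype(float)

def probit_gibbs(X, y, n_iter=2000, burn=500):
    """Albert-Chib data augmentation Gibbs sampler (flat prior on beta)."""
    n, p = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = np.zeros(p)
    draws = []
    for t in range(n_iter):
        # z_i | beta, y_i: N(mu_i, 1) truncated to (0, inf) if y_i = 1,
        # to (-inf, 0) if y_i = 0.  Bounds below are for the standardized
        # residual e_i = z_i - mu_i.
        mu = X @ beta
        lo = np.where(y == 1, -mu, -np.inf)
        hi = np.where(y == 1, np.inf, -mu)
        z = mu + truncnorm.rvs(lo, hi, random_state=rng)
        # beta | z: N((X'X)^{-1} X'z, (X'X)^{-1}) under the flat prior.
        beta = rng.multivariate_normal(XtX_inv @ (X.T @ z), XtX_inv)
        if t >= burn:
            draws.append(beta)
    return np.array(draws)

draws = probit_gibbs(X, y)
print(draws.mean(axis=0))  # posterior means, which should be near beta_true
```

Because z and β can be strongly dependent a posteriori, this chain can mix slowly; the marginal/conditional augmentation strategy described above is aimed precisely at reparameterizing such augmentation schemes to speed up mixing at similar programming cost.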