In this paper we develop a methodology for identifying large
numbers of U.S. adult twin pairs. Data for this study derive from the
U.S. Department of Defense and the Vietnam Era Twin (VET) Registry.
The Department of Defense identified potential male twins (n = 10,002)
using a computerized record-linkage algorithm that paired records with
the same last name, the same date of birth, and the same first five digits of the Social
Security number. Twinship was confirmed by comparison with the Vietnam
Era Twin Registry. We developed a logistic regression model that predicts
the probability that a paired record identifies twins. Its predictors are the
absolute difference between the last four digits of the two Social Security
numbers, the age at issuance of the Social Security number, and the
frequency of occurrence of the last name. We used the estimated
coefficients from this regression model to assign each matched record a
predicted probability of being a twin. There is a close correspondence
between the observed and expected numbers of twins when evaluated
across deciles of predicted probability; the value of
Harrell's c index (c = 0.68 ± 0.0004) summarizes the overall predictive
accuracy of the regression equation. The results of this study
demonstrate the feasibility of identifying adult male-male twin pairs
from any large computerized database that contains name, date of birth,
and Social Security number. However, the selection criteria used in
creating the computer database must be clearly specified to avoid
constructing a biased sample of twins.
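The record-linkage rule described above pairs records that share a last name, a date of birth, and the first five Social Security digits. It can be sketched as a blocking step in Python. This is a minimal illustration under stated assumptions. The `Person` layout and the name `candidate_twin_pairs` are hypothetical. The surname normalization (uppercasing, trimming whitespace) is also an assumption, not the Department of Defense's documented procedure.

```python
from collections import defaultdict
from itertools import combinations
from typing import NamedTuple


class Person(NamedTuple):
    # Hypothetical record layout; field names are illustrative only.
    record_id: int
    last_name: str
    birth_date: str  # ISO date string, e.g. "1950-03-14"
    ssn: str         # nine digits, no dashes


def candidate_twin_pairs(records):
    """Pair records that share last name, date of birth and first five SSN digits."""
    blocks = defaultdict(list)
    for r in records:
        # Assumed normalization: case-insensitive, whitespace-trimmed surnames.
        key = (r.last_name.strip().upper(), r.birth_date, r.ssn[:5])
        blocks[key].append(r)
    pairs = []
    for members in blocks.values():
        # Every two-record combination inside a block is a candidate pair;
        # a block of three or more records yields several pairs.
        pairs.extend(combinations(members, 2))
    return pairs


records = [
    Person(1, "Smith", "1950-03-14", "123456789"),
    Person(2, "SMITH", "1950-03-14", "123456790"),
    Person(3, "Smith", "1950-03-14", "123466789"),  # differs in 5th SSN digit
    Person(4, "Jones", "1950-03-14", "123456791"),  # different surname
]
print([(a.record_id, b.record_id) for a, b in candidate_twin_pairs(records)])
```

Blocking on exact keys like this keeps the comparison count small. It also means that any record with a misspelled surname or a miskeyed birth date never enters a candidate pair, and that is one way the selection criteria shape the resulting sample.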
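The two evaluation summaries mentioned above can be sketched as follows. The first compares observed and expected numbers of twins across groups of sorted predicted probability. The second computes Harrell's c index. For a binary outcome, Harrell's c equals the area under the ROC curve: the probability that a randomly chosen twin pair receives a higher predicted probability than a randomly chosen non-twin pair, with ties counted as one half. The function names are hypothetical. The pairwise loop is quadratic, so for large samples a rank-based formula would be preferable.

```python
def decile_calibration(probs, outcomes, groups=10):
    """Rows of (n, observed twins, expected twins) per group of sorted probability.

    Expected twins in a group is the sum of its predicted probabilities.
    """
    order = sorted(range(len(probs)), key=lambda i: probs[i])
    n = len(order)
    table = []
    for g in range(groups):
        # Near-equal group sizes; a group may be empty if n < groups.
        idx = order[g * n // groups:(g + 1) * n // groups]
        observed = sum(outcomes[i] for i in idx)
        expected = sum(probs[i] for i in idx)
        table.append((len(idx), observed, expected))
    return table


def harrell_c(probs, outcomes):
    """Concordance index for a binary outcome (ties count as 1/2)."""
    pos = [p for p, y in zip(probs, outcomes) if y == 1]
    neg = [p for p, y in zip(probs, outcomes) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one twin and one non-twin pair")
    concordant = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                concordant += 1.0
            elif p == q:
                concordant += 0.5
    return concordant / (len(pos) * len(neg))


probs = [0.1, 0.4, 0.35, 0.8]
outcomes = [0, 0, 1, 1]
print(harrell_c(probs, outcomes))  # 3 of 4 twin/non-twin comparisons concordant
print(decile_calibration(probs, outcomes, groups=2))
```

A c of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation of twin from non-twin pairs.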