Empirical evaluations are needed to determine which users are helped or
hindered by user-adapted interaction in user modeling systems. A review of
past UMUAI articles reveals insufficient empirical evaluations, but an
encouraging upward trend. Rules of thumb for experimental design, useful
tests for covariates, and common threats to experimental validity are
presented. Reporting standards including effect size and power are proposed.