We present two-stage experiment designs for use in simulation experiments that compare systems in terms of their expected (long-run average) performance. These procedures simultaneously achieve the following with a prespecified probability of being correct: (i) find the best system or a near-best system; (ii) identify the systems whose expected performance differs from the best by more than a practically insignificant amount; and (iii) provide a lower confidence bound on the probability that the best or near-best system will be selected. All of the procedures assume normally distributed data, but versions allow for unequal variances and common random numbers.
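To make the two-stage idea concrete, the sketch below implements a generic Rinott-style indifference-zone procedure, which is one classical instance of this design (not necessarily the exact procedures of this paper): a first stage estimates each system's variance, and a second-stage sample size is chosen so that systems more than the indifference amount δ from the best are unlikely to be selected. The sampler interface, the function name `rinott_two_stage`, and treating the Rinott constant `h` as a user-supplied input (it depends on the number of systems, the first-stage size, and the target probability, and is normally read from tables) are all assumptions for illustration.

```python
import math
import numpy as np

def rinott_two_stage(samplers, n0, delta, h, maximize=True, rng=None):
    """Sketch of a Rinott-style two-stage selection procedure.

    samplers : list of callables; samplers[i](m, rng) returns m i.i.d.
               (assumed normal) observations of system i's performance
    n0       : first-stage sample size per system
    delta    : indifference-zone parameter (smallest difference that matters)
    h        : Rinott constant for (k, n0, target probability) -- assumed
               supplied by the user, e.g. from published tables
    """
    rng = np.random.default_rng() if rng is None else rng
    k = len(samplers)
    means = np.empty(k)
    for i in range(k):
        # Stage 1: initial sample to estimate this system's variance.
        x = np.asarray(samplers[i](n0, rng), dtype=float)
        s2 = np.var(x, ddof=1)
        # Stage 2: total sample size grows with the variance estimate,
        # so noisier systems are sampled more heavily.
        n_total = max(n0, math.ceil((h * math.sqrt(s2) / delta) ** 2))
        extra = (np.asarray(samplers[i](n_total - n0, rng), dtype=float)
                 if n_total > n0 else np.empty(0))
        means[i] = np.concatenate([x, extra]).mean()
    # Select the system with the best overall sample mean.
    best = int(np.argmax(means) if maximize else np.argmin(means))
    return best, means
```

Because the second-stage sample sizes depend only on the first-stage variance estimates, the procedure accommodates unequal variances across systems, one of the features the abstract highlights.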