Objective: To describe interobserver variability among emergency medicine (EM) faculty when using global assessment (GA) rating scales and performance-based criterion (PBC) checklists to evaluate EM residents' clinical skills during standardized patient (SP) encounters.

Methods: Six EM residents were videotaped during encounters with SPs and subsequently evaluated by 38 EM faculty at four EM residency sites. There were two encounters in which a single SP presented with headache, two in which a second SP presented with chest pain, and two in which a third SP presented with abdominal pain, resulting in two parallel sets of three cases. Faculty used GA rating scales to evaluate history taking, physical examination, and interpersonal skills for the initial set of three cases. Each encounter in the second set was evaluated with complaint-specific PBC checklists developed by SAEM's National Consensus Group on Clinical Skills Task Force.

Results: Standard deviations, computed for each score distribution, were generally similar across evaluation methods. None of the distributions deviated significantly from a Gaussian distribution, as indicated by the Kolmogorov-Smirnov goodness-of-fit test. On PBC checklists, 80% agreement among faculty observers was found for 74% of chest pain items, 45% of headache items, and 30% of abdominal pain items.

Conclusions: When EM faculty evaluate the clinical performance of EM residents during videotaped SP encounters, interobserver variability is similar whether a PBC checklist or a GA rating scale is used.
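The Results rest on two statistical procedures: a Kolmogorov-Smirnov goodness-of-fit test of each score distribution against a Gaussian, and a per-item percent-agreement calculation across faculty observers. The abstract does not give the exact computations, so the sketch below is illustrative only: it assumes SciPy's one-sample K-S test parameterized by the sample's own mean and SD (which makes the test conservative, the Lilliefors caveat), and it defines agreement as the fraction of observers giving the modal response on a checklist item. The function names and the 80%-agreement threshold check are assumptions, not the study's code.

```python
import numpy as np
from scipy import stats

def ks_normality(scores):
    """One-sample K-S goodness-of-fit test against a Gaussian.

    Illustrative parameterization: the Gaussian uses the sample's
    own mean and SD, which is conservative (Lilliefors caveat).
    Returns (test statistic, p-value); a large p-value means the
    distribution does not deviate significantly from normal.
    """
    scores = np.asarray(scores, dtype=float)
    return stats.kstest(
        scores, "norm", args=(scores.mean(), scores.std(ddof=1))
    )

def item_agreement(ratings):
    """Fraction of observers giving the modal response for one
    checklist item -- one simple definition of percent agreement.
    """
    _, counts = np.unique(np.asarray(ratings), return_counts=True)
    return counts.max() / counts.sum()

def meets_80_percent(ratings):
    """Hypothetical threshold check mirroring the abstract's
    '80% agreement among faculty observers' criterion."""
    return item_agreement(ratings) >= 0.80
```

For example, if 30 of 38 faculty marked a chest-pain checklist item as "done," `item_agreement` returns 30/38 ≈ 0.79, just below the 80% criterion.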