Identifying outliers and high-leverage points is a fundamental step in the
least-squares regression model building process. Various influence measures
based on different motivational arguments, and designed to measure the
influence of observations on different aspects of various regression results,
are elucidated and critiqued here. On the basis of a statistical analysis
of the residuals (classical, normalized, standardized, jackknife, predicted
and recursive) and of the diagonal elements of the projection matrix,
diagnostic plots for indicating influential points are constructed.
Regression diagnostics do
not require knowledge of an alternative hypothesis for testing, or the
fulfillment of the other assumptions of classical statistical tests. In the
interactive, PC-assisted diagnosis of data, models and estimation methods,
the examination of data quality involves the detection of influential points,
outliers and high-leverage points, which cause many problems in regression
analysis. This paper provides a basic survey of the influence statistics of
single cases, combined with exploratory analysis of all variables. The
graphical aids to the identification of outliers and high-leverage points
are combined
with graphs for the identification of influence type based on the likelihood
distance. All these graphically oriented techniques are suitable for the
rapid estimation of influential points, but are generally incapable of
solving problems with masking and swamping. A powerful procedure for
computing the characteristics of influential points has been written in
Matlab 5.3 and is available from the authors. (C) 2001 Elsevier Science B.V.
All rights
reserved.
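
As an illustration of the quantities named above, the following minimal
sketch computes the diagonal of the projection (hat) matrix and four of the
residual types (normalized, standardized, jackknife and predicted) for an
ordinary least-squares fit. It is not the authors' Matlab procedure: the
function name, the dictionary keys and the small data set are hypothetical,
the formulas are the common textbook definitions and are assumed here, and
recursive residuals are omitted.

```python
import numpy as np

def residual_diagnostics(X, y):
    """Leverages and several residual types for an OLS fit (illustrative)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    # Thin QR: the hat diagonal is the row-wise sum of squares of Q,
    # so the full n-by-n projection matrix is never formed.
    Q, R = np.linalg.qr(X)
    h = np.sum(Q**2, axis=1)
    beta = np.linalg.solve(R, Q.T @ y)
    e = y - X @ beta                      # classical residuals
    s2 = e @ e / (n - p)                  # residual variance estimate
    normalized = e / np.sqrt(s2)
    standardized = e / np.sqrt(s2 * (1.0 - h))
    # Variance estimate with case i deleted, obtained without refitting.
    s2_del = ((n - p) * s2 - e**2 / (1.0 - h)) / (n - p - 1)
    jackknife = e / np.sqrt(s2_del * (1.0 - h))
    predicted = e / (1.0 - h)             # leave-one-out prediction errors
    return {"leverage": h, "classical": e, "normalized": normalized,
            "standardized": standardized, "jackknife": jackknife,
            "predicted": predicted}

# Hypothetical data: a straight line with one outlier in y (index 4)
# and one point far away in x (index 10).
x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 30], dtype=float)
noise = 0.1 * np.array([1, -1, 1, -1, 1, -1, 1, -1, 1, -1, 1])
y = 2.0 + 3.0 * x + noise
y[4] += 15.0
X = np.column_stack([np.ones_like(x), x])  # intercept and slope columns

d = residual_diagnostics(X, y)
print("largest leverage at index", int(np.argmax(d["leverage"])))
print("largest |jackknife residual| at index",
      int(np.argmax(np.abs(d["jackknife"]))))
```

Plotting the jackknife residuals against the leverages gives one simple
diagnostic graph of the kind surveyed in the paper. Because each case is
assessed singly, such a plot shares the limitation noted above regarding
masking and swamping.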