Normal and lognormal data distribution in geochemistry: death of a myth. Consequences for the statistical treatment of geochemical and environmental data
C. Reimann and P. Filzmoser, Normal and lognormal data distribution in geochemistry: death of a myth. Consequences for the statistical treatment of geochemical and environmental data, ENVIR GEOL, 39(9), 2000, pp. 1001-1014
All variables of several large data sets from regional geochemical and environmental surveys were tested for a normal or lognormal data distribution.
As a general rule, almost all variables (up to more than 50 analysed chemical elements per data set) show neither a normal nor a lognormal data distribution. Even when different transformation methods are used, more than 70% of all variables in every single data set do not approach a normal distribution. Distributions are usually skewed, have outliers and originate from more than one process. When dealing with regional geochemical or environmental data, normal and/or lognormal distributions are the exception and not the rule. This observation has serious consequences for the further statistical treatment of geochemical and environmental data. The most widely used statistical methods are all based on the assumption that the studied data show a normal or lognormal distribution. Neglecting that geochemical and environmental data show neither a normal nor a lognormal distribution will lead to biased or faulty results when such techniques are used.
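
The kind of check described above can be illustrated with a minimal sketch. The abstract does not say which normality tests the authors applied, so the Jarque-Bera test is used here purely as a stand-in because it needs only the Python standard library. The data are synthetic (a hypothetical background process plus a smaller second process), and the names `jarque_bera`, `classify` and `mixture` are illustrative assumptions, not taken from the paper.

```python
import math
import random


def jarque_bera(values):
    """Return (statistic, p_value) of the Jarque-Bera normality test.

    Assumption: this test is a stand-in; the abstract does not name
    the tests used in the paper.
    """
    n = len(values)
    mean = sum(values) / n
    # Central moments (population form, divided by n).
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    stat = n / 6.0 * (skewness ** 2 + (kurtosis - 3.0) ** 2 / 4.0)
    # Asymptotically chi-square with 2 degrees of freedom, whose
    # survival function is exp(-x / 2).
    return stat, math.exp(-stat / 2.0)


def classify(values, alpha=0.05):
    """Label a positive-valued variable 'normal', 'lognormal' or 'neither'."""
    _, p_raw = jarque_bera(values)
    _, p_log = jarque_bera([math.log(v) for v in values])
    if p_raw >= alpha:
        return "normal"      # normality of the raw values not rejected
    if p_log >= alpha:
        return "lognormal"   # normality of the log values not rejected
    return "neither"


# Synthetic element concentrations from two processes (hypothetical
# parameters): a dominant background and a smaller enriched population.
rng = random.Random(0)
background = [rng.lognormvariate(1.0, 0.4) for _ in range(900)]
enriched = [rng.lognormvariate(3.0, 0.3) for _ in range(100)]
mixture = background + enriched

label = classify(mixture)
print(label)
```

Because the synthetic variable mixes two populations, it stays strongly skewed even after a log transformation, which is the situation the abstract describes for data that originate from more than one process.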