E. Fishbein and R.T. Patterson, ERROR-WEIGHTED MAXIMUM-LIKELIHOOD (EWML) - A NEW STATISTICALLY BASED METHOD TO CLUSTER QUANTITATIVE MICROPALEONTOLOGICAL DATA, Journal of Paleontology, 67(3), 1993, pp. 475-486
The advent of readily available computer-based clustering packages has created some controversy in the micropaleontological community concerning the use and interpretation of computer-based biofacies discrimination. This is because dramatically different results can be obtained depending on methodology. The analysis of various clustering techniques reveals that, in most instances, no statistical hypothesis is contained in the clustering model and no basis exists for accepting one biofacies partitioning over another. Furthermore, most techniques do not consider standard error in species abundances and generate results that are not statistically relevant. When many rare species are present, statistically insignificant differences in rare species can accumulate and overshadow the significant differences in the major species, leading to biofacies containing members having little in common. A statistically based "error-weighted maximum likelihood" (EWML) clustering method is described that determines biofacies by assuming that samples from a common biofacies are normally distributed. Species variability is weighted to be inversely proportional to measurement uncertainty. The method has been applied to samples collected from the Fraser River Delta marsh and shows that five distinct biofacies can be resolved in the data. Similar results were obtained from readily available packages when the data set was preprocessed to reduce the number of degrees of freedom. Based on the sample results from the new algorithm, and on tests using a representative micropaleontological data set, a more conventional iterative processing method is recommended. This method, although not statistical in nature, produces similar results to EWML (not commercially available yet) with readily available analysis packages. Finally, some of the more common clustering techniques are discussed and strategies for their proper utilization are recommended.
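
The error-weighting idea in the abstract can be illustrated with a small sketch. This is not the authors' EWML algorithm, which the abstract does not specify in detail. It only shows how dividing each squared abundance difference by its estimated variance keeps differences that are small relative to counting error from dominating the comparison. The binomial counting-error model, the pooled-proportion variance, the function name `error_weighted_distance`, and the specimen counts are all assumptions made for illustration.

```python
def error_weighted_distance(counts_a, counts_b):
    """Sum over species of (squared difference in fractional abundance)
    divided by the estimated variance of that difference.

    Assumption: counting error is binomial, and the variance of the
    difference is estimated from the pooled proportion, as in a
    two-proportion z-test. This is an illustrative stand-in, not EWML.
    """
    n_a, n_b = sum(counts_a), sum(counts_b)
    total = 0.0
    for c_a, c_b in zip(counts_a, counts_b):
        p_a, p_b = c_a / n_a, c_b / n_b
        pooled = (c_a + c_b) / (n_a + n_b)
        variance = pooled * (1.0 - pooled) * (1.0 / n_a + 1.0 / n_b)
        # Variance is zero only when the species is absent from both
        # samples or makes up all of both, so the difference is zero too.
        if variance > 0.0:
            total += (p_a - p_b) ** 2 / variance
    return total


# Hypothetical specimen counts per species (one dominant, several rare).
sample_a = [180, 15, 3, 2, 0]
sample_b = [175, 18, 2, 3, 2]   # differs from A by a few specimens per species
sample_c = [100, 90, 4, 3, 3]   # differs from A in the two major species

d_ab = error_weighted_distance(sample_a, sample_b)
d_ac = error_weighted_distance(sample_a, sample_c)
print(d_ab < d_ac)
```

In this sketch each term is a squared z-score, so a species contributes in proportion to how far its difference exceeds what counting error alone would produce. A clustering routine built on such a measure would still need a statistical model for biofacies membership, which is the part the abstract attributes to EWML.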