CONSENSUS CLUSTERING OF US TEMPERATURE AND PRECIPITATION DATA

Authors
Citation
R. G. Fovell, CONSENSUS CLUSTERING OF US TEMPERATURE AND PRECIPITATION DATA, Journal of Climate, 10(6), 1997, pp. 1405-1427
Number of citations
29
Subject Categories
Meteorology & Atmospheric Sciences
Journal title
Journal ISSN
0894-8755
Volume
10
Issue
6
Year of publication
1997
Pages
1405 - 1427
Database
ISI
SICI code
0894-8755(1997)10:6<1405:CCOUTA>2.0.ZU;2-A
Abstract
A "consensus clustering" strategy is applied to long-term temperature and precipitation time series data for the purpose of delineating climate zones of the conterminous United States in a "data-driven" (as opposed to "rule-driven") fashion. Cluster analysis simplifies a dataset by arranging "objects" (here, climate divisions or stations) into a smaller number of relatively homogeneous groups or clusters on the basis of interobject dissimilarities computed using the identified "attributes" (here, temperature and precipitation measurements recorded for the objects). The results demonstrate the spatial scales associated with climatic variability and may suggest climatically justified ways in which the number of objects in a dataset may be reduced. Implicit in this work is the arguable contention that temperature and precipitation data are both necessary and sufficient for the delineation of climatic zones. In prior work, the temperature and precipitation data were mixed during the computation of the interobject dissimilarities. This allowed the clusters to jointly reflect temperature and precipitation distinctions, but also had inherent problems relating to arbitrary attribute scaling and information redundancy that proved difficult to resolve. In the present approach, the temperature and precipitation data are clustered separately and then categorically intersected to forge consensus clusters. The consensus outcome may be viewed as having identified the temperature subzones of precipitation clusters (or vice versa) or as representing distinct groupings that are relatively homogeneous with respect to both attribute types simultaneously. The dissimilarity measure employed herein is the Euclidean distance.
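The categorical intersection described above can be illustrated with a minimal sketch. It assumes each object already carries one temperature-cluster label and one precipitation-cluster label; the function name `consensus_labels` and the toy labels are hypothetical and do not come from the paper. Two objects share a consensus cluster only if they agree on both labels.

```python
def consensus_labels(temp_labels, precip_labels):
    """Categorically intersect two clusterings of the same objects.

    Hypothetical helper for illustration: each distinct
    (temperature label, precipitation label) pair becomes one consensus
    cluster, numbered in order of first appearance.
    """
    if len(temp_labels) != len(precip_labels):
        raise ValueError("both clusterings must label the same objects")
    pair_to_id = {}
    consensus = []
    for pair in zip(temp_labels, precip_labels):
        if pair not in pair_to_id:
            pair_to_id[pair] = len(pair_to_id)
        consensus.append(pair_to_id[pair])
    return consensus


# Toy labels for six objects (made up for the example).
temp = [0, 0, 1, 1, 1, 0]
precip = [0, 1, 1, 1, 0, 0]
print(consensus_labels(temp, precip))  # [0, 1, 2, 2, 3, 0]
```

In this toy case two clusterings with two clusters each intersect into four consensus clusters, which shows on a small scale how the number of groups can grow after intersection.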
As it employs only continuous time series data representing a single information type (temperature or precipitation), the consensus approach has the advantage of allowing an attractively simple interpretation of the total Euclidean distance between object pairs. The total squared distance may be subdivided into three components representing object dissimilarity with respect to temporal mean (level), seasonality (variability), and coseasonality (relative temporal phasing). Therefore, concerns about redundancy or arbitrary scaling problems are neutralized. This is seen as the chief advantage of consensus clustering. The consensus strategy has several disadvantages. It is possible for two (or more) relatively general, undetailed clusterings to produce a very complex and fragmented clustering following categorical intersection. Further, the fact that the analyst chooses the clustering levels of the separate, contributing clusterings means that he or she has considerable freedom in fashioning the consensus outcome, which makes it difficult (if not impossible) to argue that true, "natural" clusters have been identified. The latter often applies to cluster analysis in general, however. It is believed that the consensus approach merits consideration owing to its advantages. Two consensus outcomes are presented: a lower-order solution with 14 clusters and a higher-order solution with 26 clusters. The sensitivity of these clusterings to perturbations in the input data is assessed. The regionalizations are compared with those presented in prior work.
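The three-way split of the squared Euclidean distance can be illustrated with a standard algebraic identity. The abstract does not give the paper's exact formulation, so the sketch below is an assumption: it takes the level term as the squared difference of temporal means, the seasonality term as the squared difference of (population) standard deviations, and the coseasonality term as twice the gap between the product of standard deviations and the covariance, which is zero when the two series are perfectly in phase. The function name and the synthetic monthly series are hypothetical.

```python
import numpy as np


def decompose_squared_distance(x, y):
    """Split the squared Euclidean distance between two time series.

    Returns (total, level, seasonality, coseasonality), where the last
    three terms sum to the total. This is an assumed reading of the
    three components named in the abstract, not the paper's own code.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.shape != y.shape:
        raise ValueError("series must have the same length")
    n = x.size
    sx, sy = x.std(), y.std()  # population standard deviations (ddof=0)
    cov = np.mean((x - x.mean()) * (y - y.mean()))
    level = n * (x.mean() - y.mean()) ** 2   # difference in temporal mean
    seasonality = n * (sx - sy) ** 2         # difference in variability
    coseasonality = 2 * n * (sx * sy - cov)  # difference in phasing
    total = np.sum((x - y) ** 2)
    return total, level, seasonality, coseasonality


# Synthetic monthly series (made up): different mean, amplitude, and phase.
months = np.arange(12)
a = 10.0 + 5.0 * np.cos(2 * np.pi * months / 12)
b = 12.0 + 3.0 * np.cos(2 * np.pi * (months - 1) / 12)
total, level, seasonality, coseasonality = decompose_squared_distance(a, b)
print(np.isclose(total, level + seasonality + coseasonality))  # True
```

Because every term is expressed in the units of a single attribute type, no cross-attribute scaling choice enters the distance, which is consistent with the advantage claimed above.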