Motivation: High-density microarray technology permits the quantitative and
simultaneous monitoring of thousands of genes. The interpretation challenge
is to extract relevant information from this large amount of data. A growing
variety of statistical analysis approaches are available to identify clusters
of genes that share common expression characteristics, but provide no
information regarding the biological similarities of genes within clusters.
The published literature provides a potential source of information to
assist in interpretation of clustering results.
Results: We describe a data mining method that uses indexing terms
('keywords') from the published literature linked to specific genes to
present a view of the conceptual similarity of genes within a cluster or
group of interest. The method takes advantage of the hierarchical nature of
Medical Subject Headings used to index citations in the MEDLINE database,
and the registry numbers applied to enzymes.
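
As a rough illustration of how a keyword hierarchy can summarize a gene cluster, the sketch below rolls literature keywords up a dotted, tree-structured code so that genes indexed under related but different terms are counted together at a shared ancestor. It is a minimal sketch under stated assumptions, not the method described in this paper: the function names `ancestors` and `cluster_profile`, the gene names, and the dotted codes are all hypothetical placeholders written in the style of hierarchical tree numbers and not checked against MeSH.

```python
from collections import defaultdict


def ancestors(tree_number):
    """Yield a dotted hierarchical code and each of its ancestors.

    For example 'C04.557.337' yields 'C04.557.337', 'C04.557', 'C04'.
    """
    parts = tree_number.split(".")
    for i in range(len(parts), 0, -1):
        yield ".".join(parts[:i])


def cluster_profile(gene_terms):
    """Count, per hierarchy node, the genes linked to it or to a descendant."""
    genes_per_node = defaultdict(set)
    for gene, tree_numbers in gene_terms.items():
        for tree_number in tree_numbers:
            for node in ancestors(tree_number):
                genes_per_node[node].add(gene)
    return {node: len(genes) for node, genes in genes_per_node.items()}


# Hypothetical toy cluster: gene -> codes of keywords from its linked citations.
cluster = {
    "geneA": ["C04.557.337", "D08.811.277"],
    "geneB": ["C04.557.386"],
    "geneC": ["D08.811.277.040"],
}

profile = cluster_profile(cluster)

# List the nodes shared by the most genes first.
for node, count in sorted(profile.items(), key=lambda kv: (-kv[1], kv[0])):
    print(f"{node}\t{count}")
```

In this toy input, geneA and geneB share no exact keyword, yet both are counted under their common ancestor node, which is the kind of conceptual similarity a hierarchical index makes visible.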