Background: The recent flood of data from genome sequences and functional genomics has given rise to a new field, bioinformatics, which combines elements of biology and computer science.
Objectives: Here we propose a definition for this new field and review some of the research that is being pursued, particularly in relation to transcriptional regulatory systems.
Methods: Our definition is as follows: Bioinformatics is conceptualizing biology in terms of macromolecules (in the sense of physical chemistry) and then applying "informatics" techniques (derived from disciplines such as applied maths, computer science, and statistics) to understand and organize the information associated with these molecules on a large scale.
Results and Conclusions: Analyses in bioinformatics predominantly focus on three types of large datasets available in molecular biology: macromolecular structures, genome sequences, and the results of functional genomics experiments (e.g., expression data). Additional information includes the text of scientific papers and "relationship data" from metabolic pathways, taxonomy trees, and protein-protein interaction networks. Bioinformatics employs a wide range of computational techniques including sequence and structural alignment, database design and data mining, macromolecular geometry, phylogenetic tree construction, prediction of protein structure and function, gene finding, and expression data clustering. The emphasis is on approaches integrating a variety of computational methods and heterogeneous data sources. Finally, bioinformatics is a practical discipline. We survey some representative applications, such as finding homologues, designing drugs, and performing large-scale censuses. Additional information pertinent to the review is available over the web at http://bioinfo.mbb.yale.edu/what-is-it.