The era of complete genome sequences has arrived, and with it vast
amounts of data that must be annotated, cross-referenced, and placed
within the regulatory networks that define the physiology of an
organism.
One eucaryotic and three procaryotic genomes have been completed and
the data made available, and another 50 sequences are expected to be
completed by the end of the decade. One of the first steps in the new
post-genome era will be to decipher the functions of the huge numbers of
new open reading frames. Various approaches to investigate what
unknown genes do and how genes interact within an organism are being
undertaken, including (1) the simultaneous measurement of the
expression levels of all genes in a cell and (2) the mapping and
quantitation of all proteins expressed within a cell. The idea of
systematically
mapping and identifying the total protein complement of the genome
(the "proteome") arose over 20 years ago, when the separation of
proteins from total cell extracts by two-dimensional (2D) gel
electrophoresis was developed. This review focuses on the use of 2D
gel electrophoresis as the basis for constructing proteome maps and on
the rapid advances in mass spectrometry that will allow the
large-scale, automated identification of proteins needed to create
such databases. (C) 1997 Academic Press.