Genes differentially expressed in different tissues, during development, or during specific pathologies are of foremost interest to both basic and pharmaceutical research. "Transcript profiles" or "digital Northerns" are generated routinely by partially sequencing thousands of randomly selected clones from relevant cDNA libraries. Differentially expressed genes can then be detected from variations in the counts of their cognate sequence tags. Here we present the first systematic study on the influence of random fluctuations and sampling size on the reliability of this kind of data. We establish a rigorous significance test and demonstrate its use on publicly available transcript profiles. The theory links the threshold of selection of putatively regulated genes (e.g., the number of pharmaceutical leads) to the fraction of false positive clones one is willing to risk. Our results delineate more precisely and extend the limits within which digital Northern data can be used.
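
To illustrate how a tag-count significance test of this kind might be computed, the following is a minimal sketch and not necessarily the exact procedure developed in the paper. It assumes a gene is observed x times among n1 sequenced clones in one library and y times among n2 clones in another, and that, in the absence of regulation, y given x follows the distribution p(y|x) = (n2/n1)^y (x+y)! / [x! y! (1 + n2/n1)^(x+y+1)]. The abstract does not state this formula, so it is an assumption here, and the function names `cond_prob` and `upper_tail` are hypothetical.

```python
from math import exp, lgamma, log


def cond_prob(y, x, n1, n2):
    """Assumed null probability of y tags in library 2 given x tags in library 1.

    n1 and n2 are the numbers of clones sequenced from each library.
    Log-gamma is used for the factorials so large counts do not overflow.
    """
    ratio = n2 / n1
    log_p = (
        y * log(ratio)
        + lgamma(x + y + 1)      # log (x + y)!
        - lgamma(x + 1)          # log x!
        - lgamma(y + 1)          # log y!
        - (x + y + 1) * log(1 + ratio)
    )
    return exp(log_p)


def upper_tail(y, x, n1, n2):
    """P(Y >= y | x): chance of at least y tags arising from sampling alone."""
    below = sum(cond_prob(k, x, n1, n2) for k in range(y))
    return max(0.0, 1.0 - below)


if __name__ == "__main__":
    # Hypothetical counts: 0 tags in library 1, 5 tags in library 2,
    # with 1000 clones sequenced from each library.
    print(upper_tail(5, 0, 1000, 1000))
```

A gene would be flagged as putatively regulated when this tail probability falls below a chosen threshold. Lowering the threshold selects fewer candidate genes but reduces the expected fraction of false positives, which is the trade-off the abstract describes.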