In a recent paper we estimated the total number of protein-coding open
reading frames (ORFs) in the Saccharomyces cerevisiae genome, based on
their properties, at about 4800. This number is much smaller than the
widely accepted 5800-6000. In this paper we analyse differences between
the set of ORFs with known phenotypes annotated in the Munich Information
Centre for Protein Sequences (MIPS) database and ORFs for which the
probability of coding, as calculated by us, is very low. We have found
that many of the latter ORFs have properties of antisense sequences of
coding ORFs, which suggests that they could have been generated by
duplication of coding sequences.
Since coding sequences generate ORFs inside themselves, with especially
high frequency in the antisense sequences, we have looked for homology
between known proteins and hypothetical polypeptides generated by the ORFs
under consideration in all six phases. For many ORFs we have found
paralogues and
orthologues in phases different from the one assumed to be coding in the
MIPS database.
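
As an illustration of what reading an ORF "in all six phases" involves, the
following minimal Python sketch translates a DNA sequence in the three
phases of the given strand and the three phases of its reverse complement.
It is not the procedure used in the paper: the function names and the phase
labels ("+1" to "-3") are hypothetical, the standard genetic code is
assumed, and the homology search itself is not shown.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G (assumption:
# the standard nuclear code is appropriate for the sequences considered).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the antisense strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons only; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_phase_translations(seq):
    """Map hypothetical phase labels '+1'..'+3', '-1'..'-3' to polypeptides."""
    seq = seq.upper()
    strands = {"+": seq, "-": reverse_complement(seq)}
    return {
        f"{sign}{offset + 1}": translate(strand[offset:])
        for sign, strand in strands.items()
        for offset in range(3)
    }
```

Each of the six resulting hypothetical polypeptides could then be compared
against a database of known proteins with a homology search tool of choice.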