In the sequences released by the Arabidopsis Genome Initiative (AGI), we discovered a new and unexpectedly large family of orphan genes (127 genes by 01.08.99), named AtPCMP. The distribution of the AtPCMP genes on the five chromosomes suggests that the genome of Arabidopsis thaliana contains more than 200 genes of this family (1% of the whole genome). The deduced AtPCMP proteins are characterized by a surprising combinatorial organization of sequence motifs. The amino-terminal domain consists of a succession of three conserved motifs, which generate considerable diversity. The proteins are classified into three subfamilies based on the length and nature of their carboxy-terminal domain, which comprises one to six motifs. All the motifs characterized are highly conserved in both sequence and spacing. A specific signature of this large family is defined. The presence of ESTs in databases and the detection of clones in A. thaliana cDNA libraries indicate that most of the genes of this family are expressed. The absence of similar sequences outside the plant kingdom strongly suggests that this unusually large orphan family is unique to plants. The features, genesis, potential function and evolution of this combinatorial and modular plant protein family are discussed.