Codon usage and base composition in sequences from the A + T-rich
genome of Rickettsia prowazekii, a member of the alpha Proteobacteria,
have been investigated. Synonymous codon usage patterns are roughly
similar among genes, even though the data set includes genes expected
to be expressed at very different levels, indicating that translational
selection has been ineffective in this species. However, multivariate
statistical analysis differentiates genes according to their G + C
contents at the first two codon positions. To study this variation, we
have compared the amino acid composition patterns of 21 R. prowazekii
proteins with those of a homologous set of proteins from Escherichia
coli. The analysis shows that individual genes have been affected by
biased mutation rates to very different extents, with genes encoding
proteins highly conserved among other species being the least affected.
Overall, protein coding and intergenic spacer regions have G + C
content values of 32.5% and 21.4%, respectively. Extrapolation from
these values suggests that R. prowazekii has around 800 genes and that
60-70% of the genome may be coding.
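
The abstract does not say how the extrapolation was carried out. One plausible reading is a two-component mixing model, in which the genome-wide G + C content is a weighted average of the coding (32.5%) and spacer (21.4%) values, so the coding fraction can be solved for directly. The sketch below illustrates that arithmetic only: the function name coding_fraction and the genome-wide G + C value of 29.0% used in the example are assumptions for illustration and are not taken from the text.

```python
def coding_fraction(gc_genome, gc_coding=32.5, gc_spacer=21.4):
    """Estimate the coding fraction f of a genome from G + C percentages.

    Assumes a simple mixing model (an assumption, not the authors' stated
    method):  gc_genome = f * gc_coding + (1 - f) * gc_spacer
    The defaults are the coding and spacer values quoted in the abstract.
    """
    if gc_coding == gc_spacer:
        raise ValueError("coding and spacer G + C must differ")
    return (gc_genome - gc_spacer) / (gc_coding - gc_spacer)


# Illustrative input only: 29.0% genome-wide G + C is an assumed value.
f = coding_fraction(29.0)
print(f"estimated coding fraction: {f:.1%}")
```

Under this assumed input the estimate lands within the 60-70% range quoted above, but the result is sensitive to the genome-wide value supplied.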