Molecular phylogeny of the species Escherichia coli using the E. coli reference (ECOR) collection strains has been hampered by (1) the absence of rooting in the commonly used phenogram obtained from multilocus enzyme electrophoresis (MLEE) data and (2) the existence of recombination events between strains that scramble phylogenetic trees reconstructed from the nucleotide sequences of genes. We attempted to determine the phylogeny for E. coli based on the ECOR strain data by extracting from GenBank the nucleotide sequences of 11 chromosomal structural and 2 plasmid genes for which the Salmonella enterica homologous gene sequences were available. For each of the 13 DNA data sets studied, incongruence with a nonnucleotide whole-genome data set including MLEE, random amplified polymorphic DNA, and rrn restriction fragment length polymorphism data was measured using the incongruence length difference (ILD) test of Farris et al. As previously reported, the incongruence observed between the gnd and plasmid gene data and the whole-genome data was multiple, indicating numerous horizontal transfer and/or recombination events. In five cases, the incongruence detected by the ILD test was punctual, and the donor group was identified. Congruence was not rejected for the remaining data sets. The strains responsible for incongruences with the whole-genome data set were removed, leading to a "prior-agreement" approach, i.e., the determination of a phylogeny for E. coli based on several genes, excluding (1) the genes with multiple incongruences with the whole-genome data, (2) the strains responsible for punctual incongruences, and (3) the genes incongruent with each other. The obtained phylogeny shows that the most basal group of E. coli strains is the B2 group rather than the A group, as generally thought. The D group then emerges as the sister group of the rest. Finally, the A and B1 groups are sister groups. Interestingly, the most primitive taxon within E. coli in terms of branching pattern, i.e., the B2 group, includes highly virulent extraintestinal strains with derived characters (extraintestinal virulence determinants) occurring on its own branch.
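
The three exclusion rules of the "prior-agreement" approach can be read as a filtering step applied before the combined analysis. The following is a minimal sketch of that reading, not the authors' implementation. The gene names (geneA to geneE), the strain labels (s1 to s3), the `GeneResult` record, and the `prior_agreement` function are hypothetical. It assumes that a strain responsible for a punctual incongruence is dropped only from the gene concerned, and that both genes of a mutually incongruent pair are dropped; the abstract does not specify either point.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeneResult:
    """Hypothetical record of one gene's ILD comparison with the whole-genome data."""
    name: str
    status: str  # "congruent", "punctual", or "multiple"
    offending_strains: frozenset = frozenset()  # strains behind a punctual incongruence


def prior_agreement(results, strains, incongruent_pairs=()):
    """Return {gene: retained strains} after applying the three exclusion rules."""
    # Rule 3 (assumption: both members of a mutually incongruent pair are dropped).
    clashing = {gene for pair in incongruent_pairs for gene in pair}
    kept = {}
    for result in results:
        # Rule 1: drop genes with multiple incongruences.
        if result.status == "multiple":
            continue
        if result.name in clashing:
            continue
        # Rule 2 (assumption: offending strains are removed for this gene only).
        kept[result.name] = sorted(set(strains) - result.offending_strains)
    return kept


# Illustrative placeholder input only; these are not results from the study.
results = [
    GeneResult("geneA", "congruent"),
    GeneResult("geneB", "punctual", frozenset({"s3"})),
    GeneResult("geneC", "multiple"),
    GeneResult("geneD", "congruent"),
    GeneResult("geneE", "congruent"),
]
strains = ["s1", "s2", "s3"]
selected = prior_agreement(results, strains, incongruent_pairs=[("geneD", "geneE")])
print(selected)
```

With this placeholder input, geneC is excluded by rule 1, geneD and geneE by rule 3, and strain s3 is removed from geneB by rule 2, leaving geneA and geneB for the combined analysis.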