Based on the observation that a single mutational event can delete or
insert multiple residues, affine gap costs for sequence alignment
charge a penalty for the existence of a gap, and a further
length-dependent penalty. From structural or multiple alignments of
distantly related
proteins, it has been observed that conserved residues frequently
fall into ungapped blocks separated by relatively nonconserved regions.
To take advantage of this structure, a simple generalization of affine
gap costs is proposed that allows nonconserved regions to be
effectively ignored. The distribution of scores from local alignments
using these generalized gap costs is shown empirically to follow an
extreme value distribution. Examples are presented for which generalized
affine gap costs yield superior alignments from the standpoints both of
statistical significance and of alignment accuracy. Guidelines for
selecting
generalized affine gap costs are discussed, as is their possible
application to multiple alignment. Proteins 32:88-96, 1998. © 1998
Wiley-Liss, Inc.
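
To make the two kinds of penalty concrete, the following is a minimal sketch rather than the paper's implementation. The first function encodes ordinary affine gap costs as described above: a fixed charge for the existence of a gap plus a length-dependent charge. The second is one plausible three-parameter form of the generalization, in which a gap region may leave residues unaligned in both sequences; this formula is an assumption, since the abstract does not state it, and the function names, parameter names, and default values are all hypothetical.

```python
def affine_gap_cost(length, gap_open=11, gap_extend=1):
    """Ordinary affine gap cost: existence charge plus per-residue charge.

    The default values are illustrative only, not taken from the abstract.
    """
    if length < 0:
        raise ValueError("gap length must be non-negative")
    if length == 0:
        return 0  # no gap, no cost
    return gap_open + gap_extend * length


def generalized_gap_cost(k1, k2, gap_open=11, gap_extend=1, unaligned_pair=2):
    """ASSUMED form of a generalized affine gap cost (not given in the abstract).

    k1 and k2 are the numbers of residues left unaligned in each sequence.
    Each pair of mutually unaligned residues costs `unaligned_pair`, and the
    length difference is charged like an ordinary gap extension.
    """
    if k1 < 0 or k2 < 0:
        raise ValueError("residue counts must be non-negative")
    if k1 == 0 and k2 == 0:
        return 0  # nothing skipped, no cost
    return gap_open + gap_extend * abs(k1 - k2) + unaligned_pair * min(k1, k2)


if __name__ == "__main__":
    # With residues skipped in only one sequence, the assumed generalized
    # form reduces to the ordinary affine cost.
    print(affine_gap_cost(5), generalized_gap_cost(5, 0))

    # Skipping a nonconserved region of 20 and 18 residues: a single
    # generalized gap versus two separate ordinary gaps.
    print(generalized_gap_cost(20, 18), affine_gap_cost(20) + affine_gap_cost(18))
```

Under these hypothetical parameters, skipping a nonconserved region in both sequences is charged as a single event, which is the sense in which such regions can be effectively ignored. Whether this is cheaper than two ordinary gaps depends entirely on the parameter values chosen.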