Sw. Leung et al., Basic gene grammars and DNA-ChartParser for language processing of Escherichia coli promoter DNA sequences, BIOINFORMAT, 17(3), 2001, pp. 226-236
Motivation: The field of 'DNA linguistics' has emerged from pioneering work
in computational linguistics and molecular biology. Most formal grammars in
this field are expressed using Definite Clause Grammars but these have
computational limitations which must be overcome. The present study provides
a new DNA parsing system, comprising a logic grammar formalism called Basic
Gene Grammars and a bidirectional chart parser DNA-ChartParser.
Results: The use of Basic Gene Grammars (BGGs) is demonstrated in
representing many formulations of the knowledge of Escherichia coli
promoters, including knowledge acquired from human experts, consensus
sequences, statistics (weight matrices), symbolic learning, and neural
network learning. The DNA-ChartParser provides bidirectional parsing
facilities for BGGs in handling overlapping categories, gap categories,
approximate pattern matching, and constraints. Basic Gene Grammars and the
DNA-ChartParser allowed different sources of knowledge for recognizing
E. coli promoters to be combined to achieve better accuracy, as assessed by
parsing these DNA sequences in real-world data sets.
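
The combination of consensus sequences, gap categories, and approximate
pattern matching mentioned above can be illustrated with a small sketch. It
is written in Python rather than in the Prolog-based grammar formalism of the
paper, and it is not the DNA-ChartParser algorithm: it is a plain left-to-right
scan. The hexamers TTGACA and TATAAT are the textbook -35 and -10 consensus
boxes for E. coli sigma-70 promoters; the function names, the mismatch
threshold, and the 15 to 19 base gap range are assumptions chosen for
illustration only.

```python
# Textbook consensus hexamers for E. coli sigma-70 promoters.
# (Illustrative values; not taken from the paper's grammars.)
MINUS_35 = "TTGACA"
MINUS_10 = "TATAAT"


def mismatches(window, pattern):
    """Count positions where window and pattern differ (Hamming distance)."""
    return sum(1 for a, b in zip(window, pattern) if a != b)


def find_promoters(seq, max_mismatch=1, gap_range=(15, 19)):
    """Return a list of (start_35, gap, start_10, total_mismatches) hits.

    A hit is an approximate -35 box, followed by a gap of arbitrary bases
    whose length lies in gap_range (inclusive), followed by an approximate
    -10 box. Both the threshold and the gap range are hypothetical defaults.
    """
    seq = seq.upper()
    hits = []
    n35, n10 = len(MINUS_35), len(MINUS_10)
    for i in range(len(seq) - n35 + 1):
        m35 = mismatches(seq[i:i + n35], MINUS_35)
        if m35 > max_mismatch:
            continue  # -35 box is too far from the consensus
        for gap in range(gap_range[0], gap_range[1] + 1):
            j = i + n35 + gap  # start of the candidate -10 box
            if j + n10 > len(seq):
                break  # longer gaps would run past the end of the sequence
            m10 = mismatches(seq[j:j + n10], MINUS_10)
            if m10 <= max_mismatch:
                hits.append((i, gap, j, m35 + m10))
    return hits


if __name__ == "__main__":
    example = "GG" + "TTGACA" + "C" * 17 + "TATAAT" + "GG"
    print(find_promoters(example))
```

Each hit records where the two boxes start, how long the gap between them is,
and how many mismatches were tolerated, which is the kind of evidence that
could be weighed against other knowledge sources such as weight matrices.
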
Availability: DNA-ChartParser runs under SICStus Prolog. It and a few
examples of Basic Gene Grammars are available at the URL:
http://www.dai.ed.ac.uk/-siu/DNA.