This paper develops a query language for sequence databases, such as genome
databases and text databases. The language, caned Sequence Datalog, extend
s classical Datalog with interpreted function symbols for manipulating sequ
ences; it has both a clear operational and declarative semantics, based on
a new notion called the extended active domain of a database. The extended
domain contains all the sequences in the database and all their subsequence
s. This idea leads to a clear distinction between safe and unsafe recursion
over sequences: safe recursion stays inside the extended active domain, wh
ile unsafe recursion does not. By carefully limiting the amount of unsafe r
ecursion, the paper develops a safe and expressive subset of Sequence Datal
og. As part of the development, a new type of transducer is introduced, cal
led a generalized sequence transducer, Unsafe recursion is allowed only wit
hin these generalized transducers. Generalized transducers extend ordinary
transducers by allowing them to invoke other transducers as "subroutines."
Generalized transducers can be implemented in Sequence Datalog in a straigh
tforward way. Moreover, their introduction into the language leads to simpl
e conditions that guarantee safety and finiteness. This paper develops two
such conditions. The first condition expresses exactly the class of PTIME s
equence functions, and the second expresses exactly the class of elementary
sequence functions. (C) 1998 Academic Press.