Sequences, datalog, and transducers

Citation
A. Bonner et G. Mecca, Sequences, datalog, and transducers, J COMPUT SY, 57(3), 1998, pp. 234-259
Citations number
36
Categorie Soggetti
Computer Science & Engineering
Journal title
JOURNAL OF COMPUTER AND SYSTEM SCIENCES
ISSN journal
00220000 → ACNP
Volume
57
Issue
3
Year of publication
1998
Pages
234 - 259
Database
ISI
SICI code
0022-0000(199812)57:3<234:SDAT>2.0.ZU;2-7
Abstract
This paper develops a query language for sequence databases, such as genome databases and text databases. The language, caned Sequence Datalog, extend s classical Datalog with interpreted function symbols for manipulating sequ ences; it has both a clear operational and declarative semantics, based on a new notion called the extended active domain of a database. The extended domain contains all the sequences in the database and all their subsequence s. This idea leads to a clear distinction between safe and unsafe recursion over sequences: safe recursion stays inside the extended active domain, wh ile unsafe recursion does not. By carefully limiting the amount of unsafe r ecursion, the paper develops a safe and expressive subset of Sequence Datal og. As part of the development, a new type of transducer is introduced, cal led a generalized sequence transducer, Unsafe recursion is allowed only wit hin these generalized transducers. Generalized transducers extend ordinary transducers by allowing them to invoke other transducers as "subroutines." Generalized transducers can be implemented in Sequence Datalog in a straigh tforward way. Moreover, their introduction into the language leads to simpl e conditions that guarantee safety and finiteness. This paper develops two such conditions. The first condition expresses exactly the class of PTIME s equence functions, and the second expresses exactly the class of elementary sequence functions. (C) 1998 Academic Press.