S. Arikawa et al., A MACHINE DISCOVERY FROM AMINO-ACID-SEQUENCES BY DECISION TREES OVER REGULAR PATTERNS, New generation computing, 11(3-4), 1993, pp. 361-375
This paper describes a machine learning system that discovered a
"negative motif" in transmembrane domain identification from amino
acid sequences, and reports its experiments on protein data using the
PIR database. We introduce a decision tree whose nodes are labeled
with regular patterns. As a hypothesis, the system produces such a
decision tree from a small number of randomly chosen positive and
negative examples from PIR. Experiments show that our system finds
reasonable hypotheses very successfully. As a theoretical foundation,
we show that the class of languages defined by decision trees of depth
at most d over k-variable regular patterns is polynomial-time
learnable in the sense of probably approximately correct (PAC)
learning for any fixed d, k ≥ 0.
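
To make the hypothesis class concrete, the following is a minimal sketch of a decision tree whose internal nodes are labeled with regular patterns. It is an illustration only, not the system described in the paper. Several details are assumptions: a regular pattern is taken to be a string of constant symbols and variables in which each variable occurs at most once, the character "*" is a hypothetical marker for a variable, each variable is assumed to stand for a non-empty string, and the names pattern_to_regex, matches, Leaf, Node, classify and example_tree are invented for this sketch. The example tree and its alphabet are made up and do not reproduce any motif reported by the authors.

```python
import re
from dataclasses import dataclass
from typing import Union

VAR = "*"  # hypothetical marker: each occurrence is a distinct variable


def pattern_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a regular pattern into an equivalent regular expression.

    Assumption: every variable is replaced by a non-empty string, so each
    variable becomes ".+"; constant symbols are matched literally.
    """
    parts = [".+" if ch == VAR else re.escape(ch) for ch in pattern]
    return re.compile("".join(parts), re.DOTALL)


def matches(pattern: str, s: str) -> bool:
    """Return True if the whole string s belongs to the pattern's language."""
    return pattern_to_regex(pattern).fullmatch(s) is not None


@dataclass
class Leaf:
    label: bool  # True = positive class, False = negative class


@dataclass
class Node:
    pattern: str          # regular pattern labeling this internal node
    if_match: "Tree"      # subtree followed when the string matches
    if_no_match: "Tree"   # subtree followed otherwise


Tree = Union[Leaf, Node]


def classify(tree: Tree, s: str) -> bool:
    """Walk from the root to a leaf, testing one pattern per internal node."""
    while isinstance(tree, Node):
        tree = tree.if_match if matches(tree.pattern, s) else tree.if_no_match
    return tree.label


# Invented depth-2 example: a string with an interior "a" is rejected
# (a toy "negative" test); otherwise it is accepted only if it starts
# with "b" and has at least one more symbol.
example_tree = Node("*a*", Leaf(False), Node("b*", Leaf(True), Leaf(False)))

if __name__ == "__main__":
    for s in ["bab", "bbb", "cbc"]:
        print(s, classify(example_tree, s))
```

In this sketch the depth of the tree bounds the number of patterns tested per string, and the number of "*" symbols in a node's pattern corresponds to k, which are the two parameters fixed in the learnability statement above.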