THE DIFFICULTY OF IDENTIFYING GENES IN ANONYMOUS VERTEBRATE SEQUENCES

Citation
Jm. Claverie et al., THE DIFFICULTY OF IDENTIFYING GENES IN ANONYMOUS VERTEBRATE SEQUENCES, Computers & chemistry, 21(4), 1997, pp. 203-214
Citations number
60
Journal title
Computers & chemistry
ISSN journal
00978485
Volume
21
Issue
4
Year of publication
1997
Pages
203 - 214
Database
ISI
SICI code
0097-8485(1997)21:4<203:TDOIGI>2.0.ZU;2-L
Abstract
The identification of genes in newly determined vertebrate genomic sequences can range from a trivial to an impossible task. In a statistical preamble, we show how "insignificant" are the individual features on which gene identification can be rigorously based: promoter signals, splice sites, open reading frames, etc. The practical identification of genes thus ultimately depends on their resemblance to those already present in sequence databases or incorporated into training sets. The inherent conservatism of the currently popular methods (database similarity search, GRAIL) will greatly limit our capacity for making unexpected biological discoveries from increasingly abundant genomic data. Beyond a very limited subset of trivial cases, the automated interpretation (i.e. without experimental validation) of genomic data is still a myth. On the other hand, characterizing the 60 000 to 100 000 genes thought to be hidden in the human genome by means of individual experiments is not feasible. Thus, it appears that our only hope of turning genome data into genome information must rely on drastic progress in the way we identify and analyse genes in silico. (C) 1997 Elsevier Science Ltd.