COMPARISON OF STRINGS BELONGING TO THE SAME FAMILY

Authors
Citation
M. Elloumi, COMPARISON OF STRINGS BELONGING TO THE SAME FAMILY, Information sciences, 111(1-4), 1998, pp. 49-63
Citations number
37
Categorie Soggetti
Computer Science Information Systems","Computer Science Information Systems
Journal title
ISSN journal
00200255
Volume
111
Issue
1-4
Year of publication
1998
Pages
49 - 63
Database
ISI
SICI code
0020-0255(1998)111:1-4<49:COSBTT>2.0.ZU;2-6
Abstract
The comparison of strings belonging to the same family can be made thr ough the construction of a long(est) Common Subsequence (CS) that refl ects structural similarities that exist between these strings. The lon gest CS problem is NP-complete. In this paper, we present an O(N-2 x L (2)log(2)(L)) algorithm, where N is the number of the strings and L is the maximum length, that constructs a CS, to a family of strings, mad e up by longer words appearing, approximately, in the same positions i n all the strings. During each iteration, our algorithm looks for the longest common words that appear, approximately, in the same positions in all the strings, then, filters the common words found, to keep jus t those that assure the smallest norm between all the strings. The fil tering is based on a new distance called contextual distance. (C) 1998 Elsevier Science Inc. All rights reserved.