Efficient discovery of optimal word-association patterns in large text databases

Citation
S. Shimozono et al., Efficient discovery of optimal word-association patterns in large text databases, NEW GEN COM, 18(1), 2000, pp. 49-60
Citations number
14
Categorie Soggetti
Computer Science & Engineering
Journal title
NEW GENERATION COMPUTING
ISSN journal
02883635 → ACNP
Volume
18
Issue
1
Year of publication
2000
Pages
49 - 60
Database
ISI
SICI code
0288-3635(2000)18:1<49:EDOOWP>2.0.ZU;2-C
Abstract
We study efficient discovery of proximity word-association patterns, define d by a sequence of strings and a proximity gap, from a collection of texts with the positive and the negative labels. We present an algorithm that fin ds all d-strings k-proximity word-association patterns that maximize the nu mber of texts whose matching agree with their labels. It runs in expected t ime complexity O(k(d-1) n log(d) n) and space O(k(d-1)n) with the total len gth n of texts, if texts are uniformly random strings. We also show that th e problem to find one of the best word-association patterns with arbitraril y many strings is MAX SNP-hard.