ITA
ENG

Efficient discovery of optimal word-association patterns in large text databases

Authors

Shimozono, S Arimura, H Arikawa, S

Citation

S. Shimozono et al., Efficient discovery of optimal word-association patterns in large text databases, NEW GEN COM, 18(1), 2000, pp. 49-60

Citations number

Categorie Soggetti

Computer Science & Engineering

Journal title

NEW GENERATION COMPUTING

ISSN journal

02883635 → ACNP

Volume

Issue

Year of publication

2000

Pages

49 - 60

Database

ISI

SICI code

0288-3635(2000)18:1<49:EDOOWP>2.0.ZU;2-C

Abstract

We study efficient discovery of proximity word-association patterns, define d by a sequence of strings and a proximity gap, from a collection of texts with the positive and the negative labels. We present an algorithm that fin ds all d-strings k-proximity word-association patterns that maximize the nu mber of texts whose matching agree with their labels. It runs in expected t ime complexity O(k(d-1) n log(d) n) and space O(k(d-1)n) with the total len gth n of texts, if texts are uniformly random strings. We also show that th e problem to find one of the best word-association patterns with arbitraril y many strings is MAX SNP-hard.