Scalability in formal concept analysis

Citation
R. Cole et Pw. Eklund, Scalability in formal concept analysis, COMPUT INTE, 15(1), 1999, pp. 11-27
Citations number
21
Categorie Soggetti
AI Robotics and Automatic Control
Journal title
COMPUTATIONAL INTELLIGENCE
ISSN journal
08247935 → ACNP
Volume
15
Issue
1
Year of publication
1999
Pages
11 - 27
Database
ISI
SICI code
0824-7935(199902)15:1<11:SIFCA>2.0.ZU;2-5
Abstract
Formal Concept Analysis is a symbolic learning technique derived from mathe matical algebra and order theory. The technique has been applied to a broad range of knowledge representation and exploration tasks in a number of dom ains. Most recorded applications of Formal Concept Analysis deal with a sma ll number of objects and attributes, in which case the complexity of the al gorithms used for indexing and retrieving data is not a significant issue. However, when Formal Concept Analysis is applied to exploration of a large numbers of objects and attributes, the size of the data makes issues of com plexity and scalability crucial. This paper presents the results of experiments carried out with a set of 4, 000 medical discharge summaries in which were recognized 1,962 attributes f rom the Unified Medical Language System (UMLS). In this domain, the objects are medical documents (4,000) and the attributes are UMLS terms extracted from the documents (1,962). When Formal Concept Analysis is used to iterati vely analyze and visualize these data, complexity and scalability become cr itically important. Although the amount of data used in this experiment is small compared with the size of primary memory in modern computers, the results are still impor tant because the probability distributions that determine the efficiencies are likely to remain stable as the size of the data is increased. Our work presents two outcomes. First, we present a methodology for explori ng knowledge in text documents using Formal Concept Analysis by employing c onceptual scales created as the result of direct manipulation of a line dia gram. The conceptual scales lead to small derived purified contexts that ar e represented using nested line diagrams. Second, we present an algorithm f or the fast determination of purified contexts from a compressed representa tion of the large formal context. Our work draws on existing encoding and c ompression techniques to show how rudimentary data analysis can lead to sub stantial efficiency improvements in knowledge visualization.