Formal Concept Analysis is a symbolic learning technique derived from mathe
matical algebra and order theory. The technique has been applied to a broad
range of knowledge representation and exploration tasks in a number of dom
ains. Most recorded applications of Formal Concept Analysis deal with a sma
ll number of objects and attributes, in which case the complexity of the al
gorithms used for indexing and retrieving data is not a significant issue.
However, when Formal Concept Analysis is applied to exploration of a large
numbers of objects and attributes, the size of the data makes issues of com
plexity and scalability crucial.
This paper presents the results of experiments carried out with a set of 4,
000 medical discharge summaries in which were recognized 1,962 attributes f
rom the Unified Medical Language System (UMLS). In this domain, the objects
are medical documents (4,000) and the attributes are UMLS terms extracted
from the documents (1,962). When Formal Concept Analysis is used to iterati
vely analyze and visualize these data, complexity and scalability become cr
itically important.
Although the amount of data used in this experiment is small compared with
the size of primary memory in modern computers, the results are still impor
tant because the probability distributions that determine the efficiencies
are likely to remain stable as the size of the data is increased.
Our work presents two outcomes. First, we present a methodology for explori
ng knowledge in text documents using Formal Concept Analysis by employing c
onceptual scales created as the result of direct manipulation of a line dia
gram. The conceptual scales lead to small derived purified contexts that ar
e represented using nested line diagrams. Second, we present an algorithm f
or the fast determination of purified contexts from a compressed representa
tion of the large formal context. Our work draws on existing encoding and c
ompression techniques to show how rudimentary data analysis can lead to sub
stantial efficiency improvements in knowledge visualization.