The evolution of digital libraries and the Internet has dramatically transformed the processing, storage, and retrieval of information. Efforts to digitize text, images, video, and audio now consume a substantial portion of both academic and industrial activity. Even when there is no shortage of textual materials on a particular topic, procedures for indexing or extracting the knowledge or conceptual information contained in them can be lacking.
Recently developed information retrieval technologies are based on the concept of a vector space. Data are modeled as a matrix, and a user's query of the database is represented as a vector. Relevant documents in the database are then identified via simple vector operations, such as measuring the angle between the query vector and each document vector. Orthogonal factorizations of the matrix provide mechanisms for handling uncertainty in the database
itself. The purpose of this paper is to show how such fundamental mathematical concepts from linear algebra can be used to manage and index large text collections.
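As a rough illustration of the vector space model summarized above, the following is a minimal sketch using only the Python standard library. It is not the paper's implementation. The four-term, three-document matrix `A`, the term list `TERMS`, the helper names (`query_vector`, `cosine`, `rank_documents`), and the relevance `threshold` are all hypothetical choices made for the example. Entries are raw term counts with no weighting scheme, and the orthogonal factorizations mentioned above (such as QR or the SVD) are not shown.

```python
import math

# Hypothetical term-by-document matrix: rows are terms, columns are documents.
# A[i][j] is the (unweighted) count of term i in document j.
TERMS = ["matrix", "vector", "query", "library"]
A = [
    [2, 0, 1],  # "matrix"
    [1, 1, 0],  # "vector"
    [0, 1, 1],  # "query"
    [0, 0, 3],  # "library"
]


def query_vector(words):
    """Encode a query as a 0/1 vector over the (hypothetical) term list."""
    return [1 if term in words else 0 for term in TERMS]


def column(matrix, j):
    """Return column j of a row-major matrix, i.e. document vector j."""
    return [row[j] for row in matrix]


def cosine(u, v):
    """Cosine of the angle between u and v; 0.0 if either vector is zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_documents(matrix, q, threshold=0.4):
    """Score every document against query q and keep those whose cosine
    meets the (hypothetical) threshold, best match first."""
    n_docs = len(matrix[0])
    scores = [(j, cosine(column(matrix, j), q)) for j in range(n_docs)]
    relevant = [(j, s) for j, s in scores if s >= threshold]
    return sorted(relevant, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    q = query_vector({"matrix", "vector"})
    for doc, score in rank_documents(A, q):
        print(f"document {doc}: cosine {score:.3f}")
```

With the query "matrix vector", document 0 scores about 0.949 and document 1 about 0.5, while document 2 (about 0.213) falls below the threshold and is not returned. Using the cosine rather than the raw dot product keeps long documents from scoring highly merely because they contain many terms.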