Many organizations are implementing free-text indexing schemes to build
software catalogs, with the aim of promoting systematic code reuse.
Unfortunately, comments embedded in software systems suffer from several
shortcomings, so it is unreasonable to expect the indices extracted from them
to be of high quality. In the present empirical work, we implemented one such
method to show what can be expected when it is applied to comments. The
method we chose uses pairs of words (called lexical affinities) as indexing
units.
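To make the idea of pairs of words as indexing units concrete, here is a minimal sketch of extracting lexical affinities from a comment. It is not the implementation evaluated in this work. It assumes that a lexical affinity is an unordered pair of distinct content words co-occurring within a span of five consecutive tokens. The stop-word list, the helper names `tokenize`, `lexical_affinities` and `top_affinities`, and the ranking by raw co-occurrence frequency are all hypothetical simplifications. The original method may filter and rank pairs differently.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (illustration only).
STOP_WORDS = {"a", "an", "the", "of", "to", "in", "and", "or", "is", "are",
              "for", "this", "that", "it", "on", "with", "by", "be", "if"}


def tokenize(text):
    """Lower-case the text and keep alphabetic tokens that are not stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]


def lexical_affinities(text, window=5):
    """Count unordered pairs of distinct words that co-occur within a span
    of `window` consecutive tokens.

    Assumptions: the span is measured after stop-word removal, and sentence
    boundaries are ignored.
    """
    tokens = tokenize(text)
    counts = Counter()
    for i, w in enumerate(tokens):
        # Pair w with the next (window - 1) tokens, i.e. the same span.
        for v in tokens[i + 1 : i + window]:
            if v != w:
                counts[tuple(sorted((w, v)))] += 1
    return counts


def top_affinities(text, k=3, window=5):
    """Return the k most frequent pairs, used here as the comment's index."""
    return [pair for pair, _ in lexical_affinities(text, window).most_common(k)]


comment = ("Sort the array of records by key. "
           "The sort is stable and the array is modified in place.")
print(top_affinities(comment, k=1))
```

For this comment the pair `("array", "sort")` co-occurs three times within the window and therefore ranks first. A single-word scheme would index `sort` and `array` separately and lose their association.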
The authors of this method have provided numerical evidence (from a limited
number of experiments on text files describing Unix commands) that lexical
affinities give better results than the single-word schemes traditionally
adopted in information retrieval. Our findings, obtained by applying this
indexing scheme to the comments of a large collection of commercial routines,
justify our pessimism: the extracted indices are semantically representative
of the purpose of the routine in which the comments were embedded in only 1.9%
of the texts processed. In the second part of the article, we propose a
general strategy for obtaining better results and evaluate it against the
same collection of routines. (C) 1999 Elsevier Science B.V. All rights
reserved.