A standardized system for scoring the completeness of software designs
produced in experimental settings is proposed. The system produces a
complete and multifaceted expression of a software design, making it
ideal for comparing designs generated in different languages,
paradigms, and methodologies. The system decomposes a design into a
large number of atomic design "features" and is thus able to
characterize the different strengths and weaknesses of each design,
and to do so in a way that is "paradigm neutral", that is, not
unfairly biased towards one language, paradigm, or methodology.
Because the scoring system is this thorough, an absolute completeness
score can be computed for a design, facilitating the comparison of
designs across studies, design problems, and experimental conditions.
The scoring system allows for the representation of design
alternatives and optional features, recognizing that software design
problems are neither uniquely understood nor sufficiently constrained
to identify a unique solution. In addition, the scoring system
characterizes each component of a design as being specified to a
certain level of detail, or "refinement". Techniques for scoring
designs and generating dependent measures are proposed. Alternative
notions of design quality and correctness are described, and it is
shown how they can be incorporated into the scoring system. Finally,
the use of the scoring methodology as the basis for a typology of
software design problems is discussed.