This paper is an attempt to understand the processes by which software ages.
We define code to be aged or decayed if its structure makes it unnecessarily
difficult to understand or change, and we measure the extent of decay by
counting the number of faults in the code during a given period of time. Using
change management data from a very large, long-lived software system, we
explore the extent to which measurements from the change history are
successful in predicting how these faults are distributed over modules. In
general, process measures based on the change history are more useful in
predicting fault rates than product metrics of the code: for instance, the
number of times code has been changed is a better indication of how many
faults it will contain than is its length. We also compare the fault rates of
code of various ages, finding that if a module is, on average, a year older
than an otherwise similar module, the older module will have roughly a third
fewer faults. Our most successful model measures the fault potential of a
module as the sum of contributions from all of the times the module has been
changed, with large, recent changes receiving the most weight.
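The fault-potential idea in the last sentence of the abstract can be sketched as a sum over a module's changes, where each change contributes more if it is large and less as it ages. This is a minimal sketch under stated assumptions, not the paper's fitted model: the exponential decay, the `log(1 + lines_changed)` size weight, the default `decay` rate, and the names `Change`, `fault_potential`, and `rank_modules` are all hypothetical choices made for illustration.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    """One change to a module. Hypothetical record type for illustration."""
    module: str
    time: float         # when the change was made, in years
    lines_changed: int  # lines added plus lines deleted


def fault_potential(changes, module, now, decay=1.0):
    """Sum of per-change contributions for `module` as of time `now`.

    Assumed form (illustrative only, not the paper's model):
        contribution = log(1 + lines_changed) * exp(-decay * (now - time))
    Larger changes weigh more; older changes fade exponentially.
    Changes dated after `now` are ignored.
    """
    total = 0.0
    for c in changes:
        if c.module != module or c.time > now:
            continue
        size_weight = math.log1p(c.lines_changed)
        recency_weight = math.exp(-decay * (now - c.time))
        total += size_weight * recency_weight
    return total


def rank_modules(changes, now, decay=1.0):
    """Return module names ordered from highest to lowest fault potential."""
    modules = {c.module for c in changes}
    scores = {m: fault_potential(changes, m, now, decay) for m in modules}
    return sorted(scores, key=lambda m: scores[m], reverse=True)


if __name__ == "__main__":
    # Hypothetical change history, not data from the study.
    history = [
        Change("parser", 9.5, 400),
        Change("parser", 9.8, 120),
        Change("io", 2.0, 900),
        Change("io", 3.0, 50),
    ]
    print(rank_modules(history, now=10.0))  # → ['parser', 'io']
```

Here `io` received the single largest change, yet `parser` ranks higher because its changes are recent. With `decay=0` and a constant size weight of 1, the sum reduces to the plain number of changes, the simpler process measure the abstract contrasts with module length.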
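The age finding ("a year older ... roughly a third fewer faults") is a multiplicative effect. As a worked illustration only, assume a log-linear model for the expected fault count F of a module, such as a Poisson regression in which module age a (in years) enters linearly and the other covariates are held fixed ("an otherwise similar module"). The coefficient below is back-calculated from the abstract's "roughly a third" and is not a reported estimate.

```latex
\log \mathbb{E}[F] = \beta_0 + \beta_{\mathrm{age}}\, a + \cdots
\;\Longrightarrow\;
\frac{\mathbb{E}[F \mid a + 1]}{\mathbb{E}[F \mid a]} = e^{\beta_{\mathrm{age}}} \approx \tfrac{2}{3}
\;\Longrightarrow\;
\beta_{\mathrm{age}} \approx \ln\tfrac{2}{3} \approx -0.41
```

Under this assumption the effect compounds: a module two years older than a similar one would be expected to have about (2/3)^2, or roughly 44%, of its faults.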