Prefetching is a promising approach to the memory latency problem. Two
basic variants of hardware data prefetching are sequential prefetching and
stride prefetching. The latter, which calculates the stride of future
references, has the potential to outperform the former, which relies on
data locality. In this paper, a typical stride prefetching scheme and its
improved version, adaptive stride prefetching, are compared quantitatively
by simulation on several parallel benchmark programs in the context of
uniform memory access and non-uniform memory access architectures. The
simulation results show that adaptability of the stride is essential: the
proposed adaptive scheme reduces the pending stall time, which is large in
the typical scheme.
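
To make the distinction between the two basic variants concrete, the following is a minimal sketch of a generic sequential prefetcher and a generic per-instruction stride detector. It is an illustration under stated assumptions, not the scheme evaluated in this paper: the names `sequential_prefetch`, `Entry`, and `StridePrefetcher`, the 32-byte block size, and the rule that a stride must repeat once before a prefetch is issued are all hypothetical choices. The adaptive variant is not reproduced here.

```python
from dataclasses import dataclass

BLOCK_SIZE = 32  # bytes per cache block; an assumed value for illustration


def sequential_prefetch(addr, block_size=BLOCK_SIZE):
    """Sequential variant: return the base address of the next block."""
    return (addr // block_size + 1) * block_size


@dataclass
class Entry:
    """Hypothetical table entry, one per load instruction (indexed by PC)."""
    last_addr: int
    stride: int = 0
    confirmed: bool = False


class StridePrefetcher:
    """Generic stride detector; not the paper's typical or adaptive scheme."""

    def __init__(self):
        self.table = {}

    def access(self, pc, addr):
        """Record an access and return a prefetch address, or None."""
        entry = self.table.get(pc)
        if entry is None:
            # First access by this instruction: nothing to compare against.
            self.table[pc] = Entry(last_addr=addr)
            return None
        stride = addr - entry.last_addr
        # Prefetch only when the same non-zero stride is seen twice in a row.
        entry.confirmed = stride == entry.stride and stride != 0
        entry.stride = stride
        entry.last_addr = addr
        return addr + stride if entry.confirmed else None
```

In this sketch, a load that walks memory in 128-byte steps triggers no prefetch on its first two accesses, and from the third access onward the detector predicts the address one stride ahead. The sequential variant instead always targets the adjacent block, which helps only when consecutive references fall in neighbouring blocks.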