In this paper, we study memory system design for superscalar
processing. We use benchmarking to examine the execution behavior of
load/store instructions, including load/store parallelism and memory
load/store port utilization. We find that a single load/store port is
a system bottleneck. A superscalar processor benefits from multiple
load/store ports, and system performance saturates at two load/store
ports. The memory system must be carefully designed if a superscalar
processor is to support multiple load/store ports. We therefore
consider the design of the data cache subsystem. The data cache
configurations we investigate are the multiported cache, the multibank
cache, and the duplicated cache. Through benchmarking, we find that
the duplicated cache performs well in most benchmarks, but at a higher
cost. In a superscalar multiprocessing environment, maintaining memory
consistency requires that we consider the load/store ordering of the
processors. In superscalar processors, load/store ordering may take
one of three forms: total ordering, load bypassing, and load
forwarding. We conclude that, to support the sequential consistency
model, load/store instructions must be totally ordered. Load bypassing
and load forwarding are sufficient to support the processor
consistency model.
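
The three load/store ordering forms named above can be illustrated with a minimal sketch. It assumes the common textbook definitions, which the abstract itself does not spell out: under total ordering a load waits for all older stores, under load bypassing a load may pass older stores to different addresses, and under load forwarding a load may additionally take its value from a matching pending store. The names `Policy` and `issue_load` and the store-buffer layout are hypothetical and are not taken from the paper.

```python
from enum import Enum


class Policy(Enum):
    TOTAL_ORDER = 1
    LOAD_BYPASSING = 2
    LOAD_FORWARDING = 3


def issue_load(addr, store_buffer, memory, policy):
    """Decide how a load to `addr` is handled under a given ordering policy.

    store_buffer: (address, value) pairs for older stores that have not yet
    been written to memory, oldest first (an assumed, simplified layout).
    Returns ("stall", None), ("memory", value), or ("forwarded", value).
    """
    if policy is Policy.TOTAL_ORDER:
        # The load may not pass any older store still waiting in the buffer.
        if store_buffer:
            return ("stall", None)
        return ("memory", memory.get(addr, 0))

    # Bypassing and forwarding both let the load pass non-conflicting stores.
    matches = [value for a, value in store_buffer if a == addr]
    if not matches:
        return ("memory", memory.get(addr, 0))

    if policy is Policy.LOAD_FORWARDING:
        # Take the value of the youngest older store to the same address.
        return ("forwarded", matches[-1])

    # Load bypassing alone: an address conflict forces the load to wait.
    return ("stall", None)


if __name__ == "__main__":
    memory = {0x10: 1, 0x20: 2}
    pending = [(0x10, 7), (0x10, 9)]
    for policy in Policy:
        print(policy.name,
              issue_load(0x20, pending, memory, policy),
              issue_load(0x10, pending, memory, policy))
```

The sketch models a single processor's store buffer only. It does not model other processors observing the stores, which is where the choice between sequential and processor consistency matters.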