This paper addresses the effects of design options on the cost and
performance of chip multiprocessors (CMPs) with a shared L2 cache. The
design options we consider include the instruction-issue rates of the
processors and the sizes of the internal caches. We focus more on
implementation issues than on architectural perspectives. We model all the
functional blocks of the CMPs in a hardware description language and
estimate their cost/performance using a program-driven simulator developed
for this study. Realistic parameters for current technologies are used in
the CPU/memory-system simulation models. Our results show that clustering
four single-issue CPUs and integrating a 4-kbyte L1 cache and a 128-kbyte
L2 cache could be the best choice for the technologies considered.