This paper proposes SARCH, a superscalar architecture. Achieving high performance on nonnumerical applications requires an efficient mechanism for exploiting instruction-level parallelism (ILP). SARCH employs boosting for speculative execution to exploit more ILP, and a delayed branch scheme to reduce the branch penalty. Although these mechanisms are realized with simple hardware, they require the compiler to move code across basic block boundaries. A code scheduler is therefore developed that performs global scheduling for both boosting and the delayed branch. The performance evaluation shows that the scheduled code achieves a 1.52x performance improvement over the original code and a 1.75x speedup over a scalar processor.