With effective packet-scheduling mechanisms, modern integrated networks can
support the diverse quality-of-service requirements of emerging
applications. However, arbitrating between a large number of small packets
on a high-speed link requires an efficient hardware implementation of a
priority queue. To highlight the challenges of building scalable priority
queue architectures, this paper includes a detailed comparison of four
existing approaches: a binary tree of comparators, a priority encoder with
multiple first-in-first-out lists, a shift register, and a systolic array.
Based on these comparison results, we propose two new architectures that
scale to the large number of packets (N) and large number of priority
levels (P) necessary in modern switch designs. The first architecture
combines the faster clock speed of a systolic array with the lower memory
requirements of a shift register, resulting in a hybrid design; a tunable
parameter allows switch designers to carefully balance the trade-off
between bus loading and chip area. We then extend this architecture to
serve multiple output ports in a shared-memory switch. This significantly
reduces complexity compared with the traditional approach of dedicating a
separate priority queue to each outgoing link. Using the Verilog hardware
description language and the Epoch silicon compiler, we have designed and
simulated these two new architectures, as well as the four existing
approaches. The simulation experiments compare the designs across a range
of priority queue sizes and performance metrics, including enqueue/dequeue
speed, chip area, and number of transistors.
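
As a rough illustration of one of the existing approaches named above, the shift-register priority queue can be modeled behaviorally in software. The sketch below is a generic Python model of that technique, not the Verilog design evaluated in this paper. The names `Entry`, `ShiftRegisterPQ`, and `capacity` are hypothetical, and the convention that a smaller priority value is served first is an assumption. Each block sees the broadcast entry and only its left neighbor, and decides locally whether to hold, latch the new entry, or shift.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Entry:
    priority: int  # assumed convention: smaller value is served first
    packet: str    # hypothetical packet identifier


class ShiftRegisterPQ:
    """Behavioral model of a shift-register priority queue (illustrative)."""

    def __init__(self, capacity: int) -> None:
        # blocks[0] is the head of the queue; None marks an empty block.
        self.blocks: List[Optional[Entry]] = [None] * capacity

    def enqueue(self, new: Entry) -> bool:
        if self.blocks[-1] is not None:
            return False  # full; a real design might stall or drop instead
        old = self.blocks
        nxt: List[Optional[Entry]] = [None] * len(old)
        for i in range(len(old)):
            cur = old[i]
            left = old[i - 1] if i > 0 else None
            # An occupied block "wins" if it should stay ahead of the new entry.
            cur_wins = cur is not None and cur.priority <= new.priority
            left_wins = i == 0 or (
                left is not None and left.priority <= new.priority
            )
            if cur_wins:
                nxt[i] = cur   # hold the current entry
            elif left_wins:
                nxt[i] = new   # latch the broadcast entry
            else:
                nxt[i] = left  # shift right from the left neighbor
        self.blocks = nxt
        return True

    def dequeue(self) -> Optional[Entry]:
        # The head leaves and every other entry shifts one block toward it.
        head = self.blocks[0]
        self.blocks = self.blocks[1:] + [None]
        return head


pq = ShiftRegisterPQ(capacity=4)
for prio, name in [(3, "c"), (1, "a"), (2, "b"), (1, "a2")]:
    pq.enqueue(Entry(prio, name))
order = [pq.dequeue().packet for _ in range(4)]
```

In this model the queue stays sorted after every enqueue, so dequeue only reads the head block. The loop over all blocks stands in for comparisons that hardware would perform in parallel, with the new entry broadcast to every block; that broadcast is the source of the bus loading mentioned in the abstract.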