A. Elamawy, Clocking Arbitrarily Large Computing Structures Under Constant Skew Bound, IEEE Transactions on Parallel and Distributed Systems, 4(3), 1993, pp. 241-255
We describe a new scheme for global synchronization of arbitrarily large
computing structures such that clock skew between any two communicating
cells is bounded above by a constant. The new clocking scheme does not
rely on distribution trees, phase-locked loops or handshake protocols.
Instead it utilizes clock nodes which perform simple processing on clock
signals to maintain a constant skew bound irrespective of the size of the
computing structure. Among the salient features of the new scheme is the
interdependence between network topology, skew upper bound, and maximum
clocking rate achievable. We use a 2-D mesh framework to present the
concepts, to introduce three network designs and to
prove some basic results. For each network we establish the (constant)
upper bound on clock skew between any two communicating processors, and
show its independence of network size. Let F be the fan-in/fan-out of a
clock node and Δ be the maximum (node + link) delay. It will be shown
that any of the three networks complies with the following:
a) Maximum skew between communicating cells = (5 - F)Δ, for 2 ≤ F ≤ 4;
b) Maximum skew between any two node inputs = (6 - F)Δ, for 2 ≤ F ≤ 4.
The second result is important in setting up timing constraints on clock
signals for each respective network. The constraints are
simple and easy to implement. Besides theoretical proofs, simulations
have been carried out to verify correctness and to check the workability
of the scheme. Also a 4 x 4 network has been built and successfully
tested for stability. Other issues such as node design, clocking of
nonplanar structures such as hypercubes, and the new concept of fuse
programmed clock networks are addressed. A discussion on practical
implementation issues is also given. The discussion shows that hardware
overhead incurred by the proposed scheme is comparable to that associated
with current asynchronous control schemes.
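
As a worked illustration of the two skew bounds stated in the abstract, the short Python sketch below evaluates them for each admissible fan-in/fan-out F. The function name skew_bounds and the choice of a unit delay are assumptions made here for illustration only; they do not come from the paper, and the code simply evaluates the stated formulas rather than modelling the clock networks themselves.

```python
def skew_bounds(fan: int, delta: float) -> tuple[float, float]:
    """Evaluate the two bounds quoted in the abstract.

    fan   -- fan-in/fan-out F of a clock node (bounds are stated for 2 <= F <= 4)
    delta -- maximum (node + link) delay, in any consistent time unit

    Returns (max skew between communicating cells,
             max skew between any two node inputs).
    """
    if not 2 <= fan <= 4:
        raise ValueError("the stated bounds apply only for 2 <= F <= 4")
    cell_skew = (5 - fan) * delta    # result (a): (5 - F) * delay
    input_skew = (6 - fan) * delta   # result (b): (6 - F) * delay
    return cell_skew, input_skew


if __name__ == "__main__":
    # Hypothetical unit delay, chosen only to make the table easy to read.
    for fan in (2, 3, 4):
        cell_skew, input_skew = skew_bounds(fan, delta=1.0)
        print(f"F={fan}: cells <= {cell_skew}, node inputs <= {input_skew}")
```

The function takes no network-size argument, which mirrors the central claim of the abstract: both bounds depend only on F and the maximum delay, not on the number of cells.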