A TIMING-DRIVEN PARTITIONING SYSTEM FOR MULTIPLE FPGAS

Authors
Citation
K. Roy et C. Sechen, A TIMING-DRIVEN PARTITIONING SYSTEM FOR MULTIPLE FPGAS, VLSI design, 4(4), 1996, pp. 309-328
Citations number
30
Categorie Soggetti
System Science","Engineering, Eletrical & Electronic","Computer Science Hardware & Architecture
Journal title
ISSN journal
1065514X
Volume
4
Issue
4
Year of publication
1996
Pages
309 - 328
Database
ISI
SICI code
1065-514X(1996)4:4<309:ATPSFM>2.0.ZU;2-M
Abstract
Field-programmable systems with multiple FPGAs on a PCB or an MCM are being used by system designers when a single FPGA is not sufficient. W e address the problem of partitioning a large technology mapped FPGA c ircuit onto multiple FPGA devices of a specific target technology. The physical characteristics of the multiple FPGA system (MFS) pose addit ional constraints to the circuit partitioning algorithms: the capacity of each FPGA, the timing constraints, the number of I/Os per FPGA, an d the pre-designed interconnection patterns of each FPGA and the packa ge. Existing partitioning techniques which minimize just the cut sizes of partitions fail to satisfy the above challenges;We therefore prese nt a timing driven N-way partitioning algorithm based on simulated ann ealing for technology-mapped FPGA circuits. The signal path delays are estimated during partitioning using a timing model specific to a mult iple FPGA architecture. The model combines all possible delay factors in a system with multiple FPGA chips of a target technology. Furthermo re, we have incorporated a new dynamic net-weighting scheme to minimiz e the number of pin-outs for each chip. Finally, we have developed a g raph-based global router for pin assignment which can handle the pre-r outed connections of our MFS structure. In order to reduce the time sp ent in the simulated annealing phase of the partitioner, clusters of c ircuit components are identified by a new linear-time bottom-up cluste ring algorithm. The annealing-based N-way partitioner executes four ti mes faster using the clusters as opposed to a flat netlist with improv ed partitioning results. For several industrial circuits, our approach outperforms the recursive min-cut bi-partitioning algorithm by 35% in terms of nets cut. Our approach also outperforms an industrial FPGA p artitioner by 73% on average in terms of unroutable nets. Using the pe rformance optimization capabilities in our approach we have successful ly partitioned the MCNC benchmarks satisfying the critical path constr aints and achieving a significant reduction in the longest path delay. An average reduction of 17% in the longest path delay was achieved at the cost of 5% in total wire length.