SimpleFit: A framework for analyzing design trade-offs in raw architectures

Citation
C.A. Moritz et al., SimpleFit: A framework for analyzing design trade-offs in raw architectures, IEEE Transactions on Parallel and Distributed Systems, 12(7), 2001, pp. 730-742
Number of citations
27
Subject Categories
Computer Science & Engineering
Journal title
IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS
Journal ISSN
1045-9219
Volume
12
Issue
7
Year of publication
2001
Pages
730 - 742
Database
ISI
SICI code
1045-9219(200107)12:7<730:SAFFAD>2.0.ZU;2-5
Abstract
The semiconductor industry roadmap projects that advances in VLSI technology will permit more than one billion transistors on a chip by the year 2010. The MIT Raw microprocessor is a proposed architecture that strives to exploit these chip-level resources by implementing thousands of tiles, each comprising a processing element and a small amount of memory, coupled by a static two-dimensional interconnect. A compiler partitions fine-grain instruction-level parallelism across the tiles and statically schedules intertile communication over the interconnect. Because Raw microprocessors fully expose their internal hardware structure to the software, they can be viewed as a gigantic FPGA with coarse-grained tiles in which software orchestrates communication over static interconnections. One open challenge in Raw architectures is to determine their optimal grain size and balance. The grain size is the area of each tile, and the balance is the proportion of area in each tile devoted to memory, processing, communication, and off-chip global I/O. If the total chip area is fixed, higher processing power per tile requires larger tiles and hence reduces the total number of tiles on the chip. This paper presents SimpleFit, a novel analytical framework that designers can use to reason about the design space of Raw microprocessors. Our model is also generalizable to multiprocessors on a chip. Based on an architectural model, an application model, and a VLSI cost analysis, the framework computes the performance of applications and uses an optimization process to identify designs that will execute these applications most cost-effectively. Although the optimal machine configurations obtained vary for different applications, problem sizes, and budgets, the general trends for various applications are similar.
Accordingly, for the applications studied, assuming a one billion logic transistor equivalent area, we recommend building a Raw chip with approximately 1,000 tiles, 30 words/cycle global I/O, 20 Kbytes of local memory per tile, three to four words/cycle local communication bandwidth, and single-issue processors. This configuration will give performance near the global optimum for most applications.
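The grain-size trade-off described in the abstract (a fixed area budget, so richer tiles mean fewer tiles) can be illustrated with a toy calculation. This is only a sketch and not the SimpleFit model itself: the per-resource transistor costs, the `TileConfig` class, and the `tiles_for_budget` function are all hypothetical, chosen purely for illustration, and the area devoted to global I/O is ignored.

```python
from dataclasses import dataclass

# Hypothetical cost constants (logic-transistor equivalents).
# These are illustrative placeholders, NOT values from the paper.
TRANSISTORS_PER_KBYTE = 50_000        # assumed cost of 1 Kbyte of local memory
TRANSISTORS_PER_ISSUE_SLOT = 400_000  # assumed cost of one processor issue slot
TRANSISTORS_PER_COMM_WORD = 50_000    # assumed cost per word/cycle of local bandwidth


@dataclass(frozen=True)
class TileConfig:
    """One candidate tile design (its 'balance' of resources)."""
    memory_kbytes: int
    issue_width: int
    comm_words_per_cycle: int

    def area(self) -> int:
        """Tile 'grain size' as a sum of the assumed per-resource costs."""
        return (self.memory_kbytes * TRANSISTORS_PER_KBYTE
                + self.issue_width * TRANSISTORS_PER_ISSUE_SLOT
                + self.comm_words_per_cycle * TRANSISTORS_PER_COMM_WORD)


def tiles_for_budget(config: TileConfig, budget: int) -> int:
    """Number of whole tiles that fit in a fixed area budget."""
    return budget // config.area()


if __name__ == "__main__":
    budget = 1_000_000_000  # one billion logic-transistor equivalents
    small = TileConfig(memory_kbytes=20, issue_width=1, comm_words_per_cycle=4)
    large = TileConfig(memory_kbytes=40, issue_width=2, comm_words_per_cycle=4)
    for name, cfg in (("small tile", small), ("large tile", large)):
        print(f"{name}: area={cfg.area():,} tiles={tiles_for_budget(cfg, budget):,}")
```

Under these assumed costs, doubling the memory and issue width of each tile roughly halves the number of tiles that fit in the same budget. A framework of the kind the abstract describes would additionally need an application performance model to decide which point on this curve is most cost-effective.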