EXPLOITING DATA STRUCTURE LOCALITY IN THE DATA-FLOW MODEL

Citation
Wm. Miller et al., EXPLOITING DATA STRUCTURE LOCALITY IN THE DATA-FLOW MODEL, Journal of Parallel and Distributed Computing, 27(2), 1995, pp. 183-200
Citations number
47
Categorie Soggetti
Computer Sciences; Computer Science Theory & Methods
ISSN journal
0743-7315
Volume
27
Issue
2
Year of publication
1995
Pages
183 - 200
Database
ISI
SICI code
0743-7315(1995)27:2<183:EDSLIT>2.0.ZU;2-P
Abstract
Although the dataflow model has been shown to allow the exploitation of parallelism at all levels, research of the past decade has revealed several fundamental problems. Synchronization at the instruction level, token matching, coloring, and re-labeling operations have a negative impact on performance by significantly increasing the number of non-compute "overhead" cycles. Recently, many novel hybrid von Neumann data-driven machines have been proposed to alleviate some of these problems. The major objective has been to reduce or eliminate unnecessary synchronization costs through simplified operand matching schemes and increased task granularity. Moreover, the results from recent studies quantifying locality suggest sufficient spatial and temporal locality is present in dataflow execution to merit its exploitation. In this paper we present a data structure for exploiting locality in a data-driven environment: the vector cell. A vector cell consists of a number of fixed-length chunks of data elements. Each chunk is tagged with a presence bit, providing intra-chunk strictness and inter-chunk non-strictness to data structure access. We describe the semantics of the model, processor architecture, and instruction set, as well as a Sisal-to-dataflow vectorizing compiler backend. The vector cell model is evaluated by comparing its performance to those of both a classical fine-grain dataflow processor employing I-structures and a conventional pipelined vector processor. Results indicate that the model is surprisingly resilient to long memory and communication latencies and is able to dynamically exploit the underlying parallelism across multiple processing elements at run time. (C) 1995 Academic Press, Inc.
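The chunked presence-bit scheme described in the abstract can be sketched in software. The following is a minimal illustrative model, not the paper's implementation: all names, the chunk size, and the deferred-read convention (returning None where a dataflow machine would re-enqueue the reader) are assumptions made for this sketch.

```python
# Illustrative sketch of a "vector cell": a vector split into fixed-length
# chunks, each guarded by one presence bit. Reads require the whole chunk
# to be written (intra-chunk strictness), but chunks become readable
# independently of one another (inter-chunk non-strictness).
# CHUNK_SIZE and all identifiers are hypothetical, not from the paper.

CHUNK_SIZE = 4  # assumed fixed chunk length


class VectorCell:
    def __init__(self, length, chunk_size=CHUNK_SIZE):
        self.length = length
        self.chunk_size = chunk_size
        n_chunks = (length + chunk_size - 1) // chunk_size
        self.data = [None] * length
        self.writes_seen = [0] * n_chunks   # element writes per chunk
        self.present = [False] * n_chunks   # one presence bit per chunk

    def write(self, i, value):
        """Store one element; set the chunk's presence bit once full."""
        self.data[i] = value
        c = i // self.chunk_size
        self.writes_seen[c] += 1
        chunk_len = min(self.chunk_size, self.length - c * self.chunk_size)
        if self.writes_seen[c] == chunk_len:
            self.present[c] = True  # entire chunk now readable

    def read(self, i):
        """Return the element if its chunk is present, else None.
        A dataflow machine would instead defer and re-schedule the read."""
        c = i // self.chunk_size
        if not self.present[c]:
            return None
        return self.data[i]
```

For example, after writing elements 0 through 3, the first chunk's presence bit is set and those elements are readable, while a read from the still-empty second chunk remains deferred. This mirrors the trade-off the abstract describes: coarser synchronization than per-element I-structures, but finer than waiting for the whole vector.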