EXPLOITING DATA STRUCTURE LOCALITY IN THE DATA-FLOW MODEL

Citation
Wm. Miller et al., EXPLOITING DATA STRUCTURE LOCALITY IN THE DATA-FLOW MODEL, Journal of Parallel and Distributed Computing, 27(2), 1995, pp. 183-200
Citations number
47
Categorie Soggetti
Computer Sciences; Computer Science Theory & Methods
ISSN journal
0743-7315
Volume
27
Issue
2
Year of publication
1995
Pages
183 - 200
Database
ISI
SICI code
0743-7315(1995)27:2<183:EDSLIT>2.0.ZU;2-P
Abstract
Although the dataflow model has been shown to allow the exploitation of parallelism at all levels, research of the past decade has revealed several fundamental problems. Synchronization at the instruction level, token matching, coloring, and re-labeling operations have a negative impact on performance by significantly increasing the number of non-compute "overhead" cycles. Recently, many novel hybrid von Neumann data-driven machines have been proposed to alleviate some of these problems. The major objective has been to reduce or eliminate unnecessary synchronization costs through simplified operand matching schemes and increased task granularity. Moreover, the results from recent studies quantifying locality suggest sufficient spatial and temporal locality is present in dataflow execution to merit its exploitation. In this paper we present a data structure for exploiting locality in a data-driven environment: the vector cell. A vector cell consists of a number of fixed-length chunks of data elements. Each chunk is tagged with a presence bit, providing intra-chunk strictness and inter-chunk non-strictness to data structure access. We describe the semantics of the model, processor architecture, and instruction set, as well as a Sisal-to-dataflow vectorizing compiler backend. The vector cell model is evaluated by comparing its performance to those of both a classical fine-grain dataflow processor employing I-structures and a conventional pipelined vector processor. Results indicate that the model is surprisingly resilient to long memory and communication latencies and is able to dynamically exploit the underlying parallelism across multiple processing elements at run time. (C) 1995 Academic Press, Inc.
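The chunked presence-bit scheme described in the abstract can be sketched in software. The following is a minimal illustrative model, not the paper's implementation: all names, the chunk size, and the deferred-read convention (returning None where a dataflow machine would re-enqueue the reader) are assumptions made for this sketch.

```python
# Illustrative sketch of a "vector cell": a vector split into fixed-length
# chunks, each guarded by one presence bit. Reads require the whole chunk
# to be written (intra-chunk strictness), but chunks become readable
# independently of one another (inter-chunk non-strictness).
# CHUNK_SIZE and all identifiers are hypothetical, not from the paper.

CHUNK_SIZE = 4  # assumed fixed chunk length


class VectorCell:
    def __init__(self, length, chunk_size=CHUNK_SIZE):
        self.length = length
        self.chunk_size = chunk_size
        n_chunks = (length + chunk_size - 1) // chunk_size
        self.data = [None] * length
        self.writes_seen = [0] * n_chunks   # element writes per chunk
        self.present = [False] * n_chunks   # one presence bit per chunk

    def write(self, i, value):
        """Store one element; set the chunk's presence bit once full."""
        self.data[i] = value
        c = i // self.chunk_size
        self.writes_seen[c] += 1
        chunk_len = min(self.chunk_size, self.length - c * self.chunk_size)
        if self.writes_seen[c] == chunk_len:
            self.present[c] = True  # entire chunk now readable

    def read(self, i):
        """Return the element if its chunk is present, else None.
        A dataflow machine would instead defer and re-schedule the read."""
        c = i // self.chunk_size
        if not self.present[c]:
            return None
        return self.data[i]
```

For example, after writing elements 0 through 3, the first chunk's presence bit is set and those elements are readable, while a read from the still-empty second chunk remains deferred. This mirrors the trade-off the abstract describes: coarser synchronization than per-element I-structures, but finer than waiting for the whole vector.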