P. Berger et al., PERFORMANCE OF LEVEL-3 BLAS KERNELS IN A DYNAMICALLY PARTITIONED DATA-FLOW ENVIRONMENT, Computing systems in engineering, 6(4-5), 1995, pp. 357-361
The Dynamically Partitioned Data-Flow (DPDF) model is based on an
original approach to analyzing the data dependency graph at the instruction
level. Instead of a breadth-first analysis, as in a classical Data-Flow
model, we execute instructions along data-dependent paths. As a
consequence, data locality can be exploited by reusing results between the
execution of consecutive instructions. In addition, the different
paths are not statically defined but arise from a dynamic partitioning
of the graph. This model has the advantage of supporting very low-cost
dynamic scheduling and multitasking strategies. In order to study
the efficiency of this new model, a first architecture has been
defined. This architecture is currently limited to a single processor with
one serial processing unit but four graph-analyzing units (called
prefetch units). Each of these prefetch units is able to dynamically build
its own execution path inside the Data-Flow graph of an application.
The efficiency of this architecture is studied on a numerical
benchmark composed of a subset of the Livermore loops and of three routines of
the Level 3 BLAS (GEMM, SYRK and TRSM). Our goal in these
experiments is to demonstrate the ability of the four prefetch units to feed
the ALU.
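
The contrast between a breadth-first analysis and execution along data-dependent paths can be illustrated with a small sketch. This is not the DPDF hardware mechanism described in the paper: the graph DEPS, the function names, and the "reuse count" metric are all assumptions made for illustration, and a single path is followed at a time rather than four prefetch units working concurrently.

```python
from collections import deque

# Hypothetical instruction graph (an assumption, not taken from the paper):
# each instruction maps to the instructions whose results it consumes.
DEPS = {
    "a1": [], "a2": ["a1"], "a3": ["a2"],
    "b1": [], "b2": ["b1"], "b3": ["b2"],
    "c": ["a3", "b3"],
}


def successors(deps):
    """Invert the dependency map: producer -> list of consumers."""
    succ = {node: [] for node in deps}
    for node, parents in deps.items():
        for parent in parents:
            succ[parent].append(node)
    return succ


def breadth_first_order(deps):
    """Level-by-level order: ready instructions are served first-in, first-out."""
    succ = successors(deps)
    pending = {node: len(parents) for node, parents in deps.items()}
    ready = deque(node for node in deps if pending[node] == 0)
    order = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for consumer in succ[node]:
            pending[consumer] -= 1
            if pending[consumer] == 0:
                ready.append(consumer)
    return order


def path_following_order(deps):
    """Follow a data-dependent path: prefer a ready consumer of the last result."""
    succ = successors(deps)
    pending = {node: len(parents) for node, parents in deps.items()}
    ready = [node for node in deps if pending[node] == 0]
    order = []
    current = None
    while ready:
        nxt = None
        if current is not None:
            # Continue the current path if one of its consumers is ready.
            nxt = next((s for s in succ[current] if s in ready), None)
        if nxt is None:
            # The path is blocked or finished: start a new path.
            nxt = ready[0]
        ready.remove(nxt)
        order.append(nxt)
        for consumer in succ[nxt]:
            pending[consumer] -= 1
            if pending[consumer] == 0:
                ready.append(consumer)
        current = nxt
    return order


def reuse_count(order, deps):
    """Count instructions that consume the result produced immediately before them."""
    return sum(1 for prev, cur in zip(order, order[1:]) if prev in deps[cur])
```

On this toy graph, the breadth-first order interleaves the two chains, so a result is rarely consumed by the very next instruction, whereas the path-following order runs each chain to its end before switching. The reuse count is only a rough stand-in for the data locality the abstract refers to.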