ARCHITECTURE AND C-PROGRAMMING ENVIRONMENT OF A HIGHLY PARALLEL IMAGE SIGNAL PROCESSOR

Citation
J. Kneip et al., ARCHITECTURE AND C-PROGRAMMING ENVIRONMENT OF A HIGHLY PARALLEL IMAGE SIGNAL PROCESSOR, Microprocessing and Microprogramming, 41(5-6), 1995, pp. 391-408
Number of citations
28
Subject categories
Computer Sciences; Computer Science, Hardware & Architecture
Journal ISSN
0165-6074
Volume
41
Issue
5-6
Year of publication
1995
Pages
391 - 408
Database
ISI
SICI code
0165-6074(1995)41:5-6<391:AACEOA>2.0.ZU;2-5
Abstract
A highly parallel single-chip image signal processor architecture has been derived by analysis of image processing algorithms. Available levels of parallelism and their associated demands on data access, control and complexity of operations were taken into account. The RISC architecture, called "HiPAR-DSP", consists of a control unit, 16 parallel ASIMD-controlled datapaths with autonomous addressing and instruction selection capability, a local data cache per datapath, a shared memory with matrix-type data access and a powerful DMA unit. The proposed architecture was designed by assessing the results of an analysis of characteristic algorithm properties with respect to their inherent parallelization resources, achievable speed-up and implementation costs. This resulted in a proper balance between the degree of parallelism and flexibility, leading to high performance for a wide field of applications. Additional measures were taken to support efficient high-level programmability of the processor. This was achieved by the concurrent implementation of special architectural features and a C++ programming environment. It consists of an adaptation of the GNU C++ compiler and an optimizing assembler, supporting all levels of concurrency offered by the hardware. While most levels of parallelization are kept invisible to the programmer, data-level parallelism is expressed by the programmer using special new data types added to the standard C/C++ data types. A sustained performance of about 2.0 gigaoperations per second is achieved by the 100 MHz clocked processor for numerous image processing algorithms; for example, a normalized correlation of a 512 x 512 image with a 32 x 32 correlation mask takes 450 ms. Thus, a programmable parallel processor architecture achieves a performance that hitherto required a dedicated integrated circuit.