Utilization of fine-grained parallelism in dataflow processor

Dikarev, N.I.; Shabanov, B.M.; Shmelev, A.S.

Authors

Dikarev N.I.

Shabanov B.M.

Shmelev A.S.

Date of publication

2016

Abstract

A dataflow processor can offer the highest performance among scalar microprocessors because a program expressed as a dataflow graph is naturally parallel. A program in a dataflow computer is a directed graph in which nodes are instructions and arcs are data dependencies between the nodes. Data are conveyed from one instruction to another in packets called tokens. Each token contains a data field (the operand) and a context field. The context field consists of the destination instruction number in the graph, its index, and the iteration and subroutine numbers, so different iterations of nested loops and different subroutine calls of the same graph can be executed concurrently. A two-input instruction is issued for execution as soon as both tokens with the same context have arrived. After the functional unit computes the result, new tokens are sent to the inputs of the following instructions according to the program graph. Tokens are destroyed once they have been used.
Unlike a von Neumann processor, a dataflow processor has no program counter; the matching store dynamically reveals instruction parallelism according to operand availability. The matching store can be interleaved across many units, so that many independent functional units can be kept loaded.
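The token-matching rule described above can be illustrated with a minimal Python sketch. It is not the authors' design: the names `Context`, `Token`, `run`, the `port` field, and the three-instruction example graph are all assumptions made for illustration, and a single dictionary stands in for the matching store.

```python
from collections import namedtuple
import operator

# Hypothetical token layout: an operand plus a context naming the
# destination instruction and the index/iteration/subroutine it belongs to.
Context = namedtuple("Context", "instr index iteration subroutine")
Token = namedtuple("Token", "value context port")  # port: 0 = left, 1 = right input

def run(graph, initial_tokens):
    """graph maps instruction number -> (binary op, [(dest instr, dest port), ...])."""
    matching_store = {}          # context -> token waiting for its partner
    pending = list(initial_tokens)
    outputs = []
    while pending:
        tok = pending.pop()
        partner = matching_store.pop(tok.context, None)
        if partner is None:
            # First operand to arrive: wait in the matching store.
            matching_store[tok.context] = tok
            continue
        # Both tokens with the same context have arrived: the instruction
        # fires, and the two used tokens are dropped.
        left, right = (tok, partner) if tok.port == 0 else (partner, tok)
        op, dests = graph[tok.context.instr]
        result = op(left.value, right.value)
        if not dests:
            outputs.append((tok.context, result))
        for dest_instr, dest_port in dests:
            ctx = tok.context._replace(instr=dest_instr)
            pending.append(Token(result, ctx, dest_port))
    return outputs

# Illustrative graph for (a + b) * (c - d); there is no program counter.
graph = {
    0: (operator.add, [(2, 0)]),
    1: (operator.sub, [(2, 1)]),
    2: (operator.mul, []),
}

def inputs(iteration, a, b, c, d):
    def ctx(instr):
        return Context(instr, 0, iteration, 0)
    return [Token(a, ctx(0), 0), Token(b, ctx(0), 1),
            Token(c, ctx(1), 0), Token(d, ctx(1), 1)]

# Tokens of two loop iterations are mixed in one pool; the iteration
# number in the context keeps them apart.
tokens = inputs(0, 1, 2, 7, 3) + inputs(1, 10, 20, 9, 4)
results = {ctx.iteration: value for ctx, value in run(graph, tokens)}
```

Because tokens pair up only by context, the order in which the sketch processes them does not affect the results, which is the property that lets a real matching store expose parallelism as operands become available.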
In this paper we show that a vector dataflow processor can reach considerably higher performance on both vector and scalar code, because it executes 2-3 times fewer instructions and can parallelize the code at a fine grain. This high performance is confirmed by simulation results for the matrix multiplication and bubble sort programs.
Exploiting fine-grained parallelism in a vector dataflow processor is possible because of its considerably wider execution window, which holds 10 thousand instructions, compared with 100 instructions in modern superscalar processors.

Keywords

vector processor, dataflow architecture, fine-grained parallelism, scalar performance.

Library reference

Dikarev N.I., Shabanov B.M., Shmelev A.S. Utilization of fine-grained parallelism in dataflow processor // Problems of Perspective Micro- and Nanoelectronic Systems Development - 2016. Proceedings / edited by A. Stempkovsky, Moscow, IPPM RAS, 2016. Part 2. P. 144-150.

URL of paper

http://www.mes-conference.ru/data/year2016/pdf/D183.pdf