Evaluation of the use of systolic arrays in the implementation of matrix multiplication algorithms on FPGAs |
Authors |
| Pasynkov S.V. |
| Iliasov R.F. |
Date of publication |
| 2021 |
DOI |
| 10.31114/2078-7707-2021-3-76-80 |
Abstract |
| Matrix multiplication is a fundamental building block of many algorithms, for example in data analysis and neural networks. Over the past decades, systolic arrays have proven to be a highly efficient way of reusing data, and interest in them has recently increased. This article compares FPGA designs that multiply matrices with a systolic array, without a systolic array, and with embedded multiplication blocks (DSP blocks). The aim is to determine the best method for multiplying two matrices in terms of the maximum achievable clock frequency and the FPGA resources needed to implement the design. To produce the measurements, we implemented two matrix multiplication designs that use a systolic array. In addition, we implemented two designs in the Verilog hardware description language that multiply 3 by 3 and 10 by 10 matrices without systolic arrays. Cyclone IV devices also include built-in resources that help improve performance, known as dedicated digital signal processing (DSP) blocks. We therefore also implemented a matrix multiplication design that uses these built-in multiplication blocks. All measurements were performed in Quartus Prime with the Cyclone IV E EP4CE115F29I8L device selected, since more than 110,000 FPGA logic elements were needed to simulate the design. Virtual pins were used to compile all the designs, so the reported figures show the maximum achievable frequency of the calculation modules regardless of how Quartus automatically assigns pins. All multiplication implementations were compared using the built-in functions of Quartus Prime. For all the designs, we selected the "Balanced (Normal flow)" optimization mode in Quartus.
The maximum-frequency data were taken from the Slow 1000mV 100C timing model report and entered in the table. The results show that the systolic array remains a highly efficient solution for linear algebra applications: using systolic arrays can speed up multiplication operations and reduce the number of elements required to implement a design in the Verilog hardware description language. |
Keywords |
| Verilog, multiplication, matrices, systolic array, DSP block, embedded multiplication block. |
Library reference |
| Pasynkov S.V., Iliasov R.F. Evaluation of the use of systolic arrays in the implementation of matrix multiplication algorithms on FPGAs // Problems of Perspective Micro- and Nanoelectronic Systems Development - 2021. Issue 3. P. 76-80. doi:10.31114/2078-7707-2021-3-76-80 |
URL of paper |
| http://www.mes-conference.ru/data/year2021/pdf/D032.pdf |
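
The abstract does not reproduce the Verilog designs, so the following is only a minimal software sketch of how a systolic array multiplies matrices, not the authors' implementation. It assumes an output-stationary arrangement (each processing element keeps one result entry while operands flow right and down), which is one common variant and may differ from the designs in the paper. The function name `systolic_matmul` and the register names are hypothetical.

```python
def systolic_matmul(a, b):
    """Cycle-by-cycle model of an output-stationary systolic array.

    a is n x m, b is m x p (lists of lists). PE(i, j) accumulates c[i][j].
    Row i of `a` enters from the left delayed by i cycles; column j of `b`
    enters from the top delayed by j cycles, so matching operands meet in
    PE(i, j) at cycle i + j + k.
    Returns (c, cycles), where cycles is the number of steps simulated.
    """
    n, m, p = len(a), len(b), len(b[0])
    acc = [[0] * p for _ in range(n)]    # accumulator inside each PE
    a_reg = [[0] * p for _ in range(n)]  # operand each PE passes to the right
    b_reg = [[0] * p for _ in range(n)]  # operand each PE passes downward
    cycles = n + m + p - 2               # last operands meet at cycle n+m+p-3

    for t in range(cycles):
        # Walk from bottom-right to top-left so every PE still sees the
        # value its left/upper neighbour held in the previous cycle.
        for i in reversed(range(n)):
            for j in reversed(range(p)):
                if j == 0:
                    k = t - i            # skewed feed on the left edge
                    a_in = a[i][k] if 0 <= k < m else 0
                else:
                    a_in = a_reg[i][j - 1]
                if i == 0:
                    k = t - j            # skewed feed on the top edge
                    b_in = b[k][j] if 0 <= k < m else 0
                else:
                    b_in = b_reg[i - 1][j]
                acc[i][j] += a_in * b_in  # multiply-accumulate step
                a_reg[i][j] = a_in
                b_reg[i][j] = b_in
    return acc, cycles


if __name__ == "__main__":
    c, cycles = systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])
    print(c, cycles)
```

In this model every processing element performs one multiply-accumulate per step and talks only to its immediate neighbours, which is the property that keeps wiring short in hardware. The step count of the model (7 for 3 by 3 inputs, 28 for 10 by 10) describes this sketch only and says nothing about the clock frequencies or resource figures measured in the paper.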