
Research ways to design a dynamic branch prediction unit for promising microprocessor development by SRISA RAS  

Authors
 Barskikh M.E.
Date of publication
 2016

Abstract
 The article analyzes implementation options for a dynamic branch prediction unit. It presents a comparative study of how the parameters of the scheme affect its accuracy. We studied the effect of different schemes on the performance of microprocessors with RISC architecture in order to select the optimal implementation.
Branch and jump instructions in the program flow create control dependences, which determine the order of instruction execution in the pipeline. Dynamic branch prediction has therefore become a requirement in modern processors. The combined and majority schemes most commonly used in commercial processors are built from the basic bimodal and gShare predictors. The global branch history shift register (BHSR) is often implemented with a length of 32 bits so that earlier branch history can be used in addressing the branch history table (BHT).
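As an illustration of the two basic schemes named above, the following is a minimal behavioral sketch in Python, not the RTL implementation studied in the paper. It assumes 2-bit saturating counters initialized to "weakly not taken", a 12-bit table index, and a history register that shifts the newest outcome into bit 0; the class names and the use of the raw PC value as an index are illustrative choices.

```python
class TwoBitTable:
    """Table of 2-bit saturating counters (0..3); a value >= 2 predicts 'taken'."""

    def __init__(self, index_bits):
        self.mask = (1 << index_bits) - 1
        # Assumption: every counter starts at 1 (weakly not taken).
        self.counters = [1] * (1 << index_bits)

    def predict(self, index):
        return self.counters[index & self.mask] >= 2

    def update(self, index, taken):
        i = index & self.mask
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)


class Bimodal:
    """BHT addressed by the low PC bits only."""

    def __init__(self, index_bits=12):
        self.table = TwoBitTable(index_bits)

    def predict(self, pc):
        return self.table.predict(pc)

    def update(self, pc, taken):
        self.table.update(pc, taken)


class GShare:
    """BHT addressed by the low PC bits XORed with the global history."""

    def __init__(self, index_bits=12, history_bits=32):
        self.table = TwoBitTable(index_bits)
        self.history = 0  # global branch history shift register (BHSR)
        self.history_mask = (1 << history_bits) - 1

    def _index(self, pc):
        return pc ^ self.history  # TwoBitTable masks this to index_bits

    def predict(self, pc):
        return self.table.predict(self._index(pc))

    def update(self, pc, taken):
        self.table.update(self._index(pc), taken)
        # Shift the newest outcome into bit 0 of the history register.
        self.history = ((self.history << 1) | int(taken)) & self.history_mask
```

Under these assumptions, a branch that strictly alternates between taken and not taken is mispredicted by the bimodal sketch every time, while the gShare sketch learns it after a short warm-up because the two phases of the pattern map to different BHT entries.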
Several areas of optimization were identified from an analysis of known implementations, and the dynamic prediction scheme was optimized along these areas and parameters during its development. We evaluated the performance of the most frequently used schemes (bimodal, gShare, combined and majority), the effect of the memory capacity used to store the BHT on the accuracy of the scheme, and the impact of the global history register size.
The paper presents an analysis of branch prediction unit accuracy and CPU performance. Analysis of instruction traces is not used, because it has low accuracy and provides no information about performance. Instead, we modified the processor RTL model so that the branch prediction unit can be customized. The simulation used benchmarks (Coremark, Dhrystone and Whetstone) and application programs for matrix multiplication, fast Fourier transform, archiving, compilation and sorting.
Simulation showed that the standalone bimodal and gShare prediction schemes have the lowest accuracy, but they require only a single 8Kb memory bank. The majority and combined prediction schemes work much better at the cost of three times as much memory: they increase accuracy by 3.5% and 4.7% respectively compared with the gShare scheme. However, the corresponding IPC growth is only 1.3% and 2%. The effect of memory size on prediction accuracy was also analyzed. The memory size was limited to between 16 and 4K entries by masking the most significant address bits. These data allow a branch prediction implementation to be selected according to the limits on its size. The best choice is the combined scheme, but the gShare scheme with a single 1Kb memory bank can be used if necessary.
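The threefold memory cost of the combined scheme follows from its three tables (bimodal, gShare and choice). The sketch below is a hypothetical behavioral model in Python, not the evaluated RTL. It assumes 2-bit counters, a choice table addressed by the PC that is trained only when the two component predictions disagree, and a table size restricted to a power of two between 16 and 4K entries by masking the upper index bits; all names are illustrative.

```python
def bump(table, i, up):
    """Saturating increment or decrement of a 2-bit counter."""
    table[i] = min(3, table[i] + 1) if up else max(0, table[i] - 1)


class Combined:
    """Bimodal + gShare with a choice table selecting between them."""

    def __init__(self, entries=4096):
        # Table size is a power of two between 16 and 4K entries.
        assert 16 <= entries <= 4096 and entries & (entries - 1) == 0
        self.mask = entries - 1          # masks off the most significant index bits
        self.bimodal = [1] * entries     # assumption: start weakly not taken
        self.gshare = [1] * entries
        self.choice = [1] * entries      # >= 2 means "trust gShare"
        self.history = 0                 # 32-bit BHSR, newest outcome in bit 0

    def _indices(self, pc):
        base = pc & 0xFFF                            # 12 lower PC bits
        b = base & self.mask                         # bimodal and choice index
        g = (base ^ (self.history & 0xFFF)) & self.mask  # gShare index
        return b, g

    def predict(self, pc):
        b, g = self._indices(pc)
        if self.choice[b] >= 2:
            return self.gshare[g] >= 2
        return self.bimodal[b] >= 2

    def update(self, pc, taken):
        b, g = self._indices(pc)
        p_b = self.bimodal[b] >= 2
        p_g = self.gshare[g] >= 2
        if p_b != p_g:
            # Train the choice counter toward whichever component was right.
            bump(self.choice, b, p_g == taken)
        bump(self.bimodal, b, taken)
        bump(self.gshare, g, taken)
        self.history = ((self.history << 1) | int(taken)) & 0xFFFFFFFF
```

Constructing `Combined(entries=16)` or `Combined(entries=512)` models the reduced memory configurations; the smaller the mask, the more branches share an entry.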
To simulate various memory addressing options, the BHSR was extended to 32 bits. The addresses of the gShare and choice memories in the combined scheme are always formed by an XOR function of the 12 lower PC bits and a 12-bit slice of the global history starting at bit 0, 4, 8, 12, 16 or 20. Prediction accuracy data were collected for every combination of these parameters and are presented in this work. They show that the accuracy of the scheme drops when the oldest global history is used in the addressing functions. The best accuracy was obtained by using the PC to address the choice memory and the XOR function with the newest global history to address the gShare memory.
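The addressing function described above can be sketched as follows. This is an illustrative Python fragment, assuming that bit 0 of the history register holds the newest branch outcome, so a larger starting bit selects older history; the function names are hypothetical.

```python
HISTORY_OFFSETS = (0, 4, 8, 12, 16, 20)  # starting bits listed in the text


def history_window(history, start, width=12):
    """Return `width` bits of the 32-bit global history beginning at bit `start`."""
    return (history >> start) & ((1 << width) - 1)


def table_index(pc, history, start, width=12):
    """XOR the `width` lower PC bits with the selected slice of global history."""
    return (pc & ((1 << width) - 1)) ^ history_window(history, start, width)


if __name__ == "__main__":
    history = 0xABCDEF12
    for start in HISTORY_OFFSETS:
        print(start, hex(table_index(0x345, history, start)))
```

With `start=0` the index uses the 12 newest history bits, and with `start=20` it uses the 12 oldest bits of the 32-bit register.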
In addition to the absolute values of prediction accuracy and IPC for different memory addressing functions, the frequency of references to different BHT entries was collected and analyzed. The spread of these values makes it possible to evaluate the influence of the selected addressing function on interference, which occurs when branch instructions that are at different locations and have different execution histories are nevertheless mapped to the same BHT entry. The smaller the spread, the more uniformly the memory is used and the more stable the prediction scheme (with the selected addressing) is to changes in the executable program code.
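One way to quantify such a spread is sketched below in Python. The abstract does not state which measure was used, so the population standard deviation of per-entry reference counts is an assumption made here for illustration, as is the function name.

```python
from collections import Counter
from statistics import pstdev


def access_spread(indices, entries):
    """Spread of BHT reference counts across all `entries` table entries.

    `indices` is the sequence of BHT indices referenced during a run.
    Assumption: spread is measured as the population standard deviation;
    a smaller value means the table is used more uniformly.
    """
    counts = Counter(indices)
    per_entry = [counts.get(i, 0) for i in range(entries)]
    return pstdev(per_entry)


if __name__ == "__main__":
    print(access_spread([0, 1, 2, 3], entries=4))  # uniform use of the table
    print(access_spread([0, 0, 0, 0], entries=4))  # all references hit one entry
```

Feeding the function the indices produced by two candidate addressing functions on the same program gives a simple basis for comparing their uniformity.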
To justify accepting some degradation in accuracy, the accuracy of the addressing parameters was estimated relative to the best case, together with the reduction of interference relative to it. It is shown that, despite a slight drop in accuracy, better stability of the scheme can be achieved. Finally, the addressing schemes for the choice and gShare memories to be used in the CPU were selected based on the data obtained in the analysis. Thus, the dynamic branch prediction scheme has been optimized for use in the 1890VM8 processor.
Keywords
 superscalar microprocessor, dynamic branch prediction, combining scheme, majority scheme, branch prediction performance.
Library reference
 Barskikh M.E. Research ways to design a dynamic branch prediction unit for promising microprocessor development by SRISA RAS // Problems of Perspective Micro- and Nanoelectronic Systems Development - 2016. Proceedings / edited by A. Stempkovsky, Moscow, IPPM RAS, 2016. Part 2. P. 266-273.
URL of paper
 http://www.mes-conference.ru/data/year2016/pdf/D085.pdf

Copyright © 2009-2024 IPPM RAS. All Rights Reserved.
