
Optimizing the prefetch mechanism in the secondary cache memory  

Authors
 Aryashev S.I.
 Bychkov K.S.
Date of publication
 2016

Abstract
 The memory subsystem is one of the key elements that determine the efficiency of modern microprocessor systems. It is often the memory subsystem that is the performance "bottleneck" of the system [1-4]. Load and store instructions account for up to 30% of the instructions in a typical user program. Programs that use arithmetic coprocessors need to load and store even larger amounts of data. As a result, to keep a system with a high-performance superscalar processor core, one that executes two or more instructions per clock cycle, running smoothly, access to the memory subsystem must be provided at each pipeline stage.
The organization of information exchange between the levels of the memory hierarchy plays an important role in achieving high memory performance. An important part of this is the buffering of memory requests.
This article discusses the hardware prefetch techniques used in high-performance RISC microprocessors.
Four types of buffers for different operations are implemented in the L2 cache of the microprocessor developed at NIISI RAS [5]. One of them is the L2 cache data prefetch buffer. The prefetch buffer is used to fetch a data block (a cache line of 256 bits) from RAM in advance.
Two registers that hold the addresses of the last two requests to the buffer, organized on the FIFO principle, are added to the buffer design. When a request arrives, the input address is compared with the addresses of the previous requests and the appropriate counter ("up" or "down") is incremented. Initially, prefetching is performed in ascending order of addresses; when the "down" counter reaches two, the multiplexer switches and the decremented input address is passed to the buffer output. Likewise, the buffer switches back to incrementing the address when the "up" counter reaches two.
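The direction-switching rule described above can be sketched as a small behavioral model. This is only an illustration, not the authors' RTL: the class and method names are hypothetical, and several details the abstract does not specify are assumptions here (comparison is done on cache-line numbers against the most recent request only, the opposite counter is cleared on each step, and the prefetch distance is one line).

```python
from collections import deque

LINE_BYTES = 32  # 256-bit cache line, as stated in the abstract


class DirectionalPrefetchModel:
    """Hypothetical behavioral model of the up/down direction detector."""

    def __init__(self, threshold=2):
        self.history = deque(maxlen=2)  # two address registers, FIFO
        self.up = 0
        self.down = 0
        self.descending = False         # prefetching starts in ascending order
        self.threshold = threshold      # switch when a counter reaches two

    def request(self, addr):
        """Record a request and return the address to prefetch next."""
        line = addr // LINE_BYTES
        if self.history:
            last = self.history[-1]     # assumption: compare with newest entry
            if line > last:
                self.up += 1
                self.down = 0           # assumption: opposite counter is cleared
            elif line < last:
                self.down += 1
                self.up = 0
        if self.down >= self.threshold:
            self.descending = True      # multiplexer selects decremented address
        elif self.up >= self.threshold:
            self.descending = False     # multiplexer selects incremented address
        self.history.append(line)
        step = -1 if self.descending else 1
        return (line + step) * LINE_BYTES
```

In this model, two consecutive requests at decreasing line addresses are enough to flip the prefetch direction, and two at increasing addresses flip it back.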
Evaluating the performance of an RTL model of the memory subsystem with existing programs is problematic [6]. This is because tests intended to run on finished chips take an unacceptably long time to run on the RTL model.
The correct operation of the RTL model of the microprocessor with the new buffer was verified by running the regression test base and random tests, as well as by booting the Linux operating system and running user application tests.
To measure the effectiveness of the prefetch buffer, a special set of tests was prepared based on the LMBENCH test, which measures the speed of copying data using the GLIBC C library.
The speedup of the proposed buffer over a buffer that only increments the address (when the test contains copy operations with decreasing addresses) ranges from 20 to 65 percent.
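Why copy direction matters for an increment-only buffer can be illustrated with a toy address trace. The sketch below is an assumption-laden illustration, not the authors' test set: the function names are hypothetical, the copy is modeled as touching one source cache line at a time, and a "hit" simply means the previous access plus one line step equals the current access. It says nothing about the measured 20 to 65 percent figures.

```python
LINE_BYTES = 32  # 256-bit cache line, as stated in the abstract


def copy_line_trace(src, nbytes, descending=False):
    """Line-aligned source addresses touched by a copy, in access order."""
    first = src // LINE_BYTES
    last = (src + nbytes - 1) // LINE_BYTES
    lines = range(first, last + 1)
    if descending:
        lines = reversed(lines)
    return [line * LINE_BYTES for line in lines]


def next_line_hits(trace, step):
    """Count accesses that a fixed-direction, one-line-ahead prefetch covers.

    step is +1 for an increment-only buffer and -1 for a decrementing one.
    """
    hits = 0
    for prev, cur in zip(trace, trace[1:]):
        if prev + step * LINE_BYTES == cur:
            hits += 1
    return hits


# Usage: a 256-byte copy that walks the source from high to low addresses.
trace = copy_line_trace(0x2000, 256, descending=True)
print(next_line_hits(trace, +1), next_line_hits(trace, -1))
```

For a descending trace, the increment-only predictor covers none of the accesses, while a decrementing predictor covers every access after the first.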
This scheme does not require significant hardware cost and does not add delay to the path that issues read requests to RAM, which were the main constraints imposed on the design of the upgraded buffer.
Keywords
 microprocessor system, memory subsystem, cache memory, buffer memory, prefetching, stream buffer.
Library reference
 Aryashev S.I., Bychkov K.S. Optimizing the prefetch mechanism in the secondary cache memory // Problems of Perspective Micro- and Nanoelectronic Systems Development - 2016. Proceedings / edited by A. Stempkovsky, Moscow, IPPM RAS, 2016. Part 2. P. 274-279.
URL of paper
 http://www.mes-conference.ru/data/year2016/pdf/D144.pdf
