Designing on FPGA of high-speed finite state machines

Salauyou, V.V.

Home Authors Papers Year of conference Themes Organizations To MES conference

Designing on FPGA of high-speed finite state machines

Authors

Salauyou V.V.

Date of publication

2016

Abstract

A design method of high-speed finite state machines (FSMs) on LUT-based field programmable gate array (FPGA) by internal state splitting is offered. The method is aimed at practical usage. The method does not change FSM’s type (Mealy or Moore) [1], the method does not demand introduction of additional blocks or clock signals, and one can be easily included in a designing flow of digital systems on FPGA.
The many articles are devoted to problems of high-speed FSM design on FPGA. The paper [2] presents a technique for improving the performance of a synchronous circuit configured as a look-up table based FPGA without changing the initial circuit configuration; only the register location is altered. It improves clock speed and data throughput at the expense of latency. The article [3] present new methods und tools for state encoding and combinational synthesis of sequential circuits based on new criteria of information flow optimization. Together, they form a unified and complete pre-placement synthesis chain. In [4] the method of optimization of synchronization of difficult finite state machines is offered. The method is based on the concept of catalytic agents (catalyst) and adds the excess unit which includes part of the combinative circuit and several other registers to the circuit. As a result critical paths are divided into stages. The paper [4] propose a timing optimization technique for a complex FSM that consists of not only random logic but also data operators. The proposed technique, based on the concept of catalyst, adds a functionally redundant block (which includes a piece of combinational logic and several other registers) to the circuits under consideration so that the timing critical paths are divided into stages. In [5,6] styles of the FSM description in the VHDL language, and also the known methods of encoding of internal states for im-plementation of the fast FSMs are researched. The work [7] propose to use the evolutionary methodology to yield optimal evolvable hardware that implements the state machine control component. The evolved hardware requires a minimal hardware area and introduces a minimal propagation delay of the machine output signals. In [8] the task of FSM state assignment and optimization of the combinational circuit in case of implementation on CPLD of fast FSMs is considered. In [9] the new architecture of programmed logic which is specially intended for the implementation of FSMs based on transitions between statuses is provided (Transitionbased Reconfigurable FSM - TR-FSM). The new architecture allows to reduce the area of a chip, the time signal delay and a power consumption in comparison with the implementation of FSMs on FPGA. The paper [10] presents a study of performance of RAM-based im-plementations in FPGAs of FSMs. The influence of the FSM characteristics on speed and area has been studied, taking into account the particular features of different FPGA families, like the size of LUTs, the size of memory blocks, the number of embedded multiplexer levels and the specific decoding logic for distributed RAM. In [11] the implementation of FSMs with use of internal blocks of memory of FPGA is considered. Two architecture of finite state machines with multiplexers on inputs of memory blocks, which allow to reduce the area of a chip are offered and to increase high-speed performance of the FSM. In [12] the task of lowering of the number of arguments of FSM transition functions by decomposition of internal states of the FSM was considered. It is promotes both implementation reduction in cost, and increase of high-speed performance of FSMs in case of their implementation on FPGA.
Let A = {a1, …, aM} be the set of internal states, X = {x1, …, xL} be the set of input variables, Y = {y1, …, yN} be the set of output variables, and D = {d1, …, dR} be the set of transition functions of an FSM.
A one-hot state assignment is traditionally used for synthesis of high-speed FSMs on FPGAs. Thus to each internal state corresponds the separate flip-flop of FSM’s memory, which setting in 1 means that the FSM is in the given state. The input of each flip-flop is controlled by transition function di, di  D, i.e. to each internal state of the FSM corresponds own transition function di.
Let X(am,ai) be the set of the FSM input variables which values initiate the transition from the state am to the state ai (am, ai  A). To implement some transition from the state am to the state ai it is necessary to check the value of the flip-flop output for the active state am (one bit) and the input variable values of the set X(am,ai), which initiate the given transition. To implement the transition function di it is necessary to check the values of flip-flop outputs for all states from which transitions lead to the state ai, i.e. |B(ai)| values, where B(ai) is the set of states from which transitions terminate in the state ai, where |A| is the capacity of the set A. Besides, it is necessary to check the values of all input variables which initiate transitions to the state ai, i.e. |X(ai)| values, where X(ai) is the set of the input variables which values initiate transitions to the state ai.
Let ri (1) be a rank of the transition function di. Let n be a number of inputs of the functional generators LUT. If the rank ri for the some transition function di exceeds n inputs there is a necessity in a decomposition of the transition function di and its implementations on the several LUTs.
Note that by splitting of internal states it is impossible to lower the rank of the transition functions below the value r* (2). In proposed method the value r* is used as a upper bound of the ranks of the transition functions at splitting of FSM internal states.
It well-known two basic approaches at the decomposition of Boolean functions: sequential and parallel [1]. At the sequential decomposition all the functional generator LUTs sequentially are connected in a chain. The n arguments of the function di arrive on inputs of the first LUT, and the (n-1) arguments arrive on inputs of all remaining LUTs. Because the number of the LUT’s levels in case the sequential decomposition of the transition function di having the rank ri is defined from expression (3).
In case of parallel decomposition the functional generator LUTs incorporate in the form of the hierarchical tree structure. The values of function arguments arrive on inputs of the first level LUTs, and the values of the intermediate functions arrive on inputs of all next levels of the LUTs. Because the number of the LUT’s levels in case the parallel decomposition the transition function di having the rank ri is defined by expression (4)
It is difficult to predict what decomposition (sequential or parallel) used by the specific synthesizer. The preliminary researches showed that, for example, in packet Quartus II from Altera is simultaneously used both sequential, and parallel decomposition. The number li levels of the functional generators LUT at implementation on FPGA transition function di with the rank ri can is between values and .
Let us enter integer coefficient k, k  [0,10], which allows to adapt offered algorithm at determination of the number of the LUT’s levels for the specific synthesizer. In this case the number li of the LUT’s levels for implementation of the transition function di having the rank ri will be defined by expression (5).
The following problem in the offered approach is the answer to a question: when it is necessary to stop splitting of the FSM states? The matter is that at splitting of the some state ai except the increase of the number M of the FSM states the number of the transitions in the states of a set A(ai) also is growed, where A(ai) is the set of the states in which transitions from the state ai terminate. At splitting of the state ai the capacities of the sets B(am), am  A(ai), increase for the states of set A(ai). Therefore according to (1) for states of set A(ai) the ranks of the transition functions grow that can lead to increase of the values and , , and li.
In the proposed algorithm the process of state splitting stops, when the following condition is performance (6), where lmax is the number of the LUT levels which necessary for implementation of the most "bad" function having the maximum rank; lmid is the arithmetic mean value of number of the LUT levels for all transition functions. Note that in the splitting process of internal states the value lmid will increase, and the value lmax will decrease, therefore the algorithm execution always comes to the end.
The analysis of experimental researches shows that high-speed performance of the FSM as a result of application of the offered method increased for 5 families from 7. Thus for family MAX II high-speed performance increased at 1.52 time (or by 52 %), and for family Cyclone V high-speed performance increased at 1.35 time

Keywords

finite state machine, high-speed, state splitting, field programmable gate array, FPGA, SoC, look up table, LUT.

Library reference

Salauyou V.V. Designing on FPGA of high-speed finite state machines // Problems of Perspective Micro- and Nanoelectronic Systems Development - 2016. Proceedings / edited by A. Stempkovsky, Moscow, IPPM RAS, 2016. Part 1. P. 24-31.

URL of paper

http://www.mes-conference.ru/data/year2016/pdf/D024.pdf