Lab 2: Simple Fifo, FFT Pipeline and elasticity

Lab 2 due date: Monday, September 25, 2017, at 11:59:59 PM EDT.

Your deliverables for Lab 2 are:

Introduction

In this lab the goal is to build up different versions of the Fast Fourier Transform (FFT) module. This module is described in detail in "L0x", titled FFT: An example of complex combinational circuits, which was given in a previous version of this class. You'll find the presentation as a [pptx] or a [pdf].

But before the FFT, we will do a warm up with Fifos. Then you will implement an inelastic pipeline implementation of the FFT using registers between each stage. Finally, you will implement an elastic pipeline implementation of the FFT using FIFOs between each stage.

Fifo

The posted FFT presentation assumes that we have Fifos with guards. Guards on enq, deq, and first prevent the rules enclosing calls of these methods from firing if the guards on the methods are not met. Because of this assumption, the code in the presentation uses enq, deq, and first without checking if the FIFO is notFull or notEmpty.

The syntax for a guard on a method is shown below:

method Action myMethodName(Bit#(8) in) if (myGuardExpression);
    // method body
endmethod

myGuardExpression is an expression that is True if and only if it is valid to call myMethodName. If myMethodName is going to be used in a rule the next time it is fired, the rule will be blocked from executing until myGuardExpression is True.

Exercise 1 (10 Points): Completes the code in Fifo.bsv to implements a 3-elements fifo with properly guarded methods. Feel free to take inspiration from the class slides. The interface defined in Fifo.bsv tells you the type of the methods (enq, deq, first) that your module should define.

Data types

Multiple data types are provided to help with the FFT implementation. The default settings for the provided types describe an FFT implementation that works with an input vector of 64 different 64-bit complex numbers. The type for the 64-bit complex data is defined as ComplexData. FftPoints defines the number of complex numbers, FftIdx defines the data type required for accessing a point in the vector, NumStages defines the number of stages, StageIdx defines a data type to access a particular stage, and BflysPerStage defines the number of butterfly units in each stage. These type parameters are provided for your convenience, feel free to use any of these in your implementations.

It should be noted that the goal of this lab is not to understand the FFT algorithm, but rather to experiment with different control logics in a real-world application. The getTwiddle and permute functions are provided with the testbench for your convenience. However, their implementations are not strictly adhering to the FFT algorithm, and may even change later. It would be beneficial to focus not on the algorithm, but on changing the control logic of a given datapath in order to enhance its characteristics.

Butterfly unit

The module mkBfly4 implements a 4-way butterfly function which was discussed in the presentation. This module should be instantiated exactly as many times as you use it in your code.

interface Bfly4;
    method Vector#(4,ComplexData) bfly4(Vector#(4,ComplexData) t, Vector#(4,ComplexData) x);
endinterface

module mkBfly4(Bfly4);
    method Vector#(4,ComplexData) bfly4(Vector#(4,ComplexData) t, Vector#(4,ComplexData) x);
        // Method body
    endmethod
endmodule

Different implementations of the FFT

You will be implementing modules corresponding to the following FFT interface:

interface Fft;
    method Action enq(Vector#(FftPoints, ComplexData) in);
    method ActionValue#(Vector#(FftPoints, ComplexData)) deq();
endinterface

The modules mkFftCombinational, mkFftInelasticPipeline, and mkFftElasticPipeline should all implement a 64-point FFT which is functionally equivalent to the combinational model. The module mkFftCombinational is given to you. Your job is to implement the other 2 modules, and demonstrate their correctness using the provided combinational implementation as a benchmark.

Each of the modules contain two FIFOs, inFifo and outFifo, which contain the input complex vector and the output complex vector respectively, as shown below.

module mkFftCombinational(Fft);
    FIFOF#(Vector#(FftPoints, ComplexData)) inFifo <- mkFIFOF;
    FIFOF#(Vector#(FftPoints, ComplexData)) outFifo <- mkFIFOF;
   ...

These FIFOs are conflict-free FIFOs with guards added in exercise one.

Each module also contains a Vector or multiple Vectors of mkBfly4, as shown below.

Vector#(3, Vector#(16, Bfly4)) bfly <- replicateM(mkBfly4);

The doFft rule should dequeue an input from inFifo, perform the FFT algorithm, and finally enqueue the result into outFifo. This rule will usually require other functions and modules to function correctly. The elastic pipeline implementation will require multiple rules.

   ...
    rule doFft;
        // Rule body
    endrule
   ...

The Fft interface provides methods to send data to the FFT module and receive data from it. The interface only enqueues into inFifo and dequeues from outFifo.

   ...
    method Action enq(Vector#(FftPoints, ComplexData) in);
        inFifo.enq(in);
    endmethod

    method ActionValue#(Vector#(FftPoints, ComplexData)) deq;
        outFifo.deq;
        return outFifo.first;
    endmethod
endmodule 

Exercise 2 (5 Points): In mkFftInelasticPipeline, create an inelastic pipeline FFT implementation. You can look here for some extra slides on inelastic pipelining, compared to what we covered in class. This implementation should make use of 48 butterflies and 2 large registers, each carrying 64 complex numbers. The latency of this pipelined unit must also be exactly 3 cycles, though its throughput would be 1 FFT operation every cycle.

The Makefile can be used to build simInelastic to test this implementation. Compile and run using

$ make inelastic
$ ./simInelastic

Exercise 3 (10 Points):

In mkFftElasticPipeline, create an elastic pipeline FFT implementation. This implementation should make use of 48 butterflies and some fifos. You should also try to use:

  1. your 3 elements FIFOs (instantiated with mkFifo).
  2. the provided FIFOs (instantiated with mkCFFifo).
The stages between the FIFOs should be in their own rules that can fire independently. When running the testbench you are likely to observe an important difference in performance (number of cycle it took to run the testbench). That is expected, please email me if you don't.

The Makefile can be used to build simElastic to test this implementation. Compile and run using

$ make elastic
$ ./simElastic

Discussion Questions

Write your answer to this question in the text file discussion.txt provided in the lab repository.

Discussion Question 1:

What is the source of the performance gap between your two elastic implementations (when it is using the class fifo and when it is using your own fifo)? (2 Points)

Discussion Question 2:

Assume you are given a black box module that performs a 10-stage algorithm. You can not look at its internal implementation, but you can test this module by giving it data and looking at the output of the module. You have been told that it is implemented as one of the structures covered in this lab, but you do not know which one. How can you tell if it is inelastic or if it is elastic? (2 Points)

Discussion Question 3 (Optional): How long did you take to work on this lab?

When you have completed all the exercises and your code works, commit your changes to the repository, and push your changes back to the source.