CSE 371/372
Lab 5

Due: Monday, May 2.
You will do this lab in teams of two.

In this final lab, you will take the processor you designed for Lab 4 and pipeline it. Twice. Once with no bypassing and once with full bypassing.

NOTE: ISA Change

Up until now, opcode 0 was the HALT instruction and opcode 1 was unused. NOOPs are very useful in pipelining (remember, you effectively insert a NOOP whenever you stall), and it is very convenient to simulate NOOP insertion by simply resetting the appropriate pipeline latches to 0. I have therefore redefined opcode 0 as NOOP and opcode 1 as HALT. I have recompiled the simulator, the assembler, and all benchmarks to account for this change. For this lab, you also have to implement the LCTR and CCTR instructions.
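To show why resetting a latch to 0 is the same as inserting a NOOP, here is a minimal behavioral sketch of a pipeline latch with a write enable and a synchronous clear. The module name pipe_latch and its ports are hypothetical; it is not the course's reg_16_en_srs0 part, only an illustration of the same idea.

```verilog
// Hypothetical pipeline latch (not the course library's reg_16_en_srs0).
// On each rising clock edge: if rst is high the latch is cleared to all
// zeros; otherwise it loads d when wen is high and holds when wen is low.
// Since opcode 0 is now NOOP, clearing a latch that carries the instruction
// (and its control signals) turns that pipeline slot into a NOOP bubble.
module pipe_latch #(parameter WIDTH = 16) (
    input  wire             clk,
    input  wire             rst,   // synchronous clear: inject a NOOP
    input  wire             wen,   // 0 = hold current value (used when stalling)
    input  wire [WIDTH-1:0] d,
    output reg  [WIDTH-1:0] q
);
    initial q = {WIDTH{1'b0}};     // start simulation holding a NOOP

    always @(posedge clk) begin
        if (rst)
            q <= {WIDTH{1'b0}};
        else if (wen)
            q <= d;
    end
endmodule
```

Giving the clear priority over the write enable means a flush always wins, even in a cycle where the latch would otherwise be loading.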

Interface

The processor will have the same interface as it did in Lab 4. The notable difference is in the values of the instruction and cycle counters: the cycle counter (counter 1) increments every cycle, and the instruction counter increments each time a non-NOOP instruction completes.
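As a rough sketch of the counter rules, the module below counts every cycle and counts an instruction only when a non-NOOP reaches the end of the pipeline. It assumes, hypothetically, that the instruction completing this cycle sits in a W-stage latch called W_insn, that the opcode occupies bits [15:12], and that both counters are 16 bits wide; adjust these to your actual encoding.

```verilog
// Hypothetical counter logic. Assumes W_insn holds the instruction that
// completes this cycle and that the opcode field is W_insn[15:12].
module perf_counters (
    input  wire        clk,
    input  wire        rst,
    input  wire [15:0] W_insn,
    output reg  [15:0] cycle_ctr,   // counter 1: +1 every cycle
    output reg  [15:0] insn_ctr     // +1 per completed non-NOOP instruction
);
    localparam [3:0] OP_NOOP = 4'd0;            // opcode 0 is NOOP after the ISA change
    wire W_is_noop = (W_insn[15:12] == OP_NOOP);

    always @(posedge clk) begin
        if (rst) begin
            cycle_ctr <= 16'd0;
            insn_ctr  <= 16'd0;
        end else begin
            cycle_ctr <= cycle_ctr + 16'd1;
            if (!W_is_noop)
                insn_ctr <= insn_ctr + 16'd1;
        end
    end
endmodule
```

Because stall bubbles and flushed instructions are just zeroed latches, they arrive in W as NOOPs and are automatically left out of the instruction count.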

Structural Specifications and Hints

Again, there are not many structural specifications. You are free to implement the pipeline (and more importantly its control logic) as you wish. The only requirements are:

Assuming you have a working datapath from Lab 4, this lab should actually take you less time than Lab 4 did. Remember, a pipelined datapath and a single-cycle datapath use the same datapath controller. The pipelined datapath simply pipelines the necessary control signals using pipeline registers.
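To make "pipelining the control signals" concrete, here is a minimal sketch assuming a controller with just three hypothetical outputs (alu_op, memwrite, regwrite). The controller decodes once in D; each later stage then uses its own latched copy, and signals are dropped once no later stage needs them. Field names and widths are illustrative only, not those of your Lab 4 controller.

```verilog
// Hypothetical control-signal pipeline: D -> X -> M -> W.
module ctrl_pipe (
    input  wire       clk,
    input  wire       DX_rst,      // clear D/X: bubble or flush into X
    input  wire [2:0] D_alu_op,
    input  wire       D_memwrite,
    input  wire       D_regwrite,
    output reg  [2:0] X_alu_op,    // consumed in X
    output reg        M_memwrite,  // consumed in M
    output reg        W_regwrite   // consumed in W
);
    reg X_memwrite, X_regwrite, M_regwrite;

    initial begin
        X_alu_op = 0; X_memwrite = 0; X_regwrite = 0;
        M_memwrite = 0; M_regwrite = 0; W_regwrite = 0;
    end

    always @(posedge clk) begin
        // D/X latch: all-zero controls behave like a NOOP (no writes)
        if (DX_rst) begin
            X_alu_op <= 0; X_memwrite <= 0; X_regwrite <= 0;
        end else begin
            X_alu_op   <= D_alu_op;
            X_memwrite <= D_memwrite;
            X_regwrite <= D_regwrite;
        end
        // X/M latch: alu_op is only needed in X, so it is not carried on
        M_memwrite <= X_memwrite;
        M_regwrite <= X_regwrite;
        // M/W latch
        W_regwrite <= M_regwrite;
    end
endmodule
```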

Here is how I would go about doing this lab. First, I would start with a working Lab 4 datapath and controller and add pipeline latches, including latches for the appropriate control signals. This very simple pipeline has no branch, stall, or bypass logic, but it should correctly handle instruction sequences that contain no branches and require no stalling or bypassing (i.e., sequences in which dependent instructions are not within 3 instructions of one another). You can write a small instruction sequence like this to test the basic pipeline.

HINT: To implement WD forwarding through the register file (i.e., writes in the first half of the cycle and reads in the second half), I found it useful to write the register file (and the data memory) on the clock edge opposite the one on which the pipeline latches are written. Doing this simulates the register file and memory being written halfway between two latches and eliminates some timing problems. Use the behavioral latches for this.
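The opposite-edge trick can be sketched as a behavioral register file, assuming (hypothetically) 8 registers of 16 bits, two combinational read ports, and pipeline latches that load on the rising edge. Writing on the falling edge puts the W-stage write in the middle of the cycle, so the D-stage read later in that same cycle already sees the new value.

```verilog
// Hypothetical behavioral register file: 8 x 16-bit, 2 read ports, 1 write port.
// Pipeline latches are assumed to load on posedge clk; this file writes on
// negedge, so a W-stage write is visible to the D-stage read in the same cycle.
module regfile_wd (
    input  wire        clk,
    input  wire        W_wen,
    input  wire [2:0]  W_rd,
    input  wire [15:0] W_data,
    input  wire [2:0]  D_rs,
    input  wire [2:0]  D_rt,
    output wire [15:0] D_rs_val,
    output wire [15:0] D_rt_val
);
    reg [15:0] regs [0:7];
    integer i;
    initial for (i = 0; i < 8; i = i + 1) regs[i] = 16'd0;

    always @(negedge clk)
        if (W_wen) regs[W_rd] <= W_data;

    // Combinational reads: follow the array, including mid-cycle writes
    assign D_rs_val = regs[D_rs];
    assign D_rt_val = regs[D_rt];
endmodule
```

The same pattern applies to the data memory if you want stores to land halfway between the X/M and M/W latches.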

Next, add stall logic. Create a component pipeline_control that takes the appropriate register names from the corresponding latches and outputs a single stall signal. You can use behavioral verilog in this component, too. Remember, with only single-cycle operations (luckily, you do not have to implement multiply in your pipeline), the conditions for stalling are very simple. Look at the pipelining slides for hints on how to implement stalls.
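Here is one possible shape for pipeline_control in the no-bypassing pipeline. The port names, the 3-bit register specifiers, and the *_used and *_regwrite flags are assumptions about what your decoder provides. With WD forwarding through the register file, only producers in X or M can conflict with a reader in D.

```verilog
// Sketch of no-bypass stall logic. Port names and flags are hypothetical.
module pipeline_control (
    input  wire [2:0] D_rs, D_rt,
    input  wire       D_rs_used, D_rt_used,  // does the D-stage insn read rs / rt?
    input  wire [2:0] X_rd,
    input  wire       X_regwrite,
    input  wire [2:0] M_rd,
    input  wire       M_regwrite,
    output wire       stall
);
    // A producer in W is not checked: it writes on the opposite clock edge,
    // so D reads the new value through the register file.
    wire rs_hazard = D_rs_used &&
                     ((X_regwrite && X_rd == D_rs) || (M_regwrite && M_rd == D_rs));
    wire rt_hazard = D_rt_used &&
                     ((X_regwrite && X_rd == D_rt) || (M_regwrite && M_rd == D_rt));
    assign stall = rs_hazard || rt_hazard;
endmodule
```

When stall is high, the usual approach is to hold the PC and the F/D latch (write enables low) and clear the D/X latch so that a NOOP enters X. In the full-bypassing pipeline, the condition shrinks to the load-use case: stall only when X holds a load whose destination the D-stage instruction reads.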

Next, add support for taken branches. Here you have to recognize a taken branch and flush the pipeline. Look at the pipelining slides for hints on how to implement flushes. Now you can add branches to your instruction sequences to see whether this functionality works.
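A minimal sketch of the PC and latch steering for flushes and stalls, assuming that branches resolve in the X stage, that fetch simply continues sequentially (so a taken branch means the instructions in F and D are on the wrong path), and that the next sequential PC is computed elsewhere. The module name fetch_steer and all of its ports are hypothetical.

```verilog
// Hypothetical PC/latch steering. Assumes branches resolve in X and fetch
// falls through by default, so a taken branch squashes the insns in F and D.
module fetch_steer (
    input  wire [15:0] F_pc_seq,   // next sequential PC, computed elsewhere
    input  wire [15:0] X_target,   // branch target computed in X
    input  wire        X_taken,    // X holds a taken branch
    input  wire        stall,      // from the stall logic
    output wire [15:0] F_next_pc,
    output wire        PC_wen,
    output wire        FD_wen,
    output wire        FD_rst,
    output wire        DX_rst
);
    assign F_next_pc = X_taken ? X_target : F_pc_seq;
    // A flush overrides a stall: the stalled insn in D is being squashed anyway.
    assign PC_wen = X_taken || !stall;    // hold the PC while stalling
    assign FD_wen = X_taken || !stall;    // hold F/D while stalling
    assign FD_rst = X_taken;              // turn the insn in F into a NOOP
    assign DX_rst = X_taken || stall;     // squash D on a flush, bubble on a stall
endmodule
```

Clearing a latch here relies on the new encoding, where an all-zero latch reads as a NOOP.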

Finally, add bypassing. Remember, bypassing is controlled by local logic, so you can add bypasses independently of one another. From Homework 3, you should have an idea of how to write an instruction sequence that exercises a particular bypass. Use those.
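As an example of how local the bypass logic is, here is a sketch of the MX and WX bypass selection for one X-stage ALU operand; you would instantiate it once per source operand. The module and port names are hypothetical, and it assumes no register is hardwired to zero (add that check if your ISA has such a register).

```verilog
// Hypothetical MX/WX bypass select for one X-stage operand.
module bypass_select (
    input  wire [2:0]  X_src,       // register the X-stage insn reads
    input  wire [15:0] X_src_val,   // value read from the register file in D
    input  wire [2:0]  M_rd,
    input  wire        M_regwrite,
    input  wire [15:0] M_result,    // ALU result held in the X/M latch
    input  wire [2:0]  W_rd,
    input  wire        W_regwrite,
    input  wire [15:0] W_result,    // value being written back from W
    output wire [15:0] X_operand
);
    wire mx = M_regwrite && (M_rd == X_src);
    wire wx = W_regwrite && (W_rd == X_src);
    // The younger producer (in M) must take priority over the older one (in W).
    assign X_operand = mx ? M_result :
                       wx ? W_result :
                            X_src_val;
endmodule
```

A load's data is not available until it leaves M, so the full-bypassing stall logic must still stall an instruction in D that reads the destination of a load in X; with that stall in place, the MX path never forwards a load address by mistake. A WM bypass for store data follows the same pattern.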

The first part of this lab (pipeline with no bypassing) took me about 11 hours. The second part (adding bypassing) took about 1 hour.

Verilog Naming Conventions

I think you will find it convenient to use a pipeline-style naming convention in your verilog code. The convention I like to use is to prefix each structure and wire with the letter of the pipeline stage to which it belongs (or the letters of both stages, if it spans two stages). For instance, I call the instruction memory F_IMEM and the fetch-stage PC bus F_pc. To latch the PC into the D stage and subsequently into the X stage (you need to do this to compute targets for PC-relative branches), I define D_pc and X_pc and then use the latches reg_16_en_srs0 FD_pc (clk_, FD_rst, FD_wen, F_pc, D_pc); and reg_16_en_srs0 DX_pc (clk_, DX_rst, DX_wen, D_pc, X_pc);. Personally, I find that this terminology helps keep the design straight in my head when coding.

Writeup

Your writeup for this lab should include labeled schematics for both pipelines (again, only for the pipeline itself; there is no need to turn in schematics for the interiors of individual components like the latches, register file, ALU, barrel shifter, etc.) and your verilog code. The TAs will also ask you questions about your design.

You will demo your design on 7 programs for a total of 10 demos (the last three programs will be run both with and without bypassing), 10 points for each demo. The seven programs are:

Build functionality into your pipeline in stages to maximize the number of demos you can pass. For instance, if you get everything except bypassing working, you should be able to pass 7 of the 10 demos. If you get everything except for the new instructions working, you should be able to pass 8 of the 10.