Preliminary design document due Friday, April 3, 6:30pm
Preliminary Demo by Friday, April 17, 6:30pm
Final Demo and Writeup by Friday, May 1 (last day of classes), 6:30pm
In this final lab of the semester, you will design and build a scalar five-stage pipelined processor for the LC4 ISA. Your pipeline will target frequency (of course) and IPC (via bypassing and branch prediction). As usual, I am providing you with skeleton code. The pipeline skeleton and all of the supporting modules are in this compressed tarball. You may want to split out large sub-components of your pipeline, like the register file, ALU, and branch predictor, into separate files. Most of your code will be in the file lc4_pipe.v, whose current contents are given below:
module lc4_pipe(CLK, RST, GWE,
IMEM_ADDR, IMEM_OUT,
DMEM_ADDR, DMEM_OUT, DMEM_IN, DMEM_WE,
// options
BPRED_ON, BYPASS_ON,
// debug interface
_TEST_W_PC,
_TEST_W_STALL,
_TEST_W_INSN,
_TEST_W_REGFILE_DATA_IN,
_TEST_W_REGFILE_WE,
_TEST_W_NZP_IN,
_TEST_W_NZP_WE,
_TEST_W_DMEM_ADDR,
_TEST_W_DMEM_IN,
_TEST_W_DMEM_WE,
_TEST_PMC_CYCLE,
_TEST_PMC_INSN,
_TEST_PMC_LOAD_STALL,
_TEST_PMC_BRANCH_STALL);
input CLK; // main clock
input RST; // global reset
input GWE; // global we for single-step clock
output [15:0] IMEM_ADDR; // instruction address
input [15:0] IMEM_OUT; // output from instruction memory
output DMEM_WE; // data memory write-enable
output [15:0] DMEM_ADDR; // data memory address
input [15:0] DMEM_OUT; // output from data memory
output [15:0] DMEM_IN; // input to data memory
input BPRED_ON; // branch prediction is on
input BYPASS_ON; // bypassing is on
output [15:0] _TEST_W_PC;
output [15:0] _TEST_W_INSN;
output [15:0] _TEST_W_REGFILE_DATA_IN;
output _TEST_W_REGFILE_WE;
output [2:0] _TEST_W_NZP_IN;
output _TEST_W_NZP_WE;
output [15:0] _TEST_W_DMEM_ADDR;
output [15:0] _TEST_W_DMEM_IN;
output _TEST_W_DMEM_WE;
output _TEST_W_STALL;
output [15:0] _TEST_PMC_CYCLE, _TEST_PMC_INSN, _TEST_PMC_LOAD_STALL, _TEST_PMC_BRANCH_STALL;
// YOUR CODE GOES HERE
always @(posedge CLK)
if (GWE)
begin
$display("--------------------------------------------------------------------------------");
$display("F:");
$display("D:");
$display("X:");
$display("M:");
$display("W:");
end
endmodule // lc4_pipe
The pipeline module interface is a superset of the single-cycle module interface. The CLK, RST, GWE and instruction and data memory interfaces are the same. However, there are two additional mode switches (BPRED_ON and BYPASS_ON), a bunch of output signals that start with _TEST_W_, and four output signals that start with _TEST_PMC_. The _TEST_ outputs will be used for debugging and test fixtures only. There is also skeleton behavioral code for displaying interior values. You can modify this code to print a snapshot of the pipeline every cycle, which you can watch in ModelSim to help you debug.
Anyway, here are the basic specifications for the pipeline.
This processor will have a five-stage pipeline: Fetch (F), Decode (D), Execute (X), Memory (M), and Writeback (W).
All instructions travel through all five stages. The _TEST_W_ outputs of the pipeline module should contain the corresponding values for the instruction currently in the Writeback (W) stage. Some of these values (PC, instruction bits, register inputs) are typically not needed at the W stage. You will have to propagate them through pipeline registers for debugging purposes. The _TEST_W_STALL signal should be 1 if the Writeback stage currently contains a bubble.
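To make the debug plumbing above concrete, here is a minimal sketch of carrying the PC, the instruction bits, and a bubble flag from X down to W. Everything in it is an assumption for illustration: the pipe_reg helper, the debug_tail module, and all of the signal names are hypothetical, and resetting the bubble flags to 1 (so that an empty pipeline reports a stall) is my guess at a sensible convention. The always block in pipe_reg may also fall outside the Verilog subset you are allowed to use, in which case substitute the register module from the tarball.

```verilog
// Hypothetical helper: N-bit register with synchronous reset and write-enable.
// If the tarball provides a register module, use that one instead.
module pipe_reg #(parameter N = 16, parameter RESET_VAL = 0)
   (input              clk, rst, we,
    input      [N-1:0] d,
    output reg [N-1:0] q);
   always @(posedge clk)
     if (rst)     q <= RESET_VAL;
     else if (we) q <= d;
endmodule

// Hypothetical tail of the pipeline: carries debug values from X to W.
module debug_tail(input         clk, rst, gwe,
                  input  [15:0] x_pc, x_insn,
                  input         x_bubble,      // 1 if X holds a bubble
                  output [15:0] test_w_pc, test_w_insn,
                  output        test_w_stall);
   wire [15:0] m_pc, m_insn;
   wire        m_bubble, w_bubble;

   // X -> M
   pipe_reg #(.N(16)) m_pc_reg
     (.clk(clk), .rst(rst), .we(gwe), .d(x_pc), .q(m_pc));
   pipe_reg #(.N(16)) m_insn_reg
     (.clk(clk), .rst(rst), .we(gwe), .d(x_insn), .q(m_insn));
   pipe_reg #(.N(1), .RESET_VAL(1)) m_bubble_reg
     (.clk(clk), .rst(rst), .we(gwe), .d(x_bubble), .q(m_bubble));

   // M -> W
   pipe_reg #(.N(16)) w_pc_reg
     (.clk(clk), .rst(rst), .we(gwe), .d(m_pc), .q(test_w_pc));
   pipe_reg #(.N(16)) w_insn_reg
     (.clk(clk), .rst(rst), .we(gwe), .d(m_insn), .q(test_w_insn));
   pipe_reg #(.N(1), .RESET_VAL(1)) w_bubble_reg
     (.clk(clk), .rst(rst), .we(gwe), .d(m_bubble), .q(w_bubble));

   // W holds a bubble exactly when the flag that travelled with it is set.
   assign test_w_stall = w_bubble;
endmodule
```

In a full design the same pattern would start at F, and the remaining _TEST_W_ values (register write data, NZP bits, and the data memory signals) would travel alongside in the same way.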
Branches are resolved in the execute stage, so a mispredicted branch has a two-cycle penalty. Your pipeline should be able to operate in two branch prediction modes: i) using "implicit" branch prediction where the predicted PC is the current PC + 1 and ii) using "explicit" branch prediction, specifically a tagged 8-entry branch target buffer (BTB). This mode is controlled by switch 7 on the daughter-board: "up" for BTB, "down" for no BTB. The mode switch is passed to the pipeline module via the signal BPRED_ON.
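As a rough illustration of the two prediction modes, the sketch below picks the next fetch PC and detects a misprediction in X. It treats the BTB as a black box whose hit and target outputs arrive as inputs. The module name and every signal other than the BPRED_ON switch are hypothetical, and stalls (which would hold the PC) are left out.

```verilog
// Hypothetical next-PC logic; the BTB lookup result is an input here.
module next_pc_sel(input         bpred_on,     // BPRED_ON mode switch
                   input  [15:0] f_pc,         // PC of the insn in Fetch
                   input         btb_hit,      // assumed: BTB tag matched f_pc
                   input  [15:0] btb_target,   // assumed: target stored in BTB
                   input         x_valid,      // assumed: X holds a real insn
                   input  [15:0] x_pred_pc,    // prediction made when X's insn was fetched
                   input  [15:0] x_actual_pc,  // next PC resolved in X
                   output        mispredict,
                   output [15:0] next_pc);
   wire [15:0] pc_plus_one = f_pc + 16'd1;

   // "Implicit" prediction is PC + 1; the BTB overrides it only when
   // prediction is switched on and the lookup hits.
   wire [15:0] predicted_pc = (bpred_on & btb_hit) ? btb_target : pc_plus_one;

   // Any instruction whose resolved next PC differs from what was
   // predicted at fetch time counts as mispredicted.
   assign mispredict = x_valid & (x_pred_pc != x_actual_pc);

   // On a misprediction, redirect fetch to the resolved PC.
   assign next_pc = mispredict ? x_actual_pc : predicted_pc;
endmodule
```

When mispredict is 1, the two younger instructions (in F and D) would be turned into bubbles, which is where the two-cycle penalty comes from. Note that x_pred_pc has to travel down the pipeline with the instruction, just like the debug values.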
Your pipeline should also be able to operate in two bypassing modes: i) no bypassing, and ii) full bypassing including MX, WX, and WM value bypassing and MX and WX NZP bypassing (yes, the NZP bits have to be bypassed too). When bypassing is on, the only stalls are for load-to-use (this includes load to conditional branch). The bypassing mode is controlled by switch 6 on the daughter-board: "up" for bypassing, "down" for no bypassing. The mode switch is passed to the pipeline module via the signal BYPASS_ON.
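The bypassing rules above might be sketched as follows for one X-stage source operand, together with load-to-use stall detection for the bypassing-on mode. This is only an outline under stated assumptions: the module and signal names are hypothetical (only BYPASS_ON comes from the interface), NZP bypassing is not shown, the extra stalls needed when bypassing is off are not shown, and I am assuming that store data arriving from a load is covered by the WM bypass and therefore does not stall.

```verilog
// Hypothetical MX/WX value bypass for one source operand of the insn in X.
module bypass_sel(input         bypass_on,     // BYPASS_ON mode switch
                  input  [2:0]  x_rs,          // source register of insn in X
                  input  [15:0] x_rs_data,     // value read from the regfile
                  input         m_regfile_we,  // insn in M writes a register
                  input  [2:0]  m_rd,
                  input  [15:0] m_alu_out,
                  input         w_regfile_we,  // insn in W writes a register
                  input  [2:0]  w_rd,
                  input  [15:0] w_data,
                  output [15:0] x_operand);
   wire mx = bypass_on & m_regfile_we & (m_rd == x_rs);
   wire wx = bypass_on & w_regfile_we & (w_rd == x_rs);

   // M holds the younger producer, so MX wins over WX.
   assign x_operand = mx ? m_alu_out :
                      wx ? w_data    : x_rs_data;
endmodule

// Hypothetical load-to-use stall detection, bypassing-on mode only.
module load_use_stall(input        x_is_load,   // insn in X is a load
                      input  [2:0] x_rd,        // load's destination register
                      input        d_reads_rs,  // insn in D reads rs
                      input  [2:0] d_rs,
                      input        d_reads_rt,  // insn in D reads rt
                      input  [2:0] d_rt,
                      input        d_rt_is_store_data, // rt only supplies store data
                      input        d_is_branch, // conditional branch (reads NZP)
                      output       stall);
   wire rs_hazard = d_reads_rs & (d_rs == x_rd);
   // Assumption: store data can arrive via WM, so it does not stall.
   wire rt_hazard = d_reads_rt & ~d_rt_is_store_data & (d_rt == x_rd);

   // A load also sets NZP, so a conditional branch right behind it waits.
   assign stall = x_is_load & (rs_hazard | rt_hazard | d_is_branch);
endmodule
```

The MX path never needs to forward a load result, because the stall above keeps a dependent instruction out of X while the load is in M.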
Real processors have performance counters that track various events within a processor to help understand its performance. The pipelined LC4 processor also has performance counters. These are memory-mapped and there are four of them.
The cycle count is incremented every cycle. Every cycle one (and only one) of the instruction count, load stall, or branch stall counters is incremented. As such, the sum of these three registers should be equal to the cycle count.
Your processor should update the performance counters during the writeback stage. The performance counters should count an actual "NOOP" instruction as an executed instruction. That is, a NOOP is neither a branch stall nor a load stall cycle. The counters reset to 0 only when the entire system is reset. The performance counters should also be hooked up to the testing interface via the _TEST_PMC_ buses.
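One way to keep the "exactly one counter per cycle" rule is to compute the next value of each counter from the bubble information that reaches W. The sketch below shows only that combinational step. The module and signal names are hypothetical, and it assumes each bubble carries a tag saying whether a load stall created it. Bubbles of any other origin are lumped in with branch stalls here, which is my assumption rather than something the specification above settles.

```verilog
// Hypothetical next-value logic for the four performance counters.
module pmc_next(input         w_stall,          // W holds a bubble
                input         w_bubble_is_load, // assumed tag: load stall made it
                input  [15:0] cycle, insn, load_stall, branch_stall,
                output [15:0] cycle_next, insn_next,
                output [15:0] load_stall_next, branch_stall_next);
   // Exactly one of these three is 1 in any cycle.
   wire inc_insn   = ~w_stall;
   wire inc_load   =  w_stall &  w_bubble_is_load;
   wire inc_branch =  w_stall & ~w_bubble_is_load;

   assign cycle_next        = cycle        + 16'd1;
   assign insn_next         = insn         + {15'd0, inc_insn};
   assign load_stall_next   = load_stall   + {15'd0, inc_load};
   assign branch_stall_next = branch_stall + {15'd0, inc_branch};
endmodule
```

The four next values would feed 16-bit registers that are written every cycle and cleared only by RST. A real NOOP reaches W with w_stall at 0, so it lands in the instruction count.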
Because the main board switches are difficult to get at, I have moved the debugging interface to the daughterboard switches (I kept reset and single-step hooked up to the board buttons). I have also used the larger number of switches to expand the debugging interface and you can expand it further if you want.
I modified PennSim to generate trace files for LC4 programs. You can generate a trace file for any program using the command line trace on <tracefilename>, then running and stopping the program, and then using the command line trace off. The trace file consists of a line of five 16-bit words for each instruction executed. These are:
The pipeline module we gave you has hooks to dump out various values associated with the instruction currently in the W stage. The new version of the ModelSim test fixture test_lc4_pipe.tf reads in a trace file and compares the values in the trace to the corresponding values of the instruction currently in the W stage. If any of the values mismatch, it will tell you what the mismatch is. If you connect _TEST_W_STALL properly, the test fixture knows to ignore cycles in which no instruction is in the W stage. You can use this to help debug your pipeline. The trace corresponding to the harness4.hex file is harness4.trace.
Here are the timing.hex, timing.trace, and test_pl_timing.tf files for the timing test. Here is also the timing.asm file in case you want to look at the code and/or run it on PennSim.
You can use the same subset of Verilog as in lab1. You should pass all signals to modules by name (as opposed to by position).
There will be two demos.
All group members should be present at the demos. All group members should understand the entire design well enough to be able to answer any such questions.
There will also be two writeups.