Lab 3 - Non-Pipelined Processor

CSE 372 (Spring 2006): Digital Systems Organization and Design Lab

Worksheet due Monday, March 20

Simulation Demo by Friday, March 24

Hardware Demo by Friday, March 31

Writeup due before class Monday, April 3

This lab is to be done in pairs (groups of two).

This lab is worth 25 points.

Overview

In this lab, you will construct a non-pipelined processor for the P37X ISA.

Note

The final lab of the semester, lab4, is substantially more time-consuming than this lab. Thus, even though you have three weeks to complete this lab, your goal should be to complete it in two weeks so that you have plenty of time for the next lab.

We've changed the P37X ISA just slightly (changed the encoding of the NOOP instruction), so be sure to use the version of the P37X ISA document linked from this page.

Common Module Library

New Register Module

We're giving you a slightly updated register module. Please use this single register module unmodified in your design. This register should be the only state element you use anywhere in your design (aside from the main memory module we're giving you).

This new register module has three new features:

  • A 1-unit delay on the register outputs to ensure the functional simulation works correctly.
  • A "global write enable" (gwe) signal to enable the single-stepping described in the tutorial. Pass gwe into your modules unmodified, just as you do with clk and rst.
  • A parameterized reset value (the second parameter of the module).

Note that this module has a slightly different interface from the register given to you last time. The order of inputs/outputs has been changed (for no good reason) and the gwe (global write enable) signal was added:

module Nbit_reg(in, out, clk, we, gwe, rst);
   parameter n = 1;
   parameter r = 0;
 
   output [n-1:0] out;
   input [n-1:0]  in;   
   input          clk;
   input          we;
   input          gwe;
   input          rst;      

   reg [n-1:0] state;

   assign #(1) out = state;

   always @(posedge clk) 
     begin 
       if (rst) 
         state = r;
       else if (we & gwe) 
         state = in; 
     end
endmodule

Note

The above code is included in the p37x_processor_skeleton.zip archive in the file register.v. I suggest you use this register to avoid duplication of code.

Multiplexors

Create a set of parameterized muxes to instantiate explicit structural muxes in the datapath. You'll need N-bit muxes that are 2-to-1 (datapath), 4-to-1 (datapath), and 8-to-1 (in the register file).

For example:

module mux_4to1_N(out, sel, a, b, c, d);
   parameter n = 1;
 
   output [n-1:0] out;
   input [n-1:0]  a, b, c, d;
   input [1:0]    sel;

   assign         out = sel == 2'b00 ? a : 
                        sel == 2'b01 ? b :
                        sel == 2'b10 ? c : 
                        sel == 2'b11 ? d : {n{1'b0}} /*garbage*/;
endmodule

Note

Some multiplexors are included in the p37x_processor_skeleton.zip archive in the file mux.v. I suggest you use these muxes to avoid duplication of code.
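
If you write your own muxes rather than using mux.v, the 2-to-1 variant can follow the same pattern as the 4-to-1 example. The following is only a minimal sketch: the module name mux_2to1_N and its port order are assumptions chosen to mirror the example above, and are not necessarily what mux.v provides.

```verilog
// Sketch of an N-bit 2-to-1 mux (module name and port order are assumed).
module mux_2to1_N(out, sel, a, b);
   parameter n = 1;

   output [n-1:0] out;
   input [n-1:0]  a, b;
   input          sel;

   // sel == 0 selects a, sel == 1 selects b
   assign         out = sel ? b : a;
endmodule
```

Under these assumptions, a 16-bit instance would be written as mux_2to1_N #(16) my_mux (out, sel, a, b); with the width passed as the first parameter.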

The Datapath

As described in class, the non-pipelined datapath (the link points to a .pdf file of the datapath) contains the register file, memory, PC register, branch logic, 11 muxes and 2 write enable signals:

p37x_datapath.gif

In our implementation, the main datapath module was approximately 150 lines of Verilog.

ALU and Register File

The ALU from lab1 and the register file from lab2 are key components of the processor. However, you'll want to make a few minor modifications to both:

  • The register file should be modified to use the new register module
  • The ALU can optionally be modified to use the faster built-in + and * operators.
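
As a rough illustration of the second modification, the built-in operators can be used directly in assign statements. The module and signal names below are hypothetical and are not taken from the lab1 ALU; how the results feed into your ALU's output selection is up to your own design.

```verilog
// Sketch only: 16-bit add and multiply using built-in operators.
// Module and port names are hypothetical.
module add_mul_sketch(sum, product, a, b);
   output [15:0] sum, product;
   input [15:0]  a, b;

   assign sum     = a + b;   // result is truncated to 16 bits
   assign product = a * b;   // only the low 16 bits of the product are kept
endmodule
```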

Branch Logic

The branch unit determines if a conditional branch should be taken or not-taken. It has two inputs: (1) a 16-bit signed value and (2) a three-bit "NZP" condition from the instruction (negative, zero, positive). The only output is a one-bit signal: 1 for taken branch, zero for not-taken (fall-through) branch. For example, if the N/Z/P bits are 110 and the data input value is negative or zero, then the output will be a one.

Internally, you'll want to create logic to determine if the 16-bit input value is (a) zero, (b) negative, or (c) positive (exactly one of which will be true). This three-bit NZP value is then combined with the three NZP bits from the instruction to generate the one-bit output. This branch logic can be encoded in just a few lines of Verilog. I suggest that you first determine if the input value is negative (hint: you can just look at a single bit of the value), zero (hint: use the reduction operator "|"), and positive (hint: a number is positive if it is not negative and not zero).

Note: you can use Verilog's == operator, but the < and > operators in Verilog assume that multi-bit values are unsigned, and thus won't work correctly on the signed input value.
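
The hints above can be combined into something like the following sketch. The module and port names are hypothetical, and the sketch assumes the instruction's condition bits arrive ordered as {N, Z, P}; check the P37X ISA document for the actual bit positions.

```verilog
// Sketch of the branch unit (names and NZP bit order are assumptions).
module branch_unit_sketch(taken, value, nzp);
   output        taken;
   input [15:0]  value;   // 16-bit signed data value
   input [2:0]   nzp;     // condition bits from the insn, assumed {N, Z, P}

   wire          is_neg  = value[15];            // sign bit set
   wire          is_zero = ~(|value);            // no bit set
   wire          is_pos  = ~is_neg & ~is_zero;   // neither of the above

   // Taken if any requested condition matches the value's condition.
   assign        taken = |(nzp & {is_neg, is_zero, is_pos});
endmodule
```

With nzp = 3'b110 and a negative value, the AND yields 3'b100 and the reduction OR produces 1, matching the example given earlier in this section.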

Controller

The controller has two inputs: the 4-bit opcode and the 1-bit branch outcome from the branch logic. The outputs of the controller are all of the control signals for the 11 muxes and 2 write enable signals.

I suggest you write the Verilog for this module in two parts. The first part should decode all of the opcodes, one per line:

wire is_STR           = (opcode == 4'b1101);
...
wire is_ST            = (opcode == 4'b1111);

The second part can determine the actual output signals using these decoded values:

assign mem_we = (is_STR | is_ST);

Using this basic format, the controller module body should be much less than 100 lines of Verilog.

Program Counter

The program counter is just a 16-bit register. The initial value of this register should be 512 (hex 0x0200), which is the first memory location after the trap and interrupt tables. This initial value can be set via the new parameterized register.
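
One possible way to instantiate the PC using the parameterized register is sketched below. The wrapper module and signal names are hypothetical, the sketch assumes register.v is compiled alongside it, and it assumes the PC is written every cycle (we tied to 1), which you should check against your own datapath.

```verilog
// Sketch only: 16-bit PC built from Nbit_reg (defined in register.v).
// Wrapper and signal names are hypothetical.
module pc_sketch(pc, next_pc, clk, gwe, rst);
   output [15:0] pc;
   input [15:0]  next_pc;
   input         clk, gwe, rst;

   // Parameters: n = 16 bits wide, r = reset value 16'h0200 (512).
   // Port order follows Nbit_reg(in, out, clk, we, gwe, rst).
   Nbit_reg #(16, 16'h0200) pc_reg (next_pc, pc, clk, 1'b1, gwe, rst);
endmodule
```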

Multiplexors

Use the multiplexor modules described above to explicitly instantiate the structural multiplexors. Recall that sign extension and zero extension can be done easily in Verilog using the "repeat signal" and "concatenate signal" operators.
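
As a reminder of how the two operators combine, here is a minimal sketch of sign and zero extension to 16 bits. The module names and the default input widths are assumptions made for illustration; use the immediate field widths given in the P37X ISA document.

```verilog
// Sketch only: extend an n-bit field to 16 bits (requires n < 16).
// Module names and default widths are hypothetical.
module sext_N(out, in);
   parameter n = 9;

   output [15:0] out;
   input [n-1:0] in;

   // Repeat the sign bit (16-n) times, then concatenate the original bits.
   assign        out = {{(16-n){in[n-1]}}, in};
endmodule

module zext_N(out, in);
   parameter n = 8;

   output [15:0] out;
   input [n-1:0] in;

   // Repeat a zero bit (16-n) times, then concatenate the original bits.
   assign        out = {{(16-n){1'b0}}, in};
endmodule
```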

Interface

The top-level processor (which is available as p37x_processor_skeleton.zip) instantiates the memory module, the processor module, and all of the device code, generates the clock, etc. As such, the "memory" is not actually instantiated inside the datapath module. Instead, all of the memory module signals are inputs/outputs to the datapath module:

module sc_datapath(CLK, RST, GLOBAL_WE,
                   IMEM_ADDR, IMEM_OUT, 
                   DMEM_ADDR, DMEM_IN, DMEM_OUT, DMEM_WE,
                   REGFILE_WE, REGFILE_DATA_IN);

  input         CLK;         // main clock
  input         RST;         // global reset
  input         GLOBAL_WE;   // global we for single-step clock

  input [15:0]  IMEM_OUT;    // output from insn. memory
  input [15:0]  DMEM_OUT;    // output from data memory
  output [15:0] IMEM_ADDR;   // insn. memory address (i.e., current PC)
  output        DMEM_WE;     // data memory write-enable
  output [15:0] DMEM_ADDR;   // data memory address
  output [15:0] DMEM_IN;     // input to data memory

  output        REGFILE_WE;        // testbench/debugging signal
  output [15:0] REGFILE_DATA_IN;   // testbench/debugging signal

  ...

The last two output signals (REGFILE_WE and REGFILE_DATA_IN) are exported to allow the testbench to check for correct behavior and for supporting the debug mode.

Testing

As we did with previous labs, you'll use a behavioral testbench to test your processor. See a tutorial on the testbench.

You will also test your processor on hardware. See this hardware tutorial to do this.

Verilog Restrictions

This lab should be implemented using only low-level structural Verilog and the assign statement. In this lab, you may additionally use the Verilog operators +, -, *, /, <<, and >>. As before, you shouldn't use any of the behavioral Verilog constructs. If you're not sure whether you're allowed to use a certain Verilog construct, just ask (post a message on the newsgroup, send an e-mail, etc.).

Demos

You'll have both a simulation demo and a hardware demo.

What to Turn In

Worksheet: You'll turn in the complete datapath worksheet in class on March 20th.

Final Writeup: You'll turn in the final writeup in class on April 3:

  1. Verilog code. As before, your Verilog code should be well-formatted, easy to understand, and include comments where appropriate (for example, use comments to describe all the inputs and outputs to your Verilog modules). Some part of the project grade will be dependent on the style and readability of your Verilog, including formatting, comments, good signal names, and proper use of hierarchy.
  2. In addition, answer the following questions:
  • Once you had the design working in simulation, did you encounter any problems getting it to run on the FPGA boards? If so, what problems did you encounter?
  • What other problems, if any, did you encounter while doing this lab?
  • How many hours did it take you to complete this assignment? On a scale of 1 (least) to 5 (most), how difficult was this assignment?

Honors Points

The basic design above does not support the privilege register (set by TRAP and un-set by RTT), the Memory Protection Register (MPR) that performs rudimentary memory protection, or any exceptions.

Recall from CSE240 that the MPR is a 16-bit register, in which each bit controls access to 1/16th of the memory. If the bit is a one, access is allowed to that segment. If the bit is a zero, no access is allowed. If the processor is in privileged mode, it ignores the MPR.
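
The access check described above might be sketched as follows. The module and signal names are hypothetical, and the sketch assumes that bit i of the MPR guards the segment whose top four address bits equal i; confirm the bit-to-segment mapping against the ISA document before relying on it.

```verilog
// Sketch only: MPR access check (names and bit mapping are assumptions).
module mpr_check_sketch(allowed, addr, mpr, priv);
   output        allowed;
   input [15:0]  addr;   // memory address being accessed
   input [15:0]  mpr;    // Memory Protection Register
   input         priv;   // 1 when the processor is in privileged mode

   // The top four address bits pick one of 16 segments of 4096 addresses.
   wire [3:0]    segment = addr[15:12];

   // Privileged mode ignores the MPR; otherwise the segment's bit must be 1.
   assign        allowed = priv | mpr[segment];
endmodule
```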

You can implement the privileged mode and MPR for honors points. To receive them, add some or all of this functionality and properly demonstrate it to the TAs. Note: it is up to you to convince the TAs that it is actually working, so you need to make your own test cases to convince them.

The final lab will have additional honors points that could be completed with just the single-cycle datapath. So, just to get you thinking, here are some examples of honors credit for the next lab:

Addendum

[March 24] - Testbench v1.2 released - see testbench page for details.

[March 22] - Testbench v1.1 released, with much improved functionality - see testbench page for details.

[March 21] - Due to long synthesis times, breakout is the only program you need to demo on the boards.

[March 20] - All of the new files for the testbench, memory, and I/O devices in the zip archive p37x_processor_skeleton.zip.

[March 20] - A hardware tutorial has been added to describe how to run the processor on the board.