Lab 1 - Combinational Logic and ALUs

CSE 372 (Spring 2008): Computer Organization and Design Lab

Demo and Writeup by 6pm on Friday, February 29th

This lab is to be done individually.

This lab is worth 25 points.

Important

There is a bug in the behavioral Verilog loop code I showed in class. Specifically, the loop as I showed it does not terminate. Oops. I updated the notes and reposted them. Please look at the updated lecture 2 notes. The relevant changes are on slides 15-16, but they have been propagated throughout. I also updated the slides on tasks, 23-24, in case people want to use those.

Overview

In this lab, you will construct the ALU (Arithmetic/Logical Unit) for a P37X ISA processor. Before you can build the ALU, you need to create a few building blocks (4-bit adder, 16-bit adder, 16-bit multiplier, 16-bit shifter) which you will then combine to form an ALU.

Preliminaries

Before you begin, we have another tutorial for you to walk through: ModelSim simulation tutorial. This tutorial covers simulation of designs for verifying they are correct and debugging them when they are not.

Specifics

Design and test the following combinational logic structures.

Important

Before you write Verilog code, first create a hand-drawn schematic diagram of the circuit with all wires and input/outputs labeled. Why? When designing hardware, even when using Verilog, you need to be thinking explicitly about the structure and interconnectedness of the circuits. Only when the diagram is complete should you write the Verilog code that corresponds to the circuit. As described below, you need to turn in both the hand-drawn schematic and a printout of the Verilog code.

1. 4-bit Adder

Before creating a 16-bit adder, first create a signed 4-bit ripple-carry adder as a basic building block. It has three inputs: two 4-bit signed values and a 1-bit carry in signal. It has two output values: the 4-bit sum and a 1-bit carry-out.

Testing: Test the adder both in ModelSim and on the board. To test the adder on the board, hook your 4-bit adder inputs to two sets of four input switches on the extension board; hook the outputs to five LEDs on the extension board.

Demo: demo the 4-bit adder to the TA on the board.

Writeup: submit your schematic and verilog code.

2. 16-bit Adder

The 16-bit adder takes in two 16-bit signed values and a single-bit carry-in signal. It has a 16-bit signed output and a single-bit carry-out. The interface for this adder should be module <adder_name> (input [15:0] a, b, input cin, output [15:0] s, ouput cout);

Implementation: For comparison purposes, create three different implementations:

  1. Ripple-carry. Call this implementation adder_16b_rc.
  2. Carry-select with two 8-bit segments. See the CSE371 lecture notes for more information on carry-select adders. Call this implementation adder_16b_cs2.
  3. Carry-select with four 4-bit segments. Call this implementation adder_16b_cs4.

Testing: Test the adders functionally in ModelSim.

Demo: The TA's will link your adder implementations with a test module and test them.

Writeup: Submit your hand-drawn schematics and verilog code.

Also, create a small table that compares the delay (in nanoseconds) and area (in terms of lookup tables or "LUTs") of these three adder implementations. To get these statistics, follow these steps. In the lab1.v module, instantiate one instance of the adder. In the sources tab in the project, expand the lab1.v tree, select the adder, right-click and choose 'Set Top Module'. Then in the processes tab, double-click first 'Synthesize-XST' and then 'Implement Design'. After implementation completes successfully, you can click on 'View Design Summary'. In the panel that appears, there will be several reports. The number of LUTs is in the 'Map Report'. You can compute the delay by looking at the 'Static Timing Report'. The delay of the adder is the maximum delay of any of the output bits. Compare these adders to Xilinx's intrinsic adder implementation. You can force Xilinx to synthesize an intrinsic adder by creating a module called adder_16b_int and making the body assign s = a + b;.

Important

Although your your eniac account will be the permanent home for your projects, it's a good idea to make a local copy of your current project on the desktop and work from that copy. When you are done for the day, you can copy the project back to your eniac account. The synthesis and implementation tools are slow enough when they run on the local disk. Don't make them twice as slow by forcing them to run over Samba.

Also, answer the following questions.

  1. How many LUTs does the 16-bit ripple-carry adder require? Is this the number you expected? Why?
  2. How much faster is the fastest adder you implemented than the slowest adder? How much more area does it require?
  3. How much faster is the intrinsic adder than the ripple-carry adder? How many LUTs does it use? Given this information, how do you think the intrinsic adder is actually implemented?

3. 16-bit Multiplier

The 16-bit multiplier takes in two 16-bit signed values. It has a single 16-bit signed output. The interface for this multiplier should be module <multiplier_name> (input [15:0] a, b, output [15:0] p);. The multiplier is single-cycle and fully combinational (in contrast, a sequential multiplier takes multiple cycles and latches intermediate values).

Implementation: For comparison purposes, you will create six different multiplier implementations, three 'array' and three 'wallace-tree' style implementations. All six implementations have the same prologue. A 16-bit combinational multiplier is basically a collection of 15 16-bit adders that add shifted versions of the a input or 0 to each other; the bits in input b determine whether the given shifted version of a is added or 0 is added instead. The multiplexers and range-bit selectors that determine the 16 quantities that have to be added together is the same for all of the multiplier implementations, and so this code can be reused.

Important

Do not use the shifters described below within your multiplier. Shifting by a known value can be done easily and more efficiently using bit selection and and concatenation operations.

An 'array' multiplier is simply a chain of conventional adders. Your three array multiplier implementations should simply use different kinds of adders.

  1. The 16-bit ripple-carry adder from above. Call this implementation multiplier_16b_array_rc.
  2. The faster of the two carry-select adders from above. Call this implementation multiplier_16b_array_cs.
  3. The intrinsic adder. Call this implementation multiplier_16b_array_int.

A 'Wallace-tree' multiplier also uses 15 adders, except that 14 of these are 'carry-save' adders and are arranged in a tree. The final adder is a conventional adder. See the CSE371 lecture notes for details on Wallace tree multipliers. Your three implementations should vary the implementation of the final adder.

  1. The 16-bit ripple-carry adder. Call this implementation multiplier_16b_wallace_rc.
  2. The faster of the two carry-select adders from above. Call this implementation multiplier_16b_wallace_cs.
  3. The intrinsic adder. Call this implementation multiplier_16b_wallace_int.

Testing: Test the multipliers using ModelSim.

Demo: The TAs will test your code by linking it with a test file.

Writeup: You do not need to draw three seperate array multiplier and three seperate Wallace-tree multiplier schematics, one of each is fine.

Tabulate the area and delay of each of these multipliers and compare them to the area and delay of Xilinx's intrinsic multiplier implementation.

Answer the following questions:

  1. How many LUTs does the array multiplier with ripple-carry adders use? What is the ratio of LUTs used by this multiplier to the LUTs used by a single ripple-carry adder? Is this ratio higher or lower than you expected? What might explain the difference for this ratio?
  2. What is the delay of the array multiplier with ripple-carry adders? What is the ratio of this delay to the delay of a single ripple-carry adder? Is this ratio higher or lower than you expected? What can account for it?
  3. How many LUTs does the wallace tree multiplier with ripple-carry adders use? What is the ratio of LUTs used here to LUTs used by the array multiplier? Is this ratio higher or lower than you expected? Why?
  4. What is the delay of the wallace multiplier with ripple-carry adders? What is the ratio of this delay to the delay of the array multiplier with ripple carry adders? Is this ratio higher or lower than you expected? What can account for it?
  5. What is the relative delay of the array multiplier with carry-select adders to the array multiplier with ripple-carry adders? Is this what you expected? What can explain this?
  6. What is the relative delay of the wallace tree multiplier with carry-select adders to the wallace tree multiplier with ripple carry adders? Is this what you expected? What can explain this?
  7. What is the LUT utilization and delay of the intrinsic multiplier? What do these diagnostics suggest about how an intrinsic multiply is implemented? Is there anything surprising about this?

4. 16-bit Shifter

The shifter unit has three inputs: a 16-bit value, a 4-bit shift amount, and a 2-bit shift type (00 is left shift, 01 is logical right shift, 10 is arithmetic right shift, 11 is no shift). It has a single 16-bit output.

Implementation: Note, there are several ways to implement this shifter. You could create three different shifters using 2-to-1 MUXes at each level. You would then use a 4-to-1 mux to select among them at the end. An alternative implementation would use four copies of the 4-to-1 MUX to select between the three kinds of shifts and no shift at all at each stage.

Testing: Test the shifters much like you tested the 16-bit adder. Use an additional two switches to specify the specific shift operation.

Demo: There is no separate demo for the shifter.

Writeup: Submit a schematic and your verilog code.

5. ALU

The ALU has three inputs: two 16-bit signed values and a 4-bit control signal that determines which operation the ALU should perform. The ALU has a single 16-bit signed output, which is the result of the operation. The interface for the ALU should be module p37x_alu (input [3:0] control, input [15:0] a, b, output [15:0] o); The ALU can perform ten operations:

Description Insn Control
Addition ADD 0 100
Subtraction SUB 0 101
Multiplication MUL 0 110
Bitwise or OR 1 000
Bitwise not NOT 1 001
Bitwise and AND 1 010
Bitwise xor XOR 1 011
Shift left logical SLL 1 100
Shift right logical SRL 1 101
Shift right arithmetic SRA 1 110

A few notes:

  • The control signal corresponds directly to the encoding of the P37X ISA for the opcode 0000 and 0001. The first control bit is the last bit of the 4-bit opcode; the remaining three bits are the last three bits in the instruction.
  • If any control signal other those specified is given to the ALU, the ALU should set all 16 output bits to zero.
  • The NOT operation returns the logical inverse of the first ALU input, and it ignores the second input.
  • The SUB operation computes output = input1 - input2.
  • For the shift operations, input1 is the value to be shifted and the four lower bits of input2 determine the amount of the shift (0 to 15 binary digits).
  • The arithmetic shift right performs sign extension; in contrast, the logical shift right performs zero extension.

Implementation: The ALU should instantiate a single 16-bit adder (also used for subtract), a 16-bit multiplier, and a left/right shifter. Using the outputs from these modules and some combinational logic to generate all ten possible values. Finally, use a 16-to-1 multiplexer to select the correct signal. Use the fastest non-intrinsic multiplier and adder.

Answer the following questions:

  1. What is the LUT utilization and delay of the ALU?
  2. Which components are on the "critical path" of the ALU?
  3. What problems, if any, did you encounter while doing this lab?
  4. How many hours did it take you to complete this assignment? On a scale of 1 (least) to 5 (most), how difficult was this assignment?
  5. Which part of the lab was most difficult and time consuming?

Lab Logistics

Verilog Restrictions

This lab should be implemented using only low-level structural Verilog and the assign statement. You are not allowed to use the following Verilog operators: +, -, *, /, <<, >>, etc. However, you are allowed to use the following operators: ~, &, |, ^, ==, !=, ?:, {}, etc. If you're not sure if you're allowed to use a certain Verilog construct, just ask (post a message on the newsgroup, send an e-mail, etc.).

LEDs and Switches

We'll be using an extension board that contains additional LEDs and switches. See lab1.v and lab1.ucf for a top-level Verilog module and mappings for the LED and switch pins.

Note

The switches on the extension boards are "active high", but (as described in the lab 0), the LEDs and the switches on the main board are "active low" signals.

Simulation Tutorial

Don't forget to walk through the ModelSim simulation tutorial before you begin.

Another Way to Test Your Design

In an effort to make the testing process slightly less painful, your friendly CSE 372 TA's have put together a test framework using behavioral Verilog. The testbench is available here: lab1_testbench.v. You can use it if you want to or you can write your own.

The code is straightforward: the Unit Under Test (UUT) is instantiated at the top. The testbench code basically just reads a series of values from a file (named, by default, lab1.input.test) to send as input to the UUT. Expected output values are also read from this file, and compared with the UUT's actual output.

To make the testbench work, make sure you have both the testbench verilog module and the test input file. Both can reside in the main directory of your Xilinx project, and they should work fine for both Xilinx and ModelSim.

Currently, the file lab1.input.test is pretty skimpy - you'll need to flesh it out with your own test cases. For example, you might want to make a testbench for each module or at least make input files to stress each part.

A couple of features:

  • Lines in the test input file that begin with a "#" (sh) character are ignored as comments (but see "Caveats" below)
  • Currently, values read from the test input file (other than the control signal for the ALU) must be in decimal.

Caveats: When a comment is read from the test input file, it causes the ALU control signal to dip to 0x0000 temporarily. This is usually not an issue, as 0x0000 is an invalid control signal, but if you write many lines of comments in succession, this can cause some bizarre timing issues.

Note

As part of your grade will be determined based on your lab writeups, they should be clear, concise and neat (preferably typed). You could have the greatest design in the world but if you cannot convey your idea clearly to the graders and convince them that it works you will not get good marks. Your lab writeups should include a brief explanation of what the circuits are supposed to do and how they do it.

Hand-drawn schematics. It is okay if these are a little messy, but they should accurately represent the Verilog code you turned in. Do not waste your time making pretty computerized schematics; the whole point of using Verilog is to save the tedium of making picture-perfect schematics. Be sure to label all input and output signals.

Verilog code. Your Verilog code should be well-formatted, easy to understand, and include comments where appropriate (for example, use a comment to describe all the inputs and outputs to your Verilog modules). Some part of the project grade will be dependent on the style and readability of your Verilog, including formatting, comments, good signal names, and proper use of hierarchy.