Demo and Writeup by 6pm on Friday, February 29th
This lab is to be done individually.
This lab is worth 25 points.
Important
There is a bug in the behavioral Verilog loop code I showed in class. Specifically, the loop as I showed it does not terminate. Oops. I updated the notes and reposted them. Please look at the updated lecture 2 notes. The relevant changes are on slides 15-16, but they have been propagated throughout. I also updated the slides on tasks, 23-24, in case people want to use those.
In this lab, you will construct the ALU (Arithmetic/Logical Unit) for a P37X ISA processor. Before you can build the ALU, you need to create a few building blocks (4-bit adder, 16-bit adder, 16-bit multiplier, 16-bit shifter) which you will then combine to form an ALU.
Before you begin, we have another tutorial for you to walk through: ModelSim simulation tutorial. This tutorial covers simulation of designs for verifying they are correct and debugging them when they are not.
Design and test the following combinational logic structures.
Important
Before you write Verilog code, first create a hand-drawn schematic diagram of the circuit with all wires and input/outputs labeled. Why? When designing hardware, even when using Verilog, you need to be thinking explicitly about the structure and interconnectedness of the circuits. Only when the diagram is complete should you write the Verilog code that corresponds to the circuit. As described below, you need to turn in both the hand-drawn schematic and a printout of the Verilog code.
Before creating a 16-bit adder, first create a signed 4-bit ripple-carry adder as a basic building block. It has three inputs: two 4-bit signed values and a 1-bit carry in signal. It has two output values: the 4-bit sum and a 1-bit carry-out.
Testing: Test the adder both in ModelSim and on the board. To test the adder on the board, hook your 4-bit adder inputs to two sets of four input switches on the extension board; hook the outputs to five LEDs on the extension board.
Demo: demo the 4-bit adder to the TA on the board.
Writeup: submit your schematic and verilog code.
The 16-bit adder takes in two 16-bit signed values and a single-bit carry-in signal. It has a 16-bit signed output and a single-bit carry-out. The interface for this adder should be module <adder_name> (input [15:0] a, b, input cin, output [15:0] s, ouput cout);
Implementation: For comparison purposes, create three different implementations:
Testing: Test the adders functionally in ModelSim.
Demo: The TA's will link your adder implementations with a test module and test them.
Writeup: Submit your hand-drawn schematics and verilog code.
Also, create a small table that compares the delay (in nanoseconds) and area (in terms of lookup tables or "LUTs") of these three adder implementations. To get these statistics, follow these steps. In the lab1.v module, instantiate one instance of the adder. In the sources tab in the project, expand the lab1.v tree, select the adder, right-click and choose 'Set Top Module'. Then in the processes tab, double-click first 'Synthesize-XST' and then 'Implement Design'. After implementation completes successfully, you can click on 'View Design Summary'. In the panel that appears, there will be several reports. The number of LUTs is in the 'Map Report'. You can compute the delay by looking at the 'Static Timing Report'. The delay of the adder is the maximum delay of any of the output bits. Compare these adders to Xilinx's intrinsic adder implementation. You can force Xilinx to synthesize an intrinsic adder by creating a module called adder_16b_int and making the body assign s = a + b;.
Important
Although your your eniac account will be the permanent home for your projects, it's a good idea to make a local copy of your current project on the desktop and work from that copy. When you are done for the day, you can copy the project back to your eniac account. The synthesis and implementation tools are slow enough when they run on the local disk. Don't make them twice as slow by forcing them to run over Samba.
Also, answer the following questions.
The 16-bit multiplier takes in two 16-bit signed values. It has a single 16-bit signed output. The interface for this multiplier should be module <multiplier_name> (input [15:0] a, b, output [15:0] p);. The multiplier is single-cycle and fully combinational (in contrast, a sequential multiplier takes multiple cycles and latches intermediate values).
Implementation: For comparison purposes, you will create six different multiplier implementations, three 'array' and three 'wallace-tree' style implementations. All six implementations have the same prologue. A 16-bit combinational multiplier is basically a collection of 15 16-bit adders that add shifted versions of the a input or 0 to each other; the bits in input b determine whether the given shifted version of a is added or 0 is added instead. The multiplexers and range-bit selectors that determine the 16 quantities that have to be added together is the same for all of the multiplier implementations, and so this code can be reused.
Important
Do not use the shifters described below within your multiplier. Shifting by a known value can be done easily and more efficiently using bit selection and and concatenation operations.
An 'array' multiplier is simply a chain of conventional adders. Your three array multiplier implementations should simply use different kinds of adders.
A 'Wallace-tree' multiplier also uses 15 adders, except that 14 of these are 'carry-save' adders and are arranged in a tree. The final adder is a conventional adder. See the CSE371 lecture notes for details on Wallace tree multipliers. Your three implementations should vary the implementation of the final adder.
Testing: Test the multipliers using ModelSim.
Demo: The TAs will test your code by linking it with a test file.
Writeup: You do not need to draw three seperate array multiplier and three seperate Wallace-tree multiplier schematics, one of each is fine.
Tabulate the area and delay of each of these multipliers and compare them to the area and delay of Xilinx's intrinsic multiplier implementation.
Answer the following questions:
The shifter unit has three inputs: a 16-bit value, a 4-bit shift amount, and a 2-bit shift type (00 is left shift, 01 is logical right shift, 10 is arithmetic right shift, 11 is no shift). It has a single 16-bit output.
Implementation: Note, there are several ways to implement this shifter. You could create three different shifters using 2-to-1 MUXes at each level. You would then use a 4-to-1 mux to select among them at the end. An alternative implementation would use four copies of the 4-to-1 MUX to select between the three kinds of shifts and no shift at all at each stage.
Testing: Test the shifters much like you tested the 16-bit adder. Use an additional two switches to specify the specific shift operation.
Demo: There is no separate demo for the shifter.
Writeup: Submit a schematic and your verilog code.
The ALU has three inputs: two 16-bit signed values and a 4-bit control signal that determines which operation the ALU should perform. The ALU has a single 16-bit signed output, which is the result of the operation. The interface for the ALU should be module p37x_alu (input [3:0] control, input [15:0] a, b, output [15:0] o); The ALU can perform ten operations:
| Description | Insn | Control |
|---|---|---|
| Addition | ADD | 0 100 |
| Subtraction | SUB | 0 101 |
| Multiplication | MUL | 0 110 |
| Bitwise or | OR | 1 000 |
| Bitwise not | NOT | 1 001 |
| Bitwise and | AND | 1 010 |
| Bitwise xor | XOR | 1 011 |
| Shift left logical | SLL | 1 100 |
| Shift right logical | SRL | 1 101 |
| Shift right arithmetic | SRA | 1 110 |
A few notes:
Implementation: The ALU should instantiate a single 16-bit adder (also used for subtract), a 16-bit multiplier, and a left/right shifter. Using the outputs from these modules and some combinational logic to generate all ten possible values. Finally, use a 16-to-1 multiplexer to select the correct signal. Use the fastest non-intrinsic multiplier and adder.
Answer the following questions:
This lab should be implemented using only low-level structural Verilog and the assign statement. You are not allowed to use the following Verilog operators: +, -, *, /, <<, >>, etc. However, you are allowed to use the following operators: ~, &, |, ^, ==, !=, ?:, {}, etc. If you're not sure if you're allowed to use a certain Verilog construct, just ask (post a message on the newsgroup, send an e-mail, etc.).
We'll be using an extension board that contains additional LEDs and switches. See lab1.v and lab1.ucf for a top-level Verilog module and mappings for the LED and switch pins.
Note
The switches on the extension boards are "active high", but (as described in the lab 0), the LEDs and the switches on the main board are "active low" signals.
Don't forget to walk through the ModelSim simulation tutorial before you begin.
In an effort to make the testing process slightly less painful, your friendly CSE 372 TA's have put together a test framework using behavioral Verilog. The testbench is available here: lab1_testbench.v. You can use it if you want to or you can write your own.
The code is straightforward: the Unit Under Test (UUT) is instantiated at the top. The testbench code basically just reads a series of values from a file (named, by default, lab1.input.test) to send as input to the UUT. Expected output values are also read from this file, and compared with the UUT's actual output.
To make the testbench work, make sure you have both the testbench verilog module and the test input file. Both can reside in the main directory of your Xilinx project, and they should work fine for both Xilinx and ModelSim.
Currently, the file lab1.input.test is pretty skimpy - you'll need to flesh it out with your own test cases. For example, you might want to make a testbench for each module or at least make input files to stress each part.
A couple of features:
Caveats: When a comment is read from the test input file, it causes the ALU control signal to dip to 0x0000 temporarily. This is usually not an issue, as 0x0000 is an invalid control signal, but if you write many lines of comments in succession, this can cause some bizarre timing issues.
Note
As part of your grade will be determined based on your lab writeups, they should be clear, concise and neat (preferably typed). You could have the greatest design in the world but if you cannot convey your idea clearly to the graders and convince them that it works you will not get good marks. Your lab writeups should include a brief explanation of what the circuits are supposed to do and how they do it.
Hand-drawn schematics. It is okay if these are a little messy, but they should accurately represent the Verilog code you turned in. Do not waste your time making pretty computerized schematics; the whole point of using Verilog is to save the tedium of making picture-perfect schematics. Be sure to label all input and output signals.
Verilog code. Your Verilog code should be well-formatted, easy to understand, and include comments where appropriate (for example, use a comment to describe all the inputs and outputs to your Verilog modules). Some part of the project grade will be dependent on the style and readability of your Verilog, including formatting, comments, good signal names, and proper use of hierarchy.