CSE 371/372 Spring 2005
Homework 3

Due: Thursday, February 24 in the folder outside Cheryl Hickey's office (Levine 502) by noon (or at the beginning of class).

Datapath and Control

1. (5 points) The datapath we went over in class supports arithmetic, arithmetic immeidate, load, store, conditional branch, and unconditional jump instructions. For this exercise, you will modify the datapath to support the MIPS instruction jal (jump and link). You can look up the semantics of this instruction in Appendix A in the book. Describe the necessary changes. Show and label any new structures, buses, muxes, and control signals, or any changes to current muxes, or control signals. You may either print this datapath and add the changes to it, or draw your own.

2. (5 points) Repeat exercise 1 for the slti instruction.

3. (5 points) Repeat exercise 1 for the sll instruction.

4. (10 points) Modify the multi-cycle FSM (slides #57 and #58) to implement the three new instructions.

Pipelining

Consider the code sequence for the following loop.

for (i=0; i<100; i++)
       Z[i] = A*X[i] + Y[i];

Assuming that X, Y, Z, and A are initially in registers $1, $2, $3, $4, the assembly code might look something like this.

       sub $5, $5, $5
loop:  slti $6, $5, 100
       bnez $6, exit
       lw $7, 0($1)
       mul $8, $7, $4
       lw $9, 0($2)
       add $10, $8, $9
       sw $10, 0($3)
       addi $1, $1, 4
       addi $2, $2, 4
       addi $3, $3, 4
       addi $5, $5, 1
       j loop
exit:

5. (5 points) Assuming a 5-stage pipeline and a multiply instruction that takes 5 cycles but does not have to go through its own memory stage, draw the pipeline diagram for a single iteration of this loop. You may assume that all branches and jumps are predicted correctly. The latency of an iteration can be defined as the time in cycles between the writeback stage of the first instruction in iteration i and the writeback stage of the first instruction in iteration i+1. What is the latency of one iteration in this case? What is the latency of the entire loop?

6. (5 points) Repeat the exercise but for a pipeline with only MX bypassing.

7. (5 points) Repeat the exercise but for a pipeline with full bypassing.

8. (5 points) Reschedule the code to eliminate some of the data hazards in a single iteration. Redraw the diagram and calculate the iteration and full loop latency on a pipeline with full bypassing.

9. (10 points) Pages 438 and 439 of the textbook describe a scheduling technique called loop unrolling which effectively fuses loop iterations together to create a bigger scheduling scope. A 2-unrolled version of our loop would be something like:

for (i=0; i<100; i+=2) {
       Z[i] = A*X[i] + Y[i];
       Z[i+1] = A*X[i+1] + Y[i+1]; }

Write the assembly code for the unrolled loop and schedule it to reduce data hazards. Draw the pipeline diagram on a pipeline with full bypassing. What is the new latency of one iteration? What is the latency of the entire loop?

R372 Simulator Programming

9. (30 points) For this assignment, you will be required to modify the R372 simulator, rather than write an assembly language program for it. You will modify the file simulator.c to simulate the delays of a 5-stage MIPS-style pipeline with different bypassing configurations. The simulator has two counter registers: counters[0] is the instruction counter, and counters[1] is the cycle counter. Your job is to modify the code such that the cycle counter reflects the actual number of cycles the program would take to execute on a pipeline with a given configuration.

The easiest way to do this is probably to compute the number of stall cycles each instruction experiences. Since this is a simple "in-order" pipeline, individual instruction stalls add. So, as each instruction is simulated, counters[1] is incremented by 1 plus any number of stall cycles. You can compute the stall cycles by checking the registers the current instruction reads against the registers the previous instructions write. Look for the variables ri1, ri2, ro, ro_1, and ro_2 for clues.

The simulator has a slightly new interface which you can learn about by simply running it with no arguments. Basically, the filename is now the first argument and all other arguments are optional. The old argument -[b|x] has been replaced with an optional argument [-b] (the default is hexadecimal interface). The old argument -[s|t] has been replaced with an optional argument [-t] (the default is silent execution). The three new optional arguments are -[mx], -[wx], and -[wm] which specify MX, WX, and WM bypassing, respectively. The default is no bypassing.

Make a copy of the directory~amir/cse371/r372/src/ to your local directory and make the modifications there. You may test your modifications on the multiply-divide assembly programs you wrote, but the only thing you must turn in for this assignment is an electronic copy of your modified simulator.c.