Homework 5 - CIS501 Fall 2010

Instructor:	Prof. Milo Martin
Due:	Wed, November 24th (via blackboard)

Overview

The is the first part of a two-part assignment in which you will explore the effectiveness of out-of-order execution (dynamic scheduling) impact on instruction-level-parallelism. As a fully detailed simulation of such a core is beyond what could be feasibly done for an assignment, the model of the processor core described in this assignment is approximate in many ways (failing to capture many second-order effects) but captures the most important first-order effects of instruction execution.

This assignment uses the same traces as previous assignments (see Trace Format document).

As before, you'll write a program that reads in the trace and simulates different processor core configurations. You're welcome to write the program in whatever language you like; although we are going to ask for you to turn in your source code (via blackboard), to end result of this assignment are the experimental results (the program source code is just a means to the end).

Two-Part Assignment

This is the first (and smaller and easier) part of a two-part assignment. Part A is on register renaming (due before Thanksgiving break), and Part B is the dynamic scheduling and experimental part (due after Thanksgiving break). I had originally intended this to be a single large assignment, however, it is more programming that the previous assignments, so I wanted to break it into more manageable pieces.

Warning

This is the smaller part of this two assignments! By the time this one is due, to be on target, you should have already started on the next assignment.

Register Renaming

The first step is applying register renaming to the instruction stream. You will need two data structures for this:

Map table: to map an architectural register to a physical register you'll need an array (or "vector") with one entry for each architectural register. Assume the set of architectural registers is from zero to 49 (so you'll need an array of size 50). At the start, initialize the table so that architectural register n is mapped to physical register n.
FIFO free list: a queue of physical register names. The operations needed to be supported are: (1) getting a free register from the head of the queue, (2) putting a newly deallocated register at the tail of the queue, and (3) querying the number of registers that are free. You may wish to use an existing queue data structure implementation (such as "ArrayDeque" in Java or "deque" in C++'s STL).

The renaming algorithm:

For each valid source register:
  microop.phys_reg_source[i] = maptable[microop.arch_reg_source[i]]

For each valid destination register:
  microop.phys_reg_to_free[i] = maptable[microop.arch_reg_destination[i]]
  new_reg = freelist.dequeue()
  maptable[microop.arch_reg_destination[i]] = new_reg
  microop.phys_reg_destination[i] = new_reg

At commit:

for each valid destination register:
  freelist.enqueue(microop.phys_reg_to_free[i])

Handling flags. One wrinkle in dealing with x86 is that we need to handle the flags (condition codes). To avoid WAW and WAR hazards on the flags, you must rename them as well. To do this, we'll treat the flags like just another architectural register by assigning it architectural register 49. Any instruction the reads (or write) the flags is then set to read (or write) architectural register 49. Thus, each micro-op has up to three source registers and up to two destination, all of which must be renamed according to the above algorithm.

Test Cases

To help you debug your simulator, we've provided some renamed traces for you to compare your code with. Even without all the rest of the out-of-order machinery, you can generate a trace of the renamed instruction stream for a given number of physical registers. To generate a rename trace, instead of waiting to commit to free the registers, you can just free the register right after renaming it; the FIFO (first-in, first-out) nature of the queue will ensure the rename trace is the same.

The output rename trace has the following format:

43, r7 -> p49, r5 -> p7 [p5] | MOV LOAD
44, r5 -> p7, r44 -> p58 [p13], r49 -> p50 [p59] | CMP SUB_IMM
45, r49 -> p50 | J JMP_IMM
46, r45 -> p44 [p52] | CMP SAVE_PC

The first column is the micro-op count (the line number from the trace). This is followed by pairs of architectural registers and their mapping to physical registers. The destination registers also include the register that should be freed when they commit in square brackets. If an instruction doesn't have a valid architectural register, it doesn't display anything for that register. The register mappings are followed by the pipe ("|") symbol and then the macro-op and micro-op descriptors.

The renamed instruction trace for the gcc-1K.trace.gz for 60 and 512 registers, respectively:

What to Turn In

As this is just the first part of a two-part assignment, there aren't any experimental results to report yet or questions to answer. Please turn in:

Renamed Instruction Trace. Via blackboard, turn in a "rename-128-regs.output" file that contains the renamed instruction trace for the gcc-1K.trace.gz trace with 128 total physical registers.
Source Code. Also submit all source code files via blackboard.

Hints & Comments

"MicroOp" class. When I wrote this code, I decided that a "MicroOp" struct or class needed to keep track of everything (and it is certainly needed for the next assignment). Thus, I recommend that you start this assignment by creating a struct or class to encapsulate the important state of each micro-op, including the data read in from the trace and also the physical registers and other information about the instruction as it is fetched, renamed, issued, executed, and commits.
Other Tips. See the previous assignments for hints about efficient code and computing resources.

Addendum

None, yet.