Homework 5 - CIS501 Fall 2010

Instructor:Prof. Milo Martin
Due:Wed, November 24th (via blackboard)

Overview

The is the first part of a two-part assignment in which you will explore the effectiveness of out-of-order execution (dynamic scheduling) impact on instruction-level-parallelism. As a fully detailed simulation of such a core is beyond what could be feasibly done for an assignment, the model of the processor core described in this assignment is approximate in many ways (failing to capture many second-order effects) but captures the most important first-order effects of instruction execution.

This assignment uses the same traces as previous assignments (see Trace Format document).

As before, you'll write a program that reads in the trace and simulates different processor core configurations. You're welcome to write the program in whatever language you like; although we are going to ask for you to turn in your source code (via blackboard), to end result of this assignment are the experimental results (the program source code is just a means to the end).

Two-Part Assignment

This is the first (and smaller and easier) part of a two-part assignment. Part A is on register renaming (due before Thanksgiving break), and Part B is the dynamic scheduling and experimental part (due after Thanksgiving break). I had originally intended this to be a single large assignment, however, it is more programming that the previous assignments, so I wanted to break it into more manageable pieces.

Warning

This is the smaller part of this two assignments! By the time this one is due, to be on target, you should have already started on the next assignment.

Register Renaming

The first step is applying register renaming to the instruction stream. You will need two data structures for this:

The renaming algorithm:

For each valid source register:
  microop.phys_reg_source[i] = maptable[microop.arch_reg_source[i]]

For each valid destination register:
  microop.phys_reg_to_free[i] = maptable[microop.arch_reg_destination[i]]
  new_reg = freelist.dequeue()
  maptable[microop.arch_reg_destination[i]] = new_reg
  microop.phys_reg_destination[i] = new_reg

At commit:

for each valid destination register:
  freelist.enqueue(microop.phys_reg_to_free[i])

Handling flags. One wrinkle in dealing with x86 is that we need to handle the flags (condition codes). To avoid WAW and WAR hazards on the flags, you must rename them as well. To do this, we'll treat the flags like just another architectural register by assigning it architectural register 49. Any instruction the reads (or write) the flags is then set to read (or write) architectural register 49. Thus, each micro-op has up to three source registers and up to two destination, all of which must be renamed according to the above algorithm.

Test Cases

To help you debug your simulator, we've provided some renamed traces for you to compare your code with. Even without all the rest of the out-of-order machinery, you can generate a trace of the renamed instruction stream for a given number of physical registers. To generate a rename trace, instead of waiting to commit to free the registers, you can just free the register right after renaming it; the FIFO (first-in, first-out) nature of the queue will ensure the rename trace is the same.

The output rename trace has the following format:

43, r7 -> p49, r5 -> p7 [p5] | MOV LOAD
44, r5 -> p7, r44 -> p58 [p13], r49 -> p50 [p59] | CMP SUB_IMM
45, r49 -> p50 | J JMP_IMM
46, r45 -> p44 [p52] | CMP SAVE_PC

The first column is the micro-op count (the line number from the trace). This is followed by pairs of architectural registers and their mapping to physical registers. The destination registers also include the register that should be freed when they commit in square brackets. If an instruction doesn't have a valid architectural register, it doesn't display anything for that register. The register mappings are followed by the pipe ("|") symbol and then the macro-op and micro-op descriptors.

The renamed instruction trace for the gcc-1K.trace.gz for 60 and 512 registers, respectively:

What to Turn In

As this is just the first part of a two-part assignment, there aren't any experimental results to report yet or questions to answer. Please turn in:

Hints & Comments


Addendum