RENO (nee Register Integration)

RENO (REName Optimizer) is an extension to MIPS R10000 style register renaming that allows a single physical register to be shared as the output of multiple dynamic instructions. The sharing discipline is a slightly modified version of reference counting.

RENO uses physical register sharing to implement the dynamic counter-parts of some well known static compiler optimizations: move elimination, constant-folding, common sub-expression elimination, and register allocation. A static compiler is limited in its ability to apply these optimizations by: i) a limited number of logical registers, ii) compilation boundaries, iii) ambiguous or absent memory dependence information, and iv) the restriction that all transformations must be correct along all paths. In contrast, RENO: i) operates in the larger physical register namespace, ii) sees no compilation boundaries, iii) can use dynamically available or even speculative memory dependence information, and iv) must be correct only for the current dynamic path. Our experiments show that RENO can eliminate 23% of the dynamic instructions from a highly optimized program. An eliminated provides two kinds of benefits. First, its latency is removed from the execution dataflow graph. Second, its bandwidth and buffer consumption is removed from the out-of-order execution engine.

RENO started life as "register integration", a dynamic implementation of common-subexpression elimination and register allocation. These optimizations were individually introduced by others as Dynamic Instruction Reuse and Speculative Memory Bypassing. The unique feature of register integration is its implementation, i.e., the way in which it identifies redundancies and optimization opportunities. Unlike some previously proposed schemes, register integration performs this part using operational equivalence tests. A current instruction can reuse a physical register result written by a previous instruction, if it performs the same operation (i.e., it has the same opcode) on the same input physical registers. To facilitate this process, register integration stores (operation, input-physical-register1, input-physical-register2, output-physical-register) tuples of recent instructions in a table called the integration table. The primary advantage of register integration is that, unlike its predecessors, it implements reuse without reading or writing the latency-critical physical registers themselves. Only the pointers to the physical registers (which are much smaller in addition to being less performance critical) are manipulated. Register integration derives this advantage from the underlying MIPS R10000 implementation.

Amir Roth introduced register integration as a special purpose component of his implementation of pre-execution, speculative data-driven multithreading (DDMT). In that context, register integration was used for passing values from pre-execution threads (p-threads) to the main thread. A simple extension of this mechanism also implemented squash reuse.

At Penn, register integration was fleshed out as a standalone general-purpose reuse mechanism capable of justifying its own implementation. The current implementation of register integration supports squash and general reuse as well as speculative memory bypassing (a technique which was not previously considered a reuse variant). It also has efficient mechanisms for dealing with the rare (but existing) cases of mis-integration (false reuse). Under our current implementation, register integration provides dynamic reuse rates of 20% and above. These can be exploited either to improve performance or to reduce execution core complexity.

RENO is an extension to register integration that incorporates the previously proposed move elimination optimization and a new optimization: constant folding. The RENO implementation of constant folding targets register immediate additions and uses an extended map table definition (mapping logical registers to physical register/displacement pairs) and a limited and cheap form of operation fusion. Register immediate additions are the dataflow glue that enable load elimination. The main benefit of adding constant folding to RENO is the provision of a facility that can collapse these instructions without expensive table lookups.

Our recent work on RENO has focused on improving its efficiency: increasing the ratio of number of successful integrations to number of integration table accesses and integration table size. As it turns out, reuse is such a stable and predictable phenomenon that the entire process can effectively be statically "scripted". This script, which consists of assigning static instructions to integration table sets (which double as "attempt-to-integrate" directives), allows RENO to exploit more reuse with half the number of accesses and in a table that is one quarter the size. It also allows RENO to "pay for itself" in terms of energy.

This work is supported by NSF CAREER award CCR-0238203.



RENO: A Rename-Based Instruction Optimizer. (pdf)
Vlad Petric, Tingting Sha, and Amir Roth.
In proc. of ISCA-32, Jun. 6-8, 2005.

Store Vulnerability Window (SVW): Re-Execution Filtering for Enhanced Load Optimization. (pdf)
Amir Roth.
In proc. of ISCA-32, Jun. 6-8, 2005.

A Static Profile-Driven Management Scheme for Energy-Efficient Register Integration.
Anne Bracy, Vlad Petric, and Amir Roth.
Penn CIS Technical report: #MS-CIS-03-35, Nov. 2003.

Three Extensions to Register Integration. (pdf)
Vlad Petric, Anne Bracy, and Amir Roth.
In proc. of MICRO-35, Nov. 20-22, 2002.
Talk slides (these were created on MACOS X, the fonts look bad on Windows)

Squash Reuse via a Simplified Implementation of Register Integration.(pdf)
Amir Roth and Gurindar S. Sohi.
Journal of Instruction Level Parallelism (JILP), Vol. 3, 2002.

Register Integration: A Simple and Efficient Implementation of Squash Reuse. (pdf)
Amir Roth and Gurindar S. Sohi.
In proc. of MICRO-33, Dec. 10-13, 2000.
Talk slides