Pre-Execution and P-Thread Selection

Pre-execution is a paradigm for exploiting the additional execution contexts of multithreaded processors (like Intel's hyper-threaded Pentium 4) to extract additional performance from what are traditionally thought of as single-threaded programs. Pre-execution removes stalls due to cache misses and mispredicted branches by speculatively executing the computations of likely-to-miss loads and likely-to-mispredict branches as additional threads, then passing the results back to the main thread. Professor Roth's PhD thesis described Speculative Data-Driven Multithreading (DDMT), one of the first proposed implementations of pre-execution.

At Penn, work on pre-execution has focused mostly on automating the process of pre-execution thread (p-thread) selection. This problem holds the key to extracting the maximum possible benefit from the technique. We have developed an analytical framework for selecting good p-threads from cache-miss-annotated dynamic program traces (or equivalently, path-sensitive cache-miss profiles). The framework uses a linear (simplified but relatively accurate) formal benefit/cost model of pre-execution and dynamic programming to quickly analyze a bounded set of all possible p-threads and choose an "optimal" subset (optimal is in quotes because optimality is defined with respect to the linear model, which only approximates the real world).
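To make the idea of a linear benefit/cost model concrete, here is a minimal sketch. It is not the framework's actual model: the `Candidate` fields, the per-instruction overhead charge, and the rule "keep every candidate with positive net advantage" are all assumptions made for illustration, and the sketch ignores both overlap between p-threads and the dynamic programming step the framework uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Hypothetical summary of one candidate p-thread (all fields are assumed)."""
    name: str
    misses_covered: int           # dynamic cache misses the p-thread pre-executes
    cycles_saved_per_miss: float  # miss latency tolerated per covered miss
    launches: int                 # dynamic launches, including useless ones
    size: int                     # instructions executed per launch
    cycles_per_insn: float        # overhead charged to the main thread per p-thread instruction


def benefit(c: Candidate) -> float:
    # Linear benefit: total miss latency hidden from the main thread.
    return c.misses_covered * c.cycles_saved_per_miss


def cost(c: Candidate) -> float:
    # Linear cost: every launched instruction steals some execution bandwidth.
    return c.launches * c.size * c.cycles_per_insn


def advantage(c: Candidate) -> float:
    return benefit(c) - cost(c)


def select(candidates: list[Candidate]) -> list[Candidate]:
    # Simplification: treat candidates as independent and keep the profitable ones.
    return [c for c in candidates if advantage(c) > 0]


short_slice = Candidate("short_slice", misses_covered=100, cycles_saved_per_miss=50.0,
                        launches=120, size=8, cycles_per_insn=0.25)
long_slice = Candidate("long_slice", misses_covered=10, cycles_saved_per_miss=20.0,
                       launches=400, size=12, cycles_per_insn=0.25)

chosen = select([short_slice, long_slice])
print([c.name for c in chosen])
```

In this made-up example the short slice hides far more latency than it costs, while the long slice is launched often but rarely covers a miss, so only the former is kept.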

One of the directions we are currently exploring is the extension of this framework with energy considerations. Pre-execution trades redundancy (in other words, energy) for performance. With proper extensions to the benefit/cost model, the framework can be retargeted to select p-threads that provide the greatest performance improvement at the lowest energy cost, or even p-threads that generate absolute reductions in system energy consumption.
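As a rough illustration of how an energy term might enter such a model, the sketch below scores each candidate by cycles saved minus a weighted net energy cost. Everything here is assumed for the example: the `EnergyCandidate` fields, the per-instruction and per-cycle energy constants, the `weight` parameter, and the premise that shortening execution saves a fixed amount of energy per cycle. It is not the formulation used in the publications listed on this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EnergyCandidate:
    """Hypothetical per-candidate totals (all fields are assumed)."""
    name: str
    cycles_saved: float   # net reduction in main-thread execution time
    insns_executed: int   # total dynamic instructions run by the p-thread


def net_energy(c: EnergyCandidate, energy_per_insn: float,
               idle_energy_per_cycle: float) -> float:
    # Energy spent on redundant instructions, minus energy saved by finishing sooner.
    # A negative result means an absolute reduction in energy.
    return c.insns_executed * energy_per_insn - c.cycles_saved * idle_energy_per_cycle


def select_energy_aware(candidates: list[EnergyCandidate], energy_per_insn: float,
                        idle_energy_per_cycle: float,
                        weight: float) -> list[EnergyCandidate]:
    # weight = 0 reduces to a performance-only selection; larger values favor energy.
    chosen = []
    for c in candidates:
        score = c.cycles_saved - weight * net_energy(c, energy_per_insn,
                                                     idle_energy_per_cycle)
        if score > 0:
            chosen.append(c)
    return chosen


lean = EnergyCandidate("lean", cycles_saved=4000.0, insns_executed=1000)
bulky = EnergyCandidate("bulky", cycles_saved=500.0, insns_executed=3000)

perf_only = select_energy_aware([lean, bulky], 1.0, 0.5, weight=0.0)
energy_aware = select_energy_aware([lean, bulky], 1.0, 0.5, weight=0.5)
print([c.name for c in perf_only], [c.name for c in energy_aware])
```

With these invented numbers, the lean candidate saves more energy through shorter runtime than it spends on redundant instructions, so it survives under either objective, while the bulky one is dropped once energy is weighted in.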

We are also generalizing this framework, and finding analogs of it, to analyze and optimize other systems that use the "multithreading within a single program" paradigm.

Amir Roth and this research are supported by NSF CAREER award CCR-0238203.



Energy Aspects of Pre-Execution and Energy-Aware P-Thread Selection. (pdf)
Vlad Petric and Amir Roth.
In proc. of ISCA-32, Jun. 6-8, 2005.

Energy Aware Pre-Execution and P-Thread Selection.
Vlad Petric and Amir Roth.
Penn CIS Technical Report #MS-CIS-03-34, Nov. 2003.
Slides from a talk given at the Penn-Princeton Architecture meeting.

A Quantitative Framework for Automated Pre-Execution Thread Selection. (pdf)
Amir Roth and Gurindar S. Sohi.
In proc. of MICRO-35, Nov. 20-22, 2002.
Talk slides

Speculative Data-Driven Multithreading. (pdf)
Amir Roth and Gurindar S. Sohi.
In proc. of HPCA-7, Jan. 20-24, 2001.
Talk slides

Improving Virtual Function Call Target Prediction via Dependence-Based Pre-Computation. (pdf)
Amir Roth, Andreas Moshovos and Gurindar S. Sohi.
In proc. of ICS-99, Jun. 20-25, 1999.

Dependence Based Prefetching for Linked Data Structures. (pdf)
Amir Roth, Andreas Moshovos and Gurindar S. Sohi.
In proc. of ASPLOS-8, Oct. 4-7, 1998.