"Pre-Execution: Staying on the Performance Curve"

Amir Roth
University of Wisconsin-Madison Computer Science Department

Emerging applications continue to demand geometric increases in performance. Constraints such as power, manufacturing cost, and the limits of physics itself mean that this performance will have to come primarily from efficient extraction and exploitation of parallelism, rather than from raw frequency or brute-force solutions that provide only diminishing returns. Pre-execution is a technique that aggressively but efficiently finds parallelism in sequential codes, that is, codes that cannot be explicitly parallelized.

Pre-execution directly attacks the performance problems of sequential codes: loads and branches that are not handled by conventional caches and branch predictors. These "problem instructions" make up only 3% of the dynamic instruction stream, but are responsible for up to 60% of total execution time. Pre-execution identifies the static branches and loads that cause the majority of problems, isolates their computations (backward slices, dataflow graphs), and "pre-executes" copies of these computations on separate threads. This decoupling effectively moves stalls induced by problem instructions from the main thread, whose performance is externally visible, to other threads, whose performance is not. The main thread maintains instruction throughput while auxiliary threads run ahead and "consume" the latencies of future problem instructions. When the main thread catches up to a pre-executed problem instruction, it sees the instruction as already complete and avoids the stall previously associated with it. Pre-execution reduces sequential execution times by 20% and can be implemented with minimal additions to existing hardware.
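
The latency-hiding idea can be illustrated with a toy timing model. This is only a sketch under stated assumptions, not the mechanism presented in the talk: every instruction takes one cycle, a problem load adds a fixed miss latency, and an auxiliary thread is assumed to have issued that load `lead` cycles before the main thread reaches it. The function name, the trace, and all parameter values are hypothetical; they are chosen so that 3 of 100 instructions account for 60% of baseline cycles, and they are not measured data.

```python
def main_thread_cycles(trace, miss_latency, lead):
    """Cycles seen by the main thread in a toy model (all values hypothetical).

    trace        -- list of bools, True marks a problem load
    miss_latency -- stall cycles a problem load costs when not pre-executed
    lead         -- cycles by which the auxiliary thread ran ahead (0 = none)
    """
    cycles = 0
    for is_problem_load in trace:
        cycles += 1  # every instruction occupies one cycle
        if is_problem_load:
            # Only the part of the latency the helper has not yet absorbed
            # is still visible to the main thread.
            cycles += max(0, miss_latency - lead)
    return cycles


# 100 instructions, 3 of them problem loads (illustrative positions).
trace = [i in (20, 50, 80) for i in range(100)]

baseline = main_thread_cycles(trace, miss_latency=50, lead=0)
partial = main_thread_cycles(trace, miss_latency=50, lead=30)
full = main_thread_cycles(trace, miss_latency=50, lead=50)
print(baseline, partial, full)
```

The model ignores the cost of running the auxiliary threads, imperfect coverage of problem instructions, and overlap already provided by out-of-order execution, so it is not meant to reproduce the 20% figure quoted above.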


Monday, February 19, 2001
Moore School Bldg. - Room #23
3:00 - 4:30 p.m.

