Dataflow Mini-Graphs: A Complexity-Effective Application-Specific RISC/CISC Hybrid

Dataflow mini-graphs are multi-instruction dataflow graphs with the external interface of a singleton RISC instruction. They are atomic execution units with at most two register inputs, one register output, one memory reference, and one control transfer. To create a mini-graph executable, a binary rewriting tool statically replaces frequently occurring dataflow graphs that satisfy these criteria with handles. A handle is a quasi-instruction that encodes the corresponding mini-graph's interface register dependences. In addition to these registers, a handle contains a reserved opcode and an immediate that identifies the mini-graph's definition, i.e., its constituent instructions and their dataflow.

A mini-graph pipeline processes both unmodified and rewritten executables, and it treats handles as singleton instructions in every stage except execution. During execution, the processor translates the handle into its instruction sequence using an on-chip table called the mini-graph table (MGT). Essentially a microcode store, the MGT drives the cycle-by-cycle execution of the mini-graph's constituent instructions. The MGT may be hardwired, but it is more useful to customize its contents to an application or an application domain. There are several candidate mechanisms for such dynamic instruction set customization, including DISE.

Mini-graphs amplify processor capacity and bandwidth by allowing the existing singleton-instruction machinery to process multiple instructions at once. This amplification depends on choosing only mini-graphs whose intermediate values are consumed exclusively by later instructions in the same mini-graph, i.e., values that are interior to the mini-graph. The binary rewriter uses static degree-of-use analysis to identify these single-use values. The processor treats all mini-graph interior values as transient and does not allocate physical registers to them.
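The interface limits and the single-use test above can be illustrated with a small Python sketch. It checks a straight-line candidate sequence and, if the candidate qualifies, records it in a software stand-in for the MGT and returns a handle. Everything here is hypothetical and is not the mini-graph rewriter's actual algorithm or encoding: the `Insn` and `Handle` records, the `mg` mnemonic for the reserved opcode, and the `live_out` register set, which stands in for the rewriter's static degree-of-use analysis of the surrounding code.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple

RESERVED_OPCODE = "mg"  # hypothetical mnemonic for the reserved handle opcode


@dataclass(frozen=True)
class Insn:
    """Hypothetical RISC instruction with at most one destination register."""
    op: str
    dst: Optional[str]
    srcs: Tuple[str, ...] = ()
    is_mem: bool = False     # load or store
    is_branch: bool = False  # control transfer


@dataclass(frozen=True)
class Handle:
    """Hypothetical handle: reserved opcode, MGT-index immediate, interface registers."""
    opcode: str
    mgt_index: int
    inputs: Tuple[str, ...]
    output: Optional[str]


def classify(seq: List[Insn], live_out: Set[str]):
    """Split a candidate into external inputs, register outputs, and interior values.

    live_out (registers read after the sequence) is an assumed stand-in for
    static degree-of-use analysis.
    """
    inputs: List[str] = []  # registers read before any write inside seq
    latest = {}             # register -> index of the insn holding its newest value
    uses = {}               # producer index -> number of consumers inside seq
    for i, insn in enumerate(seq):
        for r in insn.srcs:
            if r in latest:
                uses[latest[r]] = uses.get(latest[r], 0) + 1
            elif r not in inputs:
                inputs.append(r)
        if insn.dst is not None:
            latest[insn.dst] = i
    final = set(latest.values())  # values still visible after the sequence
    outputs: List[str] = []
    interior: List[int] = []
    for i, insn in enumerate(seq):
        if insn.dst is None:
            continue
        escapes = i in final and insn.dst in live_out
        if uses.get(i, 0) == 1 and not escapes:
            interior.append(i)  # single-use and consumed inside the mini-graph
        else:
            outputs.append(insn.dst)
    return inputs, outputs, interior


def is_mini_graph(seq: List[Insn], live_out: Set[str]) -> bool:
    """Interface limits: <=2 register inputs, <=1 register output,
    <=1 memory reference, <=1 control transfer, and at least two instructions."""
    inputs, outputs, _ = classify(seq, live_out)
    return (len(seq) >= 2
            and len(inputs) <= 2
            and len(outputs) <= 1
            and sum(insn.is_mem for insn in seq) <= 1
            and sum(insn.is_branch for insn in seq) <= 1)


def make_handle(seq: List[Insn], live_out: Set[str],
                mgt: List[Tuple[Insn, ...]]) -> Handle:
    """Record seq in the software-modeled MGT and return the handle that replaces it."""
    if not is_mini_graph(seq, live_out):
        raise ValueError("candidate violates the mini-graph interface limits")
    inputs, outputs, _ = classify(seq, live_out)
    mgt.append(tuple(seq))
    return Handle(RESERVED_OPCODE, len(mgt) - 1, tuple(inputs),
                  outputs[0] if outputs else None)


mgt: List[Tuple[Insn, ...]] = []
candidate = [
    Insn("add", "r3", ("r1", "r2")),
    Insn("sll", "r4", ("r3",)),             # shift by an immediate amount
    Insn("ld", "r5", ("r4",), is_mem=True),
]
h = make_handle(candidate, {"r5"}, mgt)
print(h)  # Handle(opcode='mg', mgt_index=0, inputs=('r1', 'r2'), output='r5')
print(is_mini_graph(candidate, {"r3", "r5"}))  # False: r3 would be a second output
```

If `r3` were also read after the sequence, the candidate would have two register outputs and `make_handle` would reject it. In the accepted case, the add and shift results each have exactly one consumer inside the sequence and are dead afterward, so they are interior: these are the values that, as described above, the processor treats as transient and allocates no physical registers.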
Eliminating these allocations reduces register file size requirements and amplifies renaming, scheduling, register read, register write, and retirement bandwidth. The static compression of mini-graphs into handles naturally amplifies fetch bandwidth and instruction cache capacity. Execution is the only stage whose bandwidth mini-graphs do not naturally amplify. To keep it from becoming a bottleneck, a mini-graph processor replaces some of its ALUs with ALU pipelines (single-entry, single-exit chains of ALUs) that add execution bandwidth without increasing scheduling or global bypass complexity. The simple, forward-only local operand network of the ALU pipeline also enables a restricted, externally transparent form of dataflow-graph latency reduction.

This research is partially supported by NSF CAREER award CCR-0238203 and by a grant from the Intel Research Council.
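The toy model below is one way to picture why an ALU pipeline with transient interior values saves register file bandwidth. It is a Python sketch under stated assumptions, not the mini-graph processor's datapath: the `Stage` and `RegFile` types, the operand names `prev`, `in0`, and `in1`, and the rule that any stage may take a handle input or an immediate as its second operand are all hypothetical. The model counts only register file reads and writes; it does not model renaming, scheduling, or timing.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Sequence, Tuple, Union

MASK = 0xFFFFFFFF  # assume 32-bit ALUs

# Hypothetical ALU operations available at every stage of the chain.
OPS: Dict[str, Callable[[int, int], int]] = {
    "add": lambda a, b: (a + b) & MASK,
    "sub": lambda a, b: (a - b) & MASK,
    "sll": lambda a, b: (a << b) & MASK,
}

Operand = Union[str, int]  # "prev", "in0", "in1", or an integer immediate


@dataclass(frozen=True)
class Stage:
    """One ALU in the pipeline; "prev" names the previous stage's result."""
    op: str
    a: Operand
    b: Operand


@dataclass
class RegFile:
    """Toy register file that counts accesses."""
    values: Dict[str, int]
    reads: int = 0
    writes: int = 0

    def read(self, r: str) -> int:
        self.reads += 1
        return self.values[r]

    def write(self, r: str, v: int) -> None:
        self.writes += 1
        self.values[r] = v


def execute_handle(stages: Sequence[Stage], inputs: Sequence[str],
                   output: str, rf: RegFile) -> int:
    """Run one mini-graph down a single-entry, single-exit ALU chain."""
    ins = [rf.read(r) for r in inputs]  # interface inputs are read once
    prev = 0                            # forward-only local operand latch

    def fetch(x: Operand) -> int:
        if x == "prev":
            return prev
        if x in ("in0", "in1"):
            return ins[int(x[2])]
        return x  # integer immediate

    for st in stages:
        # Interior values live only in `prev`; they never reach the register file.
        prev = OPS[st.op](fetch(st.a), fetch(st.b))
    rf.write(output, prev)  # one interface output, one write
    return prev


def execute_singletons(seq: Sequence[Tuple[str, str, Operand, Operand]],
                       rf: RegFile) -> None:
    """Baseline: every instruction reads its register sources and writes its result."""
    for op, dst, a, b in seq:
        va = rf.read(a) if isinstance(a, str) else a
        vb = rf.read(b) if isinstance(b, str) else b
        rf.write(dst, OPS[op](va, vb))


# r5 = ((r1 + r2) << 2) - r1, as three singletons and as one mini-graph handle
base = RegFile({"r1": 3, "r2": 4})
execute_singletons([("add", "r3", "r1", "r2"), ("sll", "r4", "r3", 2),
                    ("sub", "r5", "r4", "r1")], base)

amp = RegFile({"r1": 3, "r2": 4})
chain = [Stage("add", "in0", "in1"), Stage("sll", "prev", 2), Stage("sub", "prev", "in0")]
execute_handle(chain, ("r1", "r2"), "r5", amp)

print(base.values["r5"], base.reads, base.writes)  # 25 5 3
print(amp.values["r5"], amp.reads, amp.writes)     # 25 2 1
```

Both paths compute the same value, but the handle version reads the register file twice and writes it once, versus five reads and three writes for the singletons, because `r3` and `r4` exist only in the chain's local latch.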
Publications

Serialization-Aware Mini-Graphs: Performance with Fewer Resources.
Dataflow Mini-Graphs: Amplifying Superscalar Capacity and Bandwidth.