Andrew D. Hilton

I finished my PhD at the University of Pennsylvania in the summer of 2010, where I was a student in the Architecture and Compilers Group, advised by Amir Roth. I currently work at IBM in Research Triangle Park, North Carolina, as part of the performance modelling and analysis team. I also teach at Duke as an adjunct professor in the computer science department. Here is my CV. This page mostly has information about my activities in graduate school: my dissertation research and the courses that I taught at UPenn.

My primary research interest in graduate school was energy-efficient load latency tolerance. Loads that miss the data cache degrade performance in any type of core, in-order or out-of-order. When a cache miss occurs, the processor may be able to execute some independent instructions, but if the miss is sufficiently long, the processor's critical execution resources fill up (the issue queue and physical registers in an out-of-order core, the pipeline latches in an in-order core), and the pipeline stalls. The miss need not go all the way to memory: in an in-order pipeline, even L2 hits stall the processor, and in an out-of-order pipeline, L3 hits can cause stalls.

Load latency tolerance (LLT) addresses these scenarios. In LLT, loads that miss the cache and their dependent instructions are deferred, i.e., shunted aside, releasing their execution resources and allowing younger miss-independent instructions to execute. When the miss returns, the loads and their dependents are re-injected into the core, re-acquire resources, and execute. LLT is compelling, but initial designs were impractical and inefficient: they required several new, bespoke structures, and they deferred and re-executed too many instructions to justify their performance gains.
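
As a rough illustration of the deferral idea only (this is a toy software model, not the BOLT or iCFP hardware), the sketch below splits a short instruction sequence into the instructions that can execute under a miss and the slice that is deferred. It assumes a simple per-register "poison" marker to track miss dependence; the `Instr` type, the `split_on_miss` function, and the sample program are all hypothetical names made up for this example.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Instr:
    name: str                      # label used only for printing
    dest: str                      # destination register
    srcs: Tuple[str, ...] = ()     # source registers
    misses: bool = False           # True for a load that misses the cache


def split_on_miss(program: List[Instr]) -> Tuple[List[str], List[str]]:
    """Return (executed, deferred) instruction names, in program order.

    Assumption: a register is "poisoned" while its value depends on an
    outstanding miss. An instruction is deferred if it is a missing load
    or reads a poisoned register; otherwise it executes, and its
    destination register becomes clean again.
    """
    poisoned = set()
    executed, deferred = [], []
    for ins in program:
        if ins.misses or any(src in poisoned for src in ins.srcs):
            deferred.append(ins.name)
            poisoned.add(ins.dest)       # dependents of this one must wait too
        else:
            executed.append(ins.name)
            poisoned.discard(ins.dest)   # overwritten with a miss-independent value
    return executed, deferred


program = [
    Instr("load r1", "r1", ("r9",), misses=True),
    Instr("add r2", "r2", ("r1", "r3")),   # depends on the miss
    Instr("mul r4", "r4", ("r5", "r6")),   # independent
    Instr("sub r7", "r7", ("r2",)),        # depends transitively on the miss
    Instr("mov r1", "r1", ("r4",)),        # independent, overwrites r1
    Instr("add r8", "r8", ("r1",)),        # reads the new, clean r1
]

executed, deferred = split_on_miss(program)
print("executed under the miss:", executed)
print("deferred until the miss returns:", deferred)
```

In this model, the deferred list is what would be re-injected, in order, once the miss returns. Real designs also have to handle stores, branches, and values that deferred instructions need from instructions that already executed, none of which is modeled here.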

My dissertation focused on re-formulating LLT in the context of existing micro-architectures, re-purposing existing structures while adding minimal complexity and overhead. I developed BOLT (Better Out-of-order Latency Tolerance) and iCFP (In-order Continual Flow Pipeline), out-of-order and in-order LLT designs that are complexity- and energy-efficient. BOLT and iCFP implement the re-execution of miss-dependent instructions as an alternative use of multi-threading hardware, maximizing re-use of existing structures. They avoid associative structures for load and store management, favoring simple, low-overhead indexed structures. Additionally, they employ simple pruning techniques to minimize re-execution overheads.

Dissertation:

Energy Efficient Latency Tolerance: Single-thread performance for the multi-core era. (slides)

Publications:

Flexible Register Management using Reference Counting (Nominated for Best Paper)
Steven Battle, Andrew Hilton, Mark Hempstead, and Amir Roth.
18th International Symposium on High-Performance Computer Architecture (HPCA), Feb., 2012.
BOLT: Energy-Efficient Out-of-Order Latency-Tolerant Execution
Andrew Hilton and Amir Roth.
16th International Symposium on High-Performance Computer Architecture (HPCA), Jan., 2010.
CPROB: Checkpoint Processing with Opportunistic Minimal Recovery (slides)
Andrew Hilton, Neeraj Eswaran, and Amir Roth.
18th International Conference on Parallel Architectures and Compilation Techniques (PACT), Sep., 2009.
FIESTA: A Sample-Balanced Multi-Program Workload Methodology (slides)
Andrew Hilton, Neeraj Eswaran, and Amir Roth.
5th Workshop on Modeling, Benchmarking, and Simulation (MoBS), Jun., 2009.
Decoupled Store Completion/Silent Deterministic Replay: Enabling Scalable Data Memory for CPR/CFP Processors (slides)
Andrew Hilton and Amir Roth.
36th International Symposium on Computer Architecture (ISCA), Jun., 2009.
iCFP: Tolerating All-Level Cache Misses in In-Order Processors (slides) (IEEE Micro's Top Picks 2010)
Andrew Hilton, Santosh Nagarakatte and Amir Roth.
15th International Symposium on High-Performance Computer Architecture (HPCA), Feb., 2009.
Ginger: Control Independence Using Tag Rewriting (slides)
Andrew Hilton and Amir Roth.
34th International Symposium on Computer Architecture (ISCA), Jun. 9-13, 2007.
XChange: Coupling Parallel Applications in a dynamic environment
Hasan Abbasi, Matthew Wolf, Karsten Schwan, Greg Eisenhauer, and Andrew Hilton.
6th International Conference on Cluster Computing, Sep. 20-23, 2004.

Teaching:

I was selected by the Center for Teaching and Learning as one of six recipients of their Graduate Fellowship for Teaching Excellence for 2008-09.

I taught the following classes:
Spring 2009: CS 173 (Introduction to Computer Science) at Ursinus College
Spring 2009: CS 274 (Computer Architecture and Organization) at Ursinus College
Spring 2008: CSE 399 (Special Topics: C++) at University of Pennsylvania
Spring 2007: CSE 399 (Special Topics: C++) at University of Pennsylvania

I TAed the following classes:
Spring 2006: TA for Amir Roth's CSE371
Fall 2005: TA for Milo Martin's CIS501
Spring 2000-Spring 2003: TA for CS 1322 (Second intro to programming class, in Java) at Georgia Tech.

Other:

My WPE-II exam was on Hardware for Exploiting Data Level Parallelism in Spring 2006.
This exam involves reading 3-4 papers on a topic, providing a written analysis, and giving a roughly 45-minute talk.

Before coming to Penn, I was an undergraduate and then a Master's student at Georgia Tech. My advisor there was Olin Shivers.

Contact Info:

Department of CIS
Levine Hall
3330 Walnut Street
Philadelphia, PA 19104-6389

Email: a d h i l t o n @ c i s . u p e n n. e d u