my email address: my last name at cis dot upenn dot edu

(215) 746-4223

Levine Hall 572
3330 Walnut Street

Philadelphia, PA 19104-3409

I work on making multiprocessors easier to program by leveraging changes in both computer architectures and parallel programming models.

I am looking for new PhD students interested in systems and computer architecture. If you are interested in these topics, please apply to our PhD program and send me an email as well.

Teaching

In Fall 2024 I'm teaching CIS 6010: GPGPU Programming and Architecture.

Students

I'm lucky to be working with the following great students:

Paul Biberstein (PhD)

Former students

Yuxuan Zhang (PhD)
Kelly Shiptoski (PhD 2023). First employment: Bolt Labs
Omar Navarro Leija (PhD 2022). First employment: Bolt Labs
Gautam Mohan (Master's 2020). First employment: Amazon
Yuanfeng Peng (PhD 2019). First employment: Google
Nicholas Renner (Master's 2019, now a PhD student at NYU)
Nimit Singhania (PhD 2018, co-advised with Rajeev Alur). First employment: Google
Christian DeLozier (PhD 2018). First employment: Assistant Professor at United States Naval Academy
Kavya Lakshminarayanan (Master's 2018). First employment: Microsoft
Richard Zang (Master's 2018). First employment: Microsoft
Sana Kamboj (Master's 2017). First employment: Qualcomm
Ariel Eizenberg (Master's 2016). First employment: Government of Israel
Brooke Fugate (Master's 2015, co-advised with André DeHon)
Liang Luo (Master's 2015, then a PhD student at the University of Washington)
Akshitha Sriraman (Master's 2015, then a PhD student at the University of Michigan)

Recent Publications (full list)

Many of the paper links below use the ACM's Author-izer service, which tracks download statistics and provides a small kickback to various ACM Special Interest Groups for each download.

RPG²: Robust Profile-Guided Runtime Prefetch Generation
Yuxuan Zhang, Nathan Sobotka, Soyoon Park, Saba Jamilan, Tanvir Ahmed Khan, Baris Kasikci, Gilles Pokam, Heiner Litz and Joseph Devietti

International Conference on Architectural Support for Programming Languages & Operating Systems (ASPLOS '24), May 2024

Data cache prefetching is a well-established optimization to overcome the limits of the cache hierarchy and keep the processor pipeline fed with data. In principle, accurate, well-timed prefetches can sidestep the majority of cache misses and dramatically improve performance. In practice, however, it is challenging to identify which data to prefetch and when to do so. In particular, data can be easily requested too early, causing eviction of useful data from the cache, or requested too late, failing to avoid cache misses. Competition for limited off-chip memory bandwidth must also be balanced between prefetches and a program's regular "demand" accesses. Due to these challenges, prefetching can both help and hurt performance, and the outcome can depend on program structure, decisions about what to prefetch and when to do it, and, as we demonstrate in a series of experiments, program input, processor microarchitecture, and their interaction as well.
To try to meet these challenges, we have designed the RPG² system for online prefetch injection and tuning. RPG² is a pure-software system that operates on running C/C++ programs, profiling them, injecting prefetch instructions, and then tuning those prefetches to maximize performance. Across dozens of inputs, we find that RPG² can provide speedups of up to 2.15×, comparable to the best profile-guided prefetching compilers, but can also respond when prefetching ends up being harmful and roll back to the original code - something that static compilers cannot. RPG² improves prefetching robustness by preserving its performance benefits, while avoiding slowdowns.
Online Code Layout Optimizations via OCOLOS
Yuxuan Zhang, Tanvir Ahmed Khan, Gilles Pokam, Baris Kasikci, Heiner Litz and Joseph Devietti

IEEE Micro, Vol. 43 No. 4, July 2023
