instructor: Joe Devietti
when: Monday/Wednesday 12-1:30pm
where: Towne 305
contact: email, canvas
office hours:
Graphics Processing Units (GPUs) have become extremely popular and are used to accelerate an increasingly diverse set of non-graphics workloads. This seminar will examine modern GPU architectures, the programming models used to write general-purpose code for GPUs, and the complexities of programming such highly parallel architectures. There will be a special emphasis on concurrency correctness issues as they relate to GPUs, including GPU memory consistency models and GPU concurrency bugs. Graduate-level coursework in computer architecture (e.g., CIS 5710) will be very helpful.
No textbooks are required; links to all readings will be provided at this website.
Big thanks to Lambda.ai for donating GPU resources to our class!
There will be no exams.
Submit homework via Canvas.
The class project can be done in groups of up to 3. The project is open-ended: it should be something related to GPUs but the specifics are up to you. Choosing a project that incorporates your interests (research or otherwise) is a great idea! Here are some project ideas:
This schedule is subject to change.
| Date | Topic | Presenter |
|---|---|---|
| Wed 27 Aug | Intro | Joe |
| Mon 1 Sep | no class - Labor Day | |
| Wed 3 Sep | General-Purpose Graphics Processor Architectures (accessible via Penn VPN, also in the “Files” section of Canvas), Chapters 1 & 2 | Joe |
| Mon 8 Sep | ” Sections 3.1 - 3.3 | Joe |
| Wed 10 Sep | ” Sections 3.4 - 3.6 | Joe |
| Mon 15 Sep | ” Chapter 4 | Joe |
| Wed 17 Sep | Contemporary GPUs | Joe |
| Mon 22 Sep | CUDA Programming Guide | Joe |
| Wed 24 Sep | no class - Rosh Hashanah | |
| Mon 29 Sep | GEMM and HW1 | Joe |
| Wed 1 Oct | CUDA topics, Roofline Model | Joe |
| Mon 6 Oct | A Primer on Memory Consistency and Cache Coherence, Chapter 3 (SC) | Joe |
| Wed 8 Oct | MCM Primer (Chapter 4, TSO) | Joe |
| Mon 13 Oct | MCM Primer (Chapter 5, XC) | Joe |
| Wed 15 Oct | PTX MCM: A Formal Analysis of the NVIDIA PTX Memory Consistency Model | Joe |
| Mon 20 Oct | GPU Concurrency: Weak Behaviours and Programming Assumptions (slides) | |
| Wed 22 Oct | no class - Joe traveling | |
| Mon 27 Oct | Cache Coherence for GPU Architectures | |
| Wed 29 Oct | Dynamic Warp Formation | |
| Mon 3 Nov | Cache-Conscious Wavefront Scheduling | |
| Wed 5 Nov | GhOST: a GPU Out-of-Order Scheduling Technique for Stall Reduction | |
| Mon 10 Nov | Understanding The Security of Discrete GPUs | |
| Wed 12 Nov | GPU Memory Exploitation for Fun and Profit | |
| Mon 17 Nov | In-Datacenter Performance Analysis of a Tensor Processing Unit | |
| Wed 19 Nov | Flash Attention 2 | |
| Mon 24 Nov | gpucc: An Open-Source GPGPU Compiler | |
| Wed 26 Nov | no class - Thanksgiving | |
| Mon 1 Dec | Accel-Sim: An Extensible Simulation Framework for Validated GPU Modeling | |
| Wed 3 Dec | Project Presentations | |
| Mon 8 Dec | Project Presentations | |