Nvidia Turing TU104 GPU die photo

CIS 6010: Special Topics in Computer Architecture: GPGPU Architecture and Programming, Fall 2025

Course Information

instructor: Joe Devietti
when: Monday/Wednesday 12-1:30pm
where: Towne 305
contact: email, Canvas

office hours:

  • by appointment

Course Description

Graphics Processing Units (GPUs) have become extremely popular and are used to accelerate an increasingly diverse set of non-graphics workloads. This seminar will examine modern GPU architectures, the programming models used to write general-purpose code for GPUs, and the complexities of programming such highly parallel architectures. There will be a special emphasis on concurrency correctness issues as they relate to GPUs, including GPU memory consistency models and GPU concurrency bugs. Graduate-level coursework in computer architecture (e.g., CIS 5710) will be very helpful.

Course Materials

No textbooks are required; links to all readings will be provided at this website.

Grading

  • Project: 50%
  • Participation: 30%
  • Assignments: 20%

There will be no exams.

Submit homework via Canvas.

The class project can be done in groups of up to 3. The project is open-ended: it should be something related to GPUs but the specifics are up to you. Choosing a project that incorporates your interests (research or otherwise) is a great idea! Here are some project ideas:

  • Rewrite your matrix multiply code from the homeworks to operate on 16-bit (“half”) floating point elements instead of 32-bit floats. Update the cuBLAS code correspondingly, and use Tensor Cores to accelerate your implementation.
  • Build a series of scalable locking implementations in CUDA, from simple spin-locks to something like MCS locks. The lack of coherence on GPUs should add an interesting wrinkle. Useful resources are Michael Scott’s webpage and the SSync library from EPFL.
  • Choose a GPU-related paper (e.g., one that we’ve read in class or another that piques your interest) that has source code available, and try to reproduce some of the results from it.
  • Pick a non-trivial open GitHub issue for an application written in CUDA, and try to resolve it. As a starting point, here are some popular GitHub repositories with CUDA code.
  • Port an application or algorithm of interest to you to CUDA, and benchmark its performance.
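
As a rough starting point for the locking project idea above, here is a minimal CPU-side sketch of a test-and-test-and-set spin lock in C++. It is not CUDA code and is not taken from the course materials; the names `TtasLock` and `run_counter` are hypothetical and exist only for illustration. A GPU version would use CUDA atomics instead of `std::atomic` and would need to handle the coherence and memory-model issues discussed in class.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical test-and-test-and-set spin lock (CPU sketch, not CUDA).
class TtasLock {
public:
    void lock() {
        for (;;) {
            // Try to take the lock; acquire ordering keeps the critical
            // section from being reordered before the lock is held.
            if (!locked_.exchange(true, std::memory_order_acquire)) return;
            // Spin on a plain load so waiters do not keep issuing
            // read-modify-write operations while the lock is held.
            while (locked_.load(std::memory_order_relaxed)) {
                std::this_thread::yield();
            }
        }
    }

    void unlock() {
        // Debug check: unlock should only be called while the lock is held.
        assert(locked_.load(std::memory_order_relaxed));
        // Release ordering publishes the critical section's writes.
        locked_.store(false, std::memory_order_release);
    }

private:
    std::atomic<bool> locked_{false};
};

// Hypothetical helper: num_threads threads each increment a shared
// counter iters times under the lock, and the final count is returned.
long run_counter(int num_threads, int iters) {
    TtasLock lock;
    long counter = 0;  // protected by lock
    std::vector<std::thread> threads;
    for (int t = 0; t < num_threads; ++t) {
        threads.emplace_back([&lock, &counter, iters] {
            for (int i = 0; i < iters; ++i) {
                lock.lock();
                ++counter;
                lock.unlock();
            }
        });
    }
    for (auto& th : threads) th.join();
    return counter;
}
```

If the lock provides mutual exclusion, `run_counter(4, 10000)` should return 40000. A project could start from a baseline like this and then compare it against queue-based designs such as MCS locks.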

Course Schedule

This schedule is subject to change.

Date Topic Presenter
Wed 27 Aug Intro Joe
Mon 1 Sep no class - Labor Day
Wed 3 Sep General-Purpose Graphics Processor Architectures (accessible via Penn VPN, also on “Files” section of Canvas), Chapters 1 & 2 Joe
Mon 8 Sep ” Sections 3.1 - 3.3 Joe
Wed 10 Sep ” Sections 3.4 - 3.6 Joe
Mon 15 Sep ” Chapter 4 Joe
Wed 17 Sep Contemporary GPUs Joe
Mon 22 Sep CUDA Programming Guide Joe
Wed 24 Sep no class - Rosh Hashanah
Mon 29 Sep GEMM and HW1 Joe
Wed 1 Oct CUDA topics, Roofline Model Joe
Mon 6 Oct A Primer on Memory Consistency and Cache Coherence, Chapter 3 (SC) Joe
Wed 8 Oct MCM Primer (Chapter 4, TSO) Joe
Mon 13 Oct MCM Primer (Chapter 5, XC) Joe
Wed 15 Oct PTX MCM A Formal Analysis of the NVIDIA PTX Memory Consistency Model Joe
Mon 20 Oct GPU concurrency: Weak Behaviours and Programming Assumptions slides
Wed 22 Oct no class - Joe traveling
Mon 27 Oct Cache Coherence for GPU Architectures
Wed 29 Oct Dynamic Warp Formation
Mon 3 Nov Cache-Conscious Wavefront Scheduling
Wed 5 Nov GhOST: a GPU Out-of-Order Scheduling Technique for Stall Reduction
Mon 10 Nov Understanding The Security of Discrete GPUs
Wed 12 Nov GPU Memory Exploitation for Fun and Profit
Mon 17 Nov In-Datacenter Performance Analysis of a Tensor Processing Unit
Wed 19 Nov Flash Attention 2
Mon 24 Nov gpucc: An Open-Source GPGPU Compiler
Wed 26 Nov no class - Thanksgiving
Mon 1 Dec Accel-Sim: An Extensible Simulation Framework for Validated GPU Modeling
Wed 3 Dec Project Presentations
Mon 8 Dec Project Presentations