CIS 505: Software Systems (Fall 2021)
This course provides an introduction to fundamental concepts of distributed systems,
and the design principles for building large-scale computational systems.
We will study some of the key building blocks – such as synchronization primitives, group communication
protocols, and replication techniques – that form the foundation of modern distributed systems, such as
cloud-computing platforms or the Internet. We will also look at some real-world examples of distributed
systems, such as GFS, MapReduce, Spark, and Dynamo, and we will gain some hands-on experience
with building and running distributed systems.
CIS 505 is one of the core courses
in the MSE program, and its final exam qualifies as one of the WPE-I exams in the PhD program.
Linh Thi Xuan Phan
Office hours: Wednesdays 1:00-2:00pm EDT
Location: OHQ and Zoom.
Time: Mondays/Wednesdays 10:15-11:45am
Location: Please see Piazza
(If you are on the waitlist and would like to attend the first few lectures, please email me.)
Office hour timetable for Fall 2021.
Office hours are held via OHQ and Zoom.
Distributed Systems: Principles and Paradigms, 3rd edition (by M. van Steen and A. Tanenbaum; ISBN 978-1543057386).
You can get a digital version of this book for free; hard copies are available, e.g., from Amazon.
Additional material will be drawn from selected research publications.
An undergraduate course in either networking or operating systems is required. You should also be comfortable with programming in C/C++.
The course will involve three substantial programming assignments, a group project,
and two midterms.
Your letter grade will be based on the programming assignments (35%), the group project (35%), the
midterm exams (25%), and participation and quizzes (5%).
We will be using Piazza for all course-related discussions.
Homework assignments and the project are available for download; you can
submit your solutions online. If necessary,
you can request an extension for your homework.
The goal of the special sessions is to provide you with tools and resources that might be useful for the assignments and project. See the special sessions page for more details.
Fall 2020 PennCloud Award
The Fall 2020 PennCloud Award went to Amit Lohe, Bharath Jaladi, Liana Patel, and Prasanna Poudyal for the overall best final project. The team presented a solidly designed, highly scalable, and robust PennCloud platform that offers strong consistency and fault tolerance via primary-based replication with logging, checkpointing, and recovery. The platform provides the complete set of required services with an elegant user interface, including a webmail service that supports both local and remote users, a storage service that supports uploading and downloading large files in any format, and an admin console for viewing and easily controlling the status and data of the frontend and backend nodes. Besides the core functionality, the platform also features useful extra-credit services, such as a discussion forum and a FIFO-ordered group chat system, which are built on top of the KV store and the Paxos consensus protocol.
You can read more about the winners and their projects in the CIS505 Hall of Fame.
Example services of the winning project.
||Labor Day - No class
||Processes and threads
The UNIX model
Implementation in the kernel
|Chapter 3.1 (Sections 1+2)
||HW0 due (on 9/10)
[video (part 1)] [video (part 2)]
The file API
||Concurrency control [pdf] [video (part 1)]
[video (part 2)]
Race conditions, critical sections
Deadlock and starvation
||Synchronization [pdf] [video]
Classical synchronization problems
Monitors and condition variables
|Hoare monitors; Mesa monitors
||Communication [pdf] [video]
Handling multiple connections
|Chapters 4.1+4.3 + 3.1 (Section 3)
||Remote Procedure Calls [pdf] [video]
Stub code; marshalling; binding
|Kinds of names; name spaces
The Domain Name System
||Clock synchronization [pdf]
[video (part 1)] [video (part 2)]
Distributed mutual exclusion
||HW2MS1 due (on 10/8)
||Last day to drop
||First midterm exam
|Oct 14–Oct 17
||Distributed mutual exclusion
Bully algorithm; token ring
||HW2MS2 due (on 10/19)
FIFO, causal and total ordering
||Algorithms for FIFO, causal and total ordering
Sequential and causal consistency
||Bigtable and Project
||Bigtable case study
||2PC and 3PC
Logging and recovery
|Chapters 8.5+8.6; [Chandy-Lamport]
The Consensus problem
|Chapters 8.1+8.2; [Paxos]
||Last day to withdraw
||Non-crash Fault Tolerance
||The Byzantine Generals problem
||Distributed file systems
|Chapter 2.4.2; [Coda]
||Google File System
||Google cluster architecture
Reading and writing in GFS
Consistency and fault tolerance
||Thanksgiving break - no class (Friday schedule)
||MapReduce programming model
||Differences to MapReduce
Case study: PageRank
||DHTs and Dynamo
||Distributed hash tables
The CAP dilemma
||Second midterm exam
|Dec 11–Dec 14
|Dec 15–Dec 22
||Project demos and reports