CIS 5050: Software Systems (Fall 2022)
Overview

This course introduces the fundamental concepts of distributed systems and the design principles for building large-scale computational systems.

We will study some of the key building blocks – such as synchronization primitives, group communication protocols, and replication techniques – that form the foundation of modern distributed systems, such as cloud-computing platforms or the Internet. We will also look at some real-world examples of distributed systems, such as GFS, MapReduce, Spark, and Dynamo, and we will gain some hands-on experience with building and running distributed systems.

CIS 5050 is one of the core courses in the MSE program, and its final exam qualifies as one of the WPE-I exams in the PhD program.

Logistics

Instructor:
Linh Thi Xuan Phan
Office hours: Tuesdays 12:00-1:00pm (Levine 576)

When and where:
Tuesdays/Thursdays 10:15-11:45am, David Rittenhouse Laboratory (DRL) A1

Teaching assistants:

Yifan Cai
Office hours: Mondays 12:00-1:00pm
Location: Levine 612

Tao Luo
Office hours: Mondays 1:00-2:00pm
Location: 5th floor GRW bump space

Anirudh Konduru
Office hours: Tuesdays+Thursdays 12:15-1:15pm
Location: Levine 601

Yuezhan Tao
Office hours: Tuesdays 5:00-6:00pm
Location: OHQ + Zoom

Bill He
Office hours: Wednesdays 11:00am-12:00pm
Location: OHQ + Zoom

Elizabeth Margolin
Office hours: Wednesdays 1:00-2:00pm
Location: 5th floor bump space

Xinyuan Zhao
Office hours: Wednesdays 5:00-6:00pm
Location: Levine 501 bump space

Shivani Reddy Rapole
Office hours: Thursdays 2:00-3:00pm
Location: OHQ + Zoom

Thomas Donnelly
Office hours: Thursdays 4:00-5:00pm
Location: Levine 612

Yilin Guo
Office hours: Fridays 9:00-10:00am
Location: Levine 612

Aditya Bhati
Office hours: Fridays 12:00-1:00pm
Location: OHQ + Zoom

Neeraj Gandhi
Office hours: Fridays 3:00-4:00pm
Location: 5th floor bump space

Vandana Miglani
Office hours: Saturdays 12:00-1:00pm
Location: OHQ + Zoom

 

Mask policy

Masks are required for all class-related activities, including lectures, office hours, exams, special sessions and project demos.

Course policies

Course textbook:
Distributed Systems: Principles and Paradigms, 3rd edition (by M. van Steen and A. Tanenbaum; ISBN 978-1543057386). You can get a digital version of this book for free; hardcopies are available, e.g., from Amazon. Additional material will be drawn from selected research publications.

Prerequisites:
An undergraduate course in either networking or operating systems is required. You should also be comfortable programming in C/C++.

Workload:
The course will involve three substantial programming assignments, a group project, and two midterms.

Grading:
Your letter grade will be based on the programming assignments (30%), the group project (35%), the midterm exams (30%), and participation (5%).

Resources

We will be using Ed Discussion for all course-related discussions.

The homework assignments and the project are available for download, and you can submit your solutions online. If necessary, you can request an extension for your homework.

Special sessions

The goal of the special sessions is to provide you with tools and resources that might be useful for the assignments and project. See the special sessions page for more details.

PennCloud Award

Winners of the Fall 2020 PennCloud Award
Amit Lohe, Bharath Jaladi, Liana Patel, and Prasanna Poudyal

In certain years, we present the PennCloud Award to the project teams with the most impressive final projects. In Fall 2020, the award went to Amit Lohe, Bharath Jaladi, Liana Patel, and Prasanna Poudyal. The team presented a solidly designed, highly scalable, and robust PennCloud platform that offers strong consistency and fault tolerance via primary-based replication with logging, checkpointing, and recovery. The platform provides the complete set of required services with an elegant user interface, including a webmail service that supports both local and remote users, a storage service that supports uploading and downloading large files in any format, and an admin console that makes it easy to view and control the status and data of the frontend and backend nodes. Beyond the core functionality, the platform also offers useful extra-credit services, such as a discussion forum and a FIFO-ordered group chat system, both built on top of the KV store and the Paxos consensus protocol.

Fall 2020 PennCloud Project
Example services of the winning project.

You can read more about the winners and their projects in the CIS5050 Hall of Fame.

Schedule (Tentative)

Date Topic Details Reading Remarks
Aug 30 Introduction [pdf] [video] Course overview
Policies
Chapter 1  
Sep 1 Processes and threads [pdf] [video] Basic concepts
The UNIX model
Implementation in the kernel
Chapter 3.1 (Sections 1+2) HW0
Sep 6 System calls [pdf] [video #1] [video #2] System calls
The file API
Kernel entry/exit
  HW1
Sep 8+13 Concurrency control [pdf] [video #1] [video #2] Synchronization primitives
Race conditions, critical sections
Deadlock and starvation
  HW0 due (on 9/8)
Sep 15 Synchronization [pdf] [video] Semaphores
Classical synchronization problems
Monitors and condition variables
Hoare monitors; Mesa monitors  
Sep 20 Communication [pdf] [video] Sockets
Socket programming
Handling multiple connections
Chapters 4.1+4.3 + 3.1 (Section 3) HW1 due (on 9/21)
Sep 22 Remote Procedure Calls [pdf] [video] Programming model
Stub code; marshalling; binding
Handling failures
Chapters 4.2+8.3 HW2
Sep 27+29 Naming [pdf] [video] Kinds of names; name spaces
The Domain Name System;
Akamai; DNSSEC
Chapter 5  
Oct 4 Clock synchronization [pdf] [video] Logical clocks
NTP
Berkeley algorithm
Chapters 6.1–6.3 HW2 MS1 due (on 10/3)
Oct 6–Oct 9 Fall break
Oct 10 Last day to drop  
Oct 11 Clock synchronization (cont) [pdf] [video] Lamport clock
Vector clock
Chapters 6.1–6.3  
Oct 13 Distributed coordination [pdf] [video]
[Midterm review]
Distributed mutual exclusion
Leader election
Bully algorithm; token ring
Chapter 6.4 HW2 MS2+3 due (on 10/14)
Oct 18 First midterm exam
Oct 20+25 Group communication [pdf] [video #1] [video #2] Reliable multicast
IP multicast
FIFO, causal and total ordering
Chapter 8.4 HW3
Oct 27 Replication [pdf] [video] Primary/backup protocols
Quorum protocols
Sequential and causal consistency
Client-centric models
Chapter 7 Project
Nov 1 Bigtable and Project [pdf] [video] Bigtable case study
Project overview
[Bigtable]  
Nov 3+8 Fault tolerance [pdf] [video] 2PC and 3PC
Logging and recovery
Chandy-Lamport algorithm
Chapters 8.5+8.6; [  
Nov 7 Last day to withdraw
Nov 10 State-machine replication [pdf] [video] Failure models
The Consensus problem
Paxos
Chapters 8.1+8.2; [Paxos] HW3 due
Nov 15+17 Non-crash Fault Tolerance [pdf] [video #1] [video #2] The Byzantine Generals problem
Impossibility results
Solutions
[BFT]  
Nov 22 Distributed file systems [pdf] [video #1] [video #2] NFS
Coda
Disconnected operation
Chapter 2.4.2; [Coda]  
Nov 24–Nov 27 Thanksgiving break - no class  
Nov 29 Google File System Google cluster architecture
Reading and writing in GFS
Consistency and fault tolerance
[Cluster] [GFS]  
Dec 1 MapReduce MapReduce programming model
System architecture
[MapReduce]  
Spark Differences from MapReduce
RDDs
Case study: PageRank
[Spark]  
Dec 6 DHTs and Dynamo Distributed hash tables
The CAP dilemma
Amazon Dynamo
[Dynamo]  
Dec 8 Second midterm exam
Dec 13–Dec 14 Reading days
Dec 15–Dec 22 Project demos and reports
Web site contact: Linh Thi Xuan Phan