CIS 5050: Software Systems (Spring 2023)
Overview

This course introduces the fundamental concepts of distributed systems and the design principles for building large-scale computational systems.

We will study some of the key building blocks – such as synchronization primitives, group communication protocols, and replication techniques – that form the foundation of modern distributed systems, such as cloud-computing platforms or the Internet. We will also look at some real-world examples of distributed systems, such as GFS, MapReduce, Spark, and Dynamo, and we will gain some hands-on experience with building and running distributed systems.

CIS 5050 is one of the core courses in the MSE program, and its final exam qualifies as one of the WPE-I exams in the PhD program.

Logistics

Instructor:
Linh Thi Xuan Phan
Office hours: Tuesdays 12-1pm (Levine 576)

When and where:
Tuesdays/Thursdays 10:15-11:45am, LRSM Auditorium

Teaching assistants and office hours:

Jingyi Li: Mondays 10:00-11:30am (Levine 601 bump space)
Anirudh Konduru: Mondays 2-3pm (Levine 612) + Wednesdays 2-3pm (Levine 601 bump space)
Yumika Amemiya: Mondays 4:00-5:00pm (Levine 501 bump space)
David Xu: Mondays 3:00-5:00pm (Levine 501 bump space)
Kevin Yang: Tuesdays 3:30-4:30pm (Levine 601 bump space)
Lifu Zhang: Tuesdays 2:00-3:30pm + Wednesdays 2:00-3:00pm (Levine 501 bump space)
Chanseo Bae: Wednesdays 12:30-2:00pm (OHQ)
Ling-Hsin Kung: Wednesdays 9:00-11:00am (Levine 6th floor bump space)
Andrew Wang: Thursdays 12:30-1:30pm (Levine 601 bump space)
Jeng-Ru Wu: Thursdays 4:00-5:30pm (Levine 612)
Aditya Bhati: Thursdays 5:00-6:00pm (Levine 501 bump space)
Yujuan Song: Fridays 10:00-11:00am (Levine 5th floor bump space)
Maxwell Du: Fridays 2:00-3:00pm (Levine 601 bump space)
Kevin Chen: Fridays 3:00-4:00pm (Levine 501 bump space)
Yilin Guo: Saturdays 10:00am-11:00am (OHQ)
Akriti Gupta: Sundays 9:00am-11:00am (OHQ)

Course policies

Course textbook:
Distributed Systems: Principles and Paradigms, 4th edition (by M. van Steen and A. Tanenbaum). You can get a digital version of this book for free; hard copies will be available soon, e.g., from Amazon. Additional material will be drawn from selected research publications.

Prerequisites:
The course requires undergraduate-level operating systems and networking knowledge, such as CIS 3800 and NETS 212 (or the equivalent). You should also be proficient in C or C++ programming.

Workload:
The course will involve three substantial programming assignments, a group project, and two midterms. Both the programming assignments and the project involve a considerable amount of programming in C/C++, and the project requires the ability to work with your classmates in teams.

Grading:
Your letter grade will be based on the individual programming assignments (35%), the group project (30%), the midterm exams (30%), and participation (5%).

Attendance:
Class attendance is mandatory and will count towards your participation score. The participation score will be computed based on attending lectures, answering questions in class, and answering questions on Ed Discussion.

Masks:
Masks are required for all class-related activities, including lectures, office hours, exams, special sessions and project demos.

Resources

We will be using Ed Discussion for all course-related discussions.

Homework assignments and the project are available for download, and you can submit your solutions online. If necessary, you can request an extension for your homework.

Special sessions

The goal of the special sessions is to provide you with tools and resources that might be useful for the assignments and project. See the special sessions page for more details.

PennCloud Award

Winners of the Fall 2022 PennCloud Award
Hanbang Wang, Namita Shukla, Benedict Florance Arockiaraj, and Andrew Zhao

In certain years, we give the PennCloud Award to the project teams with the most impressive final projects. In Fall 2022, the award went to Hanbang Wang, Namita Shukla, Benedict Florance Arockiaraj, and Andrew Zhao. The team presented an excellent fault-tolerant PennCloud platform that was well done in every respect. The platform supports a complete set of services with an intuitive user interface, including a webmail service for both local and remote users, a storage service for uploading and downloading large files in any format, and an admin console for viewing and easily controlling the status and data of frontend and backend nodes. Underlying these services, the platform also features a solid, scalable system design with strong consistency, efficient fault detection and recovery, fast performance, and great usability.

Fall 2022 PennCloud Project
Example services of the winning project.

You can read more about past winners and their projects in the CIS 5050 Hall of Fame.

Schedule (Tentative)

Date Topic Details Reading Remarks
Jan 12 Introduction [pdf] [video] Course overview
Policies
Chapter 1 HW0
Jan 17 Processes and threads [pdf] [video] Basic concepts
The UNIX model
Implementation in the kernel
Chapter 3.1 (Sections 1+2)  
Jan 19 System calls [pdf] [video #1] [video #2] System calls
The file API
Kernel entry/exit
  HW0 due; HW1
Jan 24+26 Concurrency control [pdf] [video #1] [video #2] Synchronization primitives
Race conditions, critical sections
Deadlock and starvation
   
Jan 31 Synchronization [pdf] [video] Semaphores
Classical synchronization problems
Monitors and condition variables
Hoare monitors; Mesa monitors  
Feb 2 Communication [pdf] [video] Sockets
Socket programming
Handling multiple connections
Chapters 4.1+4.3  
Feb 7+9 Remote Procedure Calls [pdf] [video #1] [video #2] Programming model
Stub code; marshalling; binding
Handling failures
Chapters 4.2+8.3 HW1 due (on 2/3); HW2
Feb 14 Naming [pdf] [video #1] [video #2] Kinds of names; name spaces
The Domain Name System
Akamai; DNSSEC
Chapter 6 HW2MS1 due (on 2/13)
Feb 16 Clock synchronization [pdf] [video #1] Logical clocks
NTP
Berkeley algorithm
Chapters 5.1+5.2  
Feb 20 Last day to drop  
Feb 21 Clock synchronization (cont) [pdf] [video #2] [video #3] Lamport clock
Vector clock
   
Feb 23+28 Group communication [pdf] [video #1] Reliable multicast
IP multicast
FIFO, causal and total ordering
Chapter 8.4 HW2MS2+3 due (on 2/27)
Mar 2 First midterm exam
Mar 4–Mar 12 Spring break
Mar 14 Group communication [pdf] [video #2]     HW3
Mar 16 Replication [pdf] [video] Primary/backup protocols
Quorum protocols
Sequential and causal consistency
Client-centric models
Chapter 7 Project
Mar 21 Bigtable and Project [pdf] [video] Bigtable case study
Project overview
[Bigtable]  
Mar 23 Fault tolerance [pdf] [video #1] [video #2] 2PC and 3PC
Logging and recovery
Chandy-Lamport algorithm
Chapters 8.5+8.6
Mar 27 Last day to withdraw
Mar 28 State-machine replication [video #1] [video #2] Failure models
The Consensus problem
Paxos
Chapters 8.1+8.2; [Paxos] HW3 due (on 3/29)
Mar 30 Non-crash Fault Tolerance [video #1] [video #2] The Byzantine Generals problem
Impossibility results
Solutions
[BFT]  
Apr 4 Distributed coordination [pdf] [video] Distributed mutual exclusion
Leader election
Bully algorithm; token ring
Chapters 5.3+5.4
Apr 4 Distributed file systems [pdf] [video] NFS
Coda
Disconnected operation
Chapter 2.3.3; [Coda]  
Apr 6 Google File System [pdf] [video #1] [video #2] Google cluster architecture
Reading and writing in GFS
Consistency and fault tolerance
[Cluster] [GFS]  
Apr 11 MapReduce [pdf] [video #1] [video #2] MapReduce programming model
System architecture
[MapReduce]  
Apr 13 Spark [pdf] [video #1] [video #2] Differences from MapReduce
RDDs
Case study: PageRank
[RDD] [Spark]  
Apr 18+20 DHTs and Dynamo [pdf] [video] Distributed hash tables
The CAP dilemma
Amazon Dynamo
[Dynamo]  
Apr 25 Second midterm exam
Apr 27–Apr 30 Reading days
May 1–May 9 Project demos and reports
Web site contact: Linh Thi Xuan Phan