CIS 505: Software Systems (Fall 2021)
Overview

This course provides an introduction to fundamental concepts of distributed systems, and the design principles for building large-scale computational systems.

We will study some of the key building blocks, including synchronization primitives, group communication protocols, and replication techniques, that form the foundation of modern distributed systems such as cloud-computing platforms and the Internet. We will also look at real-world examples of distributed systems, including GFS, MapReduce, Spark, and Dynamo, and gain hands-on experience with building and running distributed systems.

CIS 505 is one of the core courses in the MSE program, and its final exam qualifies as one of the WPE-I exams in the PhD program.

Logistics

Instructor:
Linh Thi Xuan Phan
Office hours: Wednesdays 1:00-2:00pm EDT
Location: OHQ and Zoom.

Lectures:
Time: Mondays/Wednesdays 10:15-11:45am
Location: Please see Piazza
(If you are on the waitlist and would like to attend the first few lectures, please email me.)

Teaching assistants:
Yuxuan Zhang
Office hours: Mondays 1:30-3:00pm EDT
Karan Newatia
Office hours: Mondays 4:00-5:00pm EDT
Robert Gifford
Office hours: Tuesdays 11:00am-noon EDT
John Hay
Office hours: Tuesdays 2:00-3:00pm EDT and Wednesdays 4:00-5:00pm EDT
Yifan Cai
Office hours: Tuesdays 5:00-6:00pm EDT
Bill He
Office hours: Thursdays 11:00am-1:00pm EDT
Zhilei Zheng
Office hours: Thursdays 7:00-9:00pm EDT
Neeraj Gandhi
Office hours: Fridays 10:00-11:00am EDT
Ziad Ben Hadj-Alouane
Office hours: Saturdays noon-1:30pm EDT
Sidharth Sankhe
Office hours: Sundays 7:00-8:00pm EDT

Office hour timetable for Fall 2021.

Office hours are held via OHQ and Zoom.

Course policies

Course textbook:
Distributed Systems: Principles and Paradigms, 3rd edition (by M. van Steen and A. Tanenbaum; ISBN 978-1543057386). You can get a digital version of this book for free; hardcopies are available, e.g., from Amazon. Additional material will be drawn from selected research publications.

Prerequisites:
An undergraduate course in either networking or operating systems is required. You should also be comfortable with programming in C/C++.

Workload:
The course will involve three substantial programming assignments, a group project, and two midterms.

Grading:
Your letter grade will be based on the programming assignments (35%), the group project (35%), the midterm exams (25%), and participation and quizzes (5%).

Resources

We will be using Piazza for all course-related discussions.

The homework assignments and the project are available for download, and you can submit your solutions online. If necessary, you can request an extension for a homework assignment.

Special sessions

The goal of the special sessions is to provide you with tools and resources that might be useful for the assignments and project. See the special sessions page for more details.

Fall 2020 PennCloud Award


The Fall 2020 PennCloud Award went to Amit Lohe, Bharath Jaladi, Liana Patel, and Prasanna Poudyal for the overall best final project. The team presented a solidly designed, highly scalable, and robust PennCloud platform that offers strong consistency and fault tolerance via primary-based replication with logging, checkpointing, and recovery. The platform provides the complete set of required services with an elegant user interface, including a webmail service that supports both local and remote users, a storage service that supports uploading and downloading large files in any format, and an admin console for viewing and easily controlling the status and data of the frontend and backend nodes. Besides the core functionality, the platform also features useful extra-credit services, such as a discussion forum and a FIFO-ordered group chat system, that are built on top of the KV store and the Paxos consensus protocol.

Example services of the winning project.

You can read more about the winners and their projects in the CIS505 Hall of Fame.

Schedule (Tentative)

Date | Topic | Details | Reading | Remarks
Sep 1 | Introduction [pdf] [video] | Course overview; Policies | Chapter 1 | HW0
Sep 6 | Labor Day - No class | | |
Sep 8 | Processes and threads [pdf] [video] | Basic concepts; The UNIX model; Implementation in the kernel | Chapter 3.1 (Sections 1+2) | HW0 due (on 9/10)
Sep 13 | System calls [pdf] [video (part 1)] [video (part 2)] | System calls; The file API; Kernel entry/exit | | HW1
Sep 15+20 | Concurrency control [pdf] [video (part 1)] [video (part 2)] | Synchronization primitives; Race conditions, critical sections; Deadlock and starvation | |
Sep 22 | Synchronization [pdf] [video] | Semaphores; Classical synchronization problems; Monitors and condition variables; Hoare monitors; Mesa monitors | |
Sep 27 | Communication [pdf] [video] | Sockets; Socket programming; Handling multiple connections | Chapters 4.1+4.3 + 3.1 (Section 3) | HW1 due
Sep 29 | Remote Procedure Calls [pdf] [video] | Programming model; Stub code; marshalling; binding; Handling failures | Chapters 4.2+8.3 | HW2
Oct 4 | Naming [pdf] [video] | Kinds of names; name spaces; The Domain Name System; LDAP | Chapter 5 |
Oct 6+11 | Clock synchronization [pdf] [video (part 1)] [video (part 2)] | Logical clocks; Distributed mutual exclusion; NTP | Chapters 6.1–6.3 | HW2MS1 due (on 10/8)
Oct 11 | Last day to drop | | |
Oct 13 | First midterm exam | | |
Oct 14–Oct 17 | Fall break | | |
Oct 18 | Distributed coordination | Distributed mutual exclusion; Leader election; Bully algorithm; token ring | Chapter 6.4 | HW2MS2 due (on 10/19)
Oct 20 | Group communication | Reliable multicast; IP multicast; FIFO, causal and total ordering | Chapter 8.4 |
Oct 25 | Algorithms for FIFO, causal and total ordering | | | HW3
Oct 27 | Replication | Primary/backup protocols; Quorum protocols; Sequential and causal consistency; Client-centric models | Chapter 7 | Project
Nov 1 | Bigtable and Project | Bigtable case study; Project overview | [Bigtable] |
Nov 3 | Fault tolerance | 2PC and 3PC; Logging and recovery; Chandy-Lamport algorithm | Chapters 8.5+8.6; [Chandy-Lamport] | HW3 due (on 11/5)
Nov 8 | State-machine replication | Failure models; The Consensus problem; Paxos | Chapters 8.1+8.2; [Paxos] |
Nov 8 | Last day to withdraw | | |
Nov 10 | Non-crash Fault Tolerance | The Byzantine Generals problem; Impossibility results; Solutions | [BFT] |
Nov 15+17 | Distributed file systems | NFS; Coda; Disconnected operation | Chapter 2.4.2; [Coda] |
Nov 22 | Google File System | Google cluster architecture; Reading and writing in GFS; Consistency and fault tolerance | [Cluster] [GFS] |
Nov 24 | Thanksgiving break - no class (Friday schedule) | | |
Nov 29 | MapReduce | MapReduce programming model; System architecture | [MapReduce] |
Dec 1 | Spark | Differences to MapReduce; RDDs; Case study: PageRank | [Spark] |
Dec 6 | DHTs and Dynamo | Distributed hash tables; The CAP dilemma; Amazon Dynamo | [Dynamo] |
Dec 8 | Second midterm exam | | |
Dec 11–Dec 14 | Reading days | | |
Dec 15–Dec 22 | Project demos and reports | | |

Web site contact: Linh Thi Xuan Phan