CIS 505: Software Systems (Fall 2018)
Overview

This course provides an introduction to the fundamental concepts of distributed systems and the design principles for building large-scale computational systems.

We will study some of the key building blocks – synchronization primitives, group communication protocols, and replication techniques – that form the foundation of modern distributed systems, such as cloud-computing platforms or the Internet. We will also look at real-world examples of distributed systems, including GFS, MapReduce, Spark, and Dynamo, and we will gain hands-on experience with building and running distributed systems.

CIS 505 is one of the core courses in the MSE and EMBS programs, and its final exam qualifies as one of the four WPE-I exams in the PhD program.

Logistics

Instructor:
Linh Thi Xuan Phan
Office hours: Wednesdays 1:00-2:00pm (Levine 576)

When and where:
MW 4:30-6:00pm, Moore 216

Teaching assistants:

Arvind Mandyam Annaswamy
Office hours: Mondays 10:00-11:00am
Location: Levine 6th floor bump space

Darshit Doshi
Office hours: Mondays 2:30-3:30pm
Location: GRW 5th floor bump space

Ben Judd
Office hours: Mondays 3:30-4:30pm
Location: Levine 6th floor bump space

Lawrence Choi
Office hours: Tuesdays 12:00-1:00pm
Location: Levine 5th floor bump space

Swachhand Lokhande
Office hours: Tuesdays 3:00-4:00pm
Location: GRW 5th floor bump space

Neeraj Gandhi
Office hours: Wednesdays 9:00-10:00am
Location: Levine 6th floor bump space

Thomas Amon
Office hours: Wednesdays 2:00-3:00pm
Location: GRW 5th floor bump space

Shashank Garg
Office hours: Thursdays 12:30-1:30pm
Location: Levine 6th floor bump space

Yecheng Yang
Office hours: Thursdays 2:00-3:00pm
Location: Levine 6th floor bump space

Cyril Saade
Office hours: Thursdays 3:00-4:00pm
Location: Levine 6th floor bump space

Xinyu Li
Office hours: Fridays 10:00-11:00am
Location: Levine 6th floor bump space

Garvit Gupta
Office hours: Fridays 12:00-1:00pm
Location: Levine 5th floor bump space


Course policies

Course textbook:
Distributed Systems: Principles and Paradigms, 3rd edition (by M. van Steen and A. Tanenbaum; ISBN 978-1543057386). You can get a digital version of this book for free; hardcopies are available, e.g., from Amazon. Additional material will be drawn from selected research publications.

Prerequisites:
An undergraduate course in either networking or operating systems is required. You should also be comfortable with programming in C/C++.

Workload:
The course will involve three substantial programming assignments, a group project, a midterm, and a final examination.

Grading:
Your letter grade will be based on the programming assignments (30%), the group project (25%), the midterm exam (15%), the final exam (25%), and your participation (5%).

Resources

We will be using Piazza for all course-related discussions.

Homework assignments and the project are available for download; you can submit your solutions online. If necessary, you can request an extension for your homework.

PennCloud Award

Winners of the Spring 2018 PennCloud Award: Garvit Gupta, Shiva Suri, Anant Maheshwari, and Sahana Vijaya Prasad
Spring 2018 PennCloud Project: example services of the winning project.

The Spring 2018 PennCloud Award went to Garvit Gupta, Anant Maheshwari, Sahana Vijaya Prasad, and Shiva Suri for the overall best final project. The team presented a solid design of a fault-tolerant cloud platform with strong consistency, quorum-based replication, and efficient checkpointing and recovery. The platform provides a diverse set of services with multiple useful features, such as a webmail service that supports multiple users, mail folders, sorting, and labeling; a storage service that supports uploading and downloading of files in any format; and a user-friendly admin console. Besides these core functionalities, the platform also offers users a novel and beautifully designed tic-tac-toe game that is built on top of group communication.

You can read more about the winners and their projects in the CIS505 Hall of Fame.

Schedule (Tentative)

Date | Topic | Details | Reading | Remarks
Aug 29 | Introduction [.pdf] | Course overview / Policies | Chapter 1 |
Sep 3 | Labor Day - No class | | | HW0
Sep 5 | Processes and threads [.pdf] | Basic concepts / The UNIX model / Implementation in the kernel | Chapter 3.1 (Sections 1+2) | HW0 due (on 9/7); HW1
Sep 10 | System calls [.pdf] | System calls / The file API / Kernel entry/exit | |
Sep 12 | Concurrency control [.pdf] | Synchronization primitives / Race conditions, critical sections / Deadlock and starvation | |
Sep 17 | Synchronization [.pdf] | Semaphores / Classical synchronization problems / Monitors and condition variables | |
Sep 19 | Communication [.pdf] | Sockets / Socket programming / Handling multiple connections | Chapters 4.1+4.3 + 3.1 (Section 3) | HW1 due (on 09/21)
Sep 24 | Remote Procedure Calls [.pdf] | Programming model / Stub code; marshalling; binding / Handling failures | Chapters 4.2+8.3 | HW2
Sep 26 | | | |
Oct 1 | Naming [.pdf] | Kinds of names; name spaces / The Domain Name System / LDAP | Chapter 5 | HW2MS1 due
Oct 3 | Clock synchronization [.pdf] | Logical clocks / Distributed mutual exclusion / NTP | Chapters 6.1–6.3 |
Oct 4–7 | Fall break | | |
Oct 8 | Last day to drop | | |
Oct 8 | Distributed coordination [.pdf] | Distributed mutual exclusion / Leader election / Bully algorithm; token ring | Chapter 6.4 |
Oct 10 | Group communication [.pdf] | Reliable multicast / IP multicast / FIFO, causal and total ordering | Chapter 8.4 |
Oct 15 | Algorithms for FIFO, causal and total ordering | | Chapter 8.4 | HW2MS2+3 due
Oct 17 | Midterm | The midterm will cover topics from the first lecture | | HW3; Project
Oct 22 | Replication [.pdf] | Primary/backup protocols / Quorum protocols / Sequential and causal consistency / Client-centric models | Chapter 7 |
Oct 24 | Bigtable and Project [.pdf] | Bigtable case study / Project overview | [Bigtable] |
Oct 29 | Fault tolerance [.pdf] | 2PC and 3PC / Logging and recovery / Chandy-Lamport algorithm | Chapters 8.5+8.6; [Chandy-Lamport] |
Oct 31 | State-machine replication [.pdf] | Failure models / The Consensus problem / Paxos | Chapters 8.1+8.2; [Paxos] | HW3 due (on 11/02)
Nov 5 | Non-crash Fault Tolerance [.pdf] | The Byzantine Generals problem / Impossibility results / Solutions | [BFT] | Project proposal due (on 11/8)
Nov 7 | | | |
Nov 9 | Last day to withdraw | | |
Nov 12 | File systems [.pdf] | File operations; name space / Data structures on disk / Space management | |
Nov 14 | Distributed file systems [.pdf] | NFS / Coda / Disconnected operation | Chapter 2.4.2; [Coda] |
Nov 19 | Google File System [.pdf] | Google cluster architecture / Reading and writing in GFS / Consistency and fault tolerance | [Cluster] [GFS] |
Nov 21 | Thanksgiving break - no class (Friday schedule) | | |
Nov 26 | MapReduce [.pdf] | MapReduce programming model / System architecture | [MapReduce] |
Nov 28 | Class is canceled - Linh is away | | |
Dec 3 | Spark [.pdf] | Differences to MapReduce / RDDs / Case study: PageRank | [Spark] |
Dec 5 | DHTs and Dynamo; Exam review [.pdf] | Distributed hash tables / The CAP dilemma / Amazon Dynamo | [Dynamo] |
Dec 10 | | | |
Dec 11–12 | Reading days | | |
Dec 13–Dec 19 | Project demos and reports | | |
Dec 20 | Final Exam (3-5pm) | The final exam will cover all topics studied in the entire semester | |
Web site contact: Linh Thi Xuan Phan