CIS 505: Software Systems (Fall 2017)

This course provides an introduction to fundamental concepts of distributed systems and the design principles for building large-scale computational systems.

We will study some of the key building blocks – such as synchronization primitives, group communication protocols, and replication techniques – that form the foundation of modern distributed systems, such as cloud-computing platforms or the Internet. We will also look at some real-world examples of distributed systems, such as MapReduce, Spark, Dynamo, and ZooKeeper, and we will gain some hands-on experience with building and running distributed systems.

CIS 505 is one of the core courses in the MSE and EMBS programs, and its final exam qualifies as one of the four WPE-I exams in the PhD program.


Instructor:
Linh Thi Xuan Phan
Office hours: Wednesdays 1:00-2:00pm (Levine 464)

When and where:
MW 4:30-6:00pm, David Rittenhouse Laboratory (DRL) A2

Teaching assistants:

Alexander Thurston
Office hours: Mondays, 1:30-2:30pm
Location: Levine 5th floor bump space

Xiaozhou Pu
Office hours: Mondays, 3:00-4:00pm
Location: Levine 5th floor bump space

Grayson Honan
Office hours: Tuesdays, 3:00-4:00pm
Location: Towne 319

Rohit Kothur
Office hours: Tuesdays, 5:00-6:00pm
Location: Levine 512

Oshin Agarwal
Office hours: Thursdays, 12:00-1:00pm
Location: Towne 307

Han Zhu
Office hours: Thursdays, 1:00-2:00pm
Location: Levine 5th floor bump space

Saeed Abedigozalabad
Office hours: Fridays, 4:00-5:00pm
Location: Levine 5th floor bump space

Devanshu Jain
Office hours: Fridays, 5:00-6:30pm
Location: Levine 5th floor bump space

Course policies

Course textbook:
Distributed Systems: Principles and Paradigms, 3rd edition (by M. van Steen and A. Tanenbaum; ISBN 978-1543057386). You can get a digital version of this book for free; hard copies are available, e.g., from Amazon. Additional material will be drawn from selected research publications.

An undergraduate course in either networking or operating systems is required. You should also be comfortable with programming in C/C++.

The course will involve three substantial programming assignments, a group project, a midterm, and a final examination.

Your letter grade will be based on the programming assignments (30%), the group project (25%), the midterm exam (15%), the final exam (25%), and your participation (5%).
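As a rough illustration of how the weights above combine, here is a minimal sketch in Python. It assumes each component has already been normalized to a 0-100 score; the function name `weighted_total` and the sample scores are hypothetical, and the mapping from the resulting total to a letter grade is not specified here, so it is deliberately left out.

```python
# Weights copied from the grading breakdown above (percent of the final grade).
WEIGHTS = {
    "programming_assignments": 30,
    "group_project": 25,
    "midterm": 15,
    "final": 25,
    "participation": 5,
}

# Sanity check: the listed percentages account for the whole grade.
assert sum(WEIGHTS.values()) == 100


def weighted_total(scores):
    """Combine per-component scores (each assumed to be on a 0-100 scale)
    into a single 0-100 total using the weights above."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


# Hypothetical scores, for illustration only.
example = {
    "programming_assignments": 90,
    "group_project": 80,
    "midterm": 70,
    "final": 85,
    "participation": 100,
}
print(weighted_total(example))  # 83.75
```

Because the exams together carry 40% of the weight, a ten-point change in the final exam score moves the total by 2.5 points in this sketch.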


We will be using Piazza for course-related discussions.

The homework assignments and the project are available for download; you can submit your solutions online. If necessary, you can request an extension for a homework assignment.

Schedule (tentative)

Date | Topic | Details | Reading | Remarks
Aug 30 | Introduction | Course overview | Chapter 1 | HW0
Sep 4 | No class (Labor Day) | | |
Sep 6 | Processes and threads | Basic concepts · The UNIX model · Implementation in the kernel | Chapter 3.1 (Sections 1+2) | HW0 due (on 09/07); HW1
Sep 11 | System calls | System calls · The file API · Kernel entry/exit | |
Sep 13 | Concurrency control | Synchronization primitives · Race conditions, critical sections · Deadlock and starvation | |
Sep 18 | Synchronization | Semaphores · Classical synchronization problems · Monitors and condition variables | |
Sep 20 | Communication | Sockets · Socket programming · Handling multiple connections | Chapters 4.1+4.3 + 3.1 (Section 3) | HW1 due (on 9/22)
Sep 25 | Remote Procedure Calls | Programming model · Stub code; marshalling; binding · Handling failures | Chapters 4.2+8.3 | HW2
Sep 27 | Naming | Kinds of names; name spaces · The Domain Name System | Chapter 5 |
Oct 2 | Clock synchronization | Logical clocks · Distributed mutual exclusion | Chapters 6.1–6.3 | HW2MS1 due (on 10/3)
Oct 4 | Distributed coordination | Distributed mutual exclusion · Leader election · Bully algorithm; token ring | Chapter 6.4 |
Oct 5–8 | Fall break | | |
Oct 9 | Last day to drop | | |
Oct 9 | Group communication | Reliable multicast · IP multicast · FIFO, causal and total ordering | Chapter 8.4 | HW3
Oct 11 | | | |
Oct 16 | Replication | Primary/backup protocols · Quorum protocols · Sequential and causal consistency · Client-centric models | Chapter 7 | HW2MS2+3 due
Oct 18 | Midterm | The midterm exam will cover topics through Oct 16 | | HW3
Oct 23 | Bigtable and Project | Bigtable case study · Project overview | [Bigtable] | Project
Oct 25 | Fault tolerance | 2PC and 3PC · Logging and recovery · Chandy-Lamport algorithm | Chapters 8.5+8.6 |
Oct 30 | State-machine replication | Failure models · The Consensus problem | Chapters 8.1+8.2; [Paxos] | HW3 due (on 11/3)
Nov 1 | | | |
Nov 6 | Non-crash Fault Tolerance | The Byzantine Generals problem · Impossibility results | |
Nov 8 | File systems | File operations; name space · Data structures on disk · Space management | |
Nov 10 | Last day to withdraw | | |
Nov 13 | Distributed file systems | NFS · Disconnected operation | Chapter 2.4.2; [Coda] |
Nov 15 | Google File System | Google cluster architecture · Reading and writing in GFS · Consistency and fault tolerance | [Cluster] [GFS] |
Nov 20 | MapReduce | MapReduce programming model · System architecture | |
Nov 22 | No class (Friday schedule) | | |
Nov 23–26 | Thanksgiving Break | | |
Nov 27 | Spark | Differences to MapReduce · Case study: PageRank | |
Nov 29 | DHTs and Dynamo | Distributed hash tables · The CAP dilemma · Amazon Dynamo | |
Dec 4+6 | Class is canceled – Linh is at RTSS | | |
Dec 11 | Real-Time Cloud and Exam review | Real-time scheduling · Real-time virtualization · Cloud resource allocation | |
Dec 12–13 | Reading days | | |
Dec 15 | Final Exam (3-5pm) | The final exam will cover all topics, from Aug 30 through Dec 11 | |
Dec 14–21 | Project demos and reports | | |
Web site contact: Linh Thi Xuan Phan