CIS 505: Software Systems (Fall 2020)
Overview

This course provides an introduction to fundamental concepts of distributed systems, and the design principles for building large-scale computational systems.

We will study some of the key building blocks – such as synchronization primitives, group communication protocols, and replication techniques – that form the foundation of modern distributed systems, such as cloud-computing platforms or the Internet. We will also look at some real-world examples of distributed systems, such as GFS, MapReduce, Spark, and Dynamo, and we will gain some hands-on experience with building and running distributed systems.

CIS 505 is one of the core courses in the MSE program, and its final exam qualifies as one of the WPE-I exams in the PhD program.

Logistics

Instructor:
Linh Thi Xuan Phan
Office hours: Tuesdays 10:30-11:30am EDT (via OHQ and Zoom)

Lectures:
Mondays/Wednesdays 1:30-3:00pm
Zoom link: Please see Piazza.
(If you are on the waitlist, please check for an email from the instructor on 09/01/2020.)

Teaching assistants:
Zhe Cai
Office hours: Tuesdays noon-1pm EDT and Thursdays noon-1pm EDT
David Carroll
Office hours: Thursdays 8-9pm EDT and Fridays 11am-noon EDT
Neeraj Gandhi
Office hours: Fridays 8-9pm EDT
Robert Gifford
Office hours: Wednesdays 11am-noon EDT and Fridays 3-4pm EDT
John Hay
Office hours: Tuesdays 7-8:30pm EDT
Anavi Kaushik
Office hours: Mondays 10-11am EDT
Mauricio Sifontes
Office hours: Mondays 4-5pm EDT and Wednesdays 4-5pm EDT
Harshmeet Singh
Office hours: Thursdays 9:30-11am EDT
Yuxuan Zhang
Office hours: Saturdays 1-2:30pm EDT

Office hour timetable for Fall 2020.
Office hours are held via OHQ and Zoom.

Course policies

Course textbook:
Distributed Systems: Principles and Paradigms, 3rd edition (by M. van Steen and A. Tanenbaum; ISBN 978-1543057386). You can get a digital version of this book for free; hard copies are available, e.g., from Amazon. Additional material will be drawn from selected research publications.

Prerequisites:
An undergraduate course in either networking or operating systems is required. You should also be comfortable programming in C/C++.

Workload:
The course will involve three substantial programming assignments, a group project, and two midterms.

Grading:
Your letter grade will be based on the programming assignments (35%), the group project (35%), the midterm exams (20%), and participation and quizzes (10%).

Resources

We will be using Piazza for all course-related discussions.

The homework assignments and the project are available for download, and you can submit your solutions online. If necessary, you can request an extension for your homework.

Special sessions

The goal of the special sessions is to provide you with tools and resources that might be useful for the assignments and project. See the special sessions page for more details.

PennCloud Award

Winners of the Fall 2020 PennCloud Award
Amit Lohe, Bharath Jaladi, Liana Patel, and Prasanna Poudyal

The Fall 2020 PennCloud Award went to Amit Lohe, Bharath Jaladi, Liana Patel, and Prasanna Poudyal for the overall best final project. The team presented a solidly designed, highly scalable, and robust PennCloud platform that offers strong consistency and fault tolerance via primary-based replication with logging, checkpointing, and recovery. The platform provides the complete set of required services with an elegant user interface, including a webmail service that supports both local and remote users, a storage service that supports uploading and downloading large files in any format, and an admin console for viewing and easily controlling the status and data of the frontend and backend nodes. Beyond the core functionality, the platform also offers useful extra-credit services, such as a discussion forum and a FIFO-ordered group chat system, both built on top of the KV store and the Paxos consensus protocol.

Example services of the winning project.

You can read more about the winners and their projects in the CIS505 Hall of Fame.

Schedule (Tentative)

Date Topic Details Reading Remarks
Sep 2 Introduction Course overview
Policies
Chapter 1 HW0
Sep 7 Labor Day - No class  
Sep 9 Processes and threads
Basic concepts
The UNIX model
Implementation in the kernel
Chapter 3.1 (Sections 1+2) HW0 due
Sep 14 HW1
Sep 16 System calls System calls
The file API
Kernel entry/exit
   
Sep 21 Concurrency control Synchronization primitives
Race conditions, critical sections
Deadlock and starvation
   
Sep 28 Synchronization
Semaphores
Classical synchronization problems
Monitors and condition variables
Hoare monitors; Mesa monitors HW1 due
Sep 30 Communication Sockets
Socket programming
Handling multiple connections
Chapters 4.1+4.3 + 3.1 (Section 3) HW2
Oct 5+7 Remote Procedure Calls Programming model
Stub code; marshalling; binding
Handling failures
Chapters 4.2+8.3  
Oct 12 Naming Kinds of names; name spaces
The Domain Name System
LDAP
Chapter 5 HW2 MS1 due (on 10/12)
Oct 12 Last day to drop  
Oct 14 First midterm exam
Oct 19 Clock synchronization Logical clocks
Distributed mutual exclusion
NTP
Chapters 6.1–6.3  
Oct 21 Distributed coordination Distributed mutual exclusion
Leader election
Bully algorithm; token ring
Chapter 6.4 HW2 MS2+3 due (on 10/23)
Oct 26 Group communication Reliable multicast
IP multicast
FIFO, causal and total ordering
Chapter 8.4  
Oct 28 Algorithms for FIFO, causal and total ordering HW3
Nov 2 Replication Primary/backup protocols
Quorum protocols
Sequential and causal consistency
Client-centric models
Chapter 7 Project
Nov 4 Bigtable and Project Bigtable case study
Project overview
[Bigtable]  
Nov 9 Fault tolerance 2PC and 3PC
Logging and recovery
Chandy-Lamport algorithm
Chapters 8.5+8.6; [Chandy-Lamport]  
Nov 9 Last day to withdraw
Nov 11 State-machine replication Failure models
The Consensus problem
Paxos
Chapters 8.1+8.2; [Paxos] HW3 due (on 11/13)
Nov 16 Non-crash fault tolerance The Byzantine Generals problem
Impossibility results
Solutions
[BFT]  
Nov 18+23 Distributed file systems NFS
Coda
Disconnected operation
Chapter 2.4.2; [Coda]  
Nov 25 Thanksgiving break - no class (Friday schedule)  
Nov 30 Google File System Google cluster architecture
Reading and writing in GFS
Consistency and fault tolerance
[Cluster] [GFS]  
Dec 2 MapReduce MapReduce programming model
System architecture
[MapReduce]  
Spark Differences from MapReduce
RDDs
Case study: PageRank
[Spark]  
Dec 7 DHTs and Dynamo Distributed hash tables
The CAP dilemma
Amazon Dynamo
[Dynamo]  
Dec 9 Second midterm exam
Dec 11–Dec 14 Reading days
Dec 15–Dec 22 Project demos (via Zoom) and reports

Web site contact: Linh Thi Xuan Phan