CIS 505: Software Systems (Fall 2020)
This course provides an introduction to fundamental concepts of distributed systems,
and the design principles for building large-scale computational systems.
We will study some of the key building blocks – such as synchronization primitives, group communication
protocols, and replication techniques – that form the foundation of modern distributed systems, such as
cloud-computing platforms or the Internet. We will also look at some real-world examples of distributed
systems, such as GFS, MapReduce, Spark, and Dynamo, and we will gain some hands-on experience
with building and running distributed systems.
CIS 505 is one of the core courses
in the MSE program, and its final exam qualifies as one of the WPE-I exams in the PhD program.
Linh Thi Xuan Phan
Office hours: Tuesdays 10:30-11:30am EDT (via OHQ and Zoom)
Zoom link: Please see Piazza.
(If you are on the waitlist, please check for an email from the instructor on 09/01/2020.)
Office hours are held via OHQ and Zoom.
Office hour timetable for Fall 2020.
Distributed Systems: Principles and Paradigms, 3rd edition (by M. van Steen and A. Tanenbaum; ISBN 978-1543057386).
You can get a digital version of this book for free; hardcopies are available, e.g., from Amazon.
Additional material will be drawn from selected research publications.
An undergraduate course in either networking or operating systems is required. You should also be comfortable with programming in C/C++.
The course will involve three substantial programming assignments, a group project,
and two midterms.
Your letter grade will be based on the programming assignments (35%), the group project (35%), the
midterm exams (20%), and participation and quizzes (10%).
We will be using Piazza for all course-related discussions.
Homework assignments and the project are available for download; you can
submit your solutions online. If necessary,
you can request an extension for your homework.
The goal of the special sessions is to provide you with tools and resources that might be useful for the assignments and project. See the special sessions page for more details.
Amit Lohe, Bharath Jaladi, Liana Patel, and Prasanna Poudyal
The Fall 2020 PennCloud Award went to Amit Lohe, Bharath Jaladi, Liana Patel, and Prasanna Poudyal for the overall best final project. The team presented a solidly designed, highly scalable, and robust PennCloud platform that offers strong consistency and fault tolerance via primary-based replication with logging, checkpointing, and recovery. The platform provides the complete set of required services with an elegant user interface, including a webmail service that supports both local and remote users, a storage service that supports uploading and downloading large files in any format, and an admin console for viewing and easily controlling the status and data of the frontend and backend nodes. Besides the core functionality, the platform also features useful extra-credit services, such as a discussion forum and a FIFO-ordered group chat system, which are built on top of the KV store and the Paxos consensus protocol.
You can read more about the winners and their projects in the CIS505 Hall of Fame.
Example services of the winning project.
||Labor Day - No class
||Processes and threads
The UNIX model
Implementation in the kernel
|Chapter 3.1 (Sections 1+2)
The file API
Race conditions, critical sections
Deadlock and starvation
Classical synchronization problems
Monitors and condition variables
|Hoare monitors; Mesa monitors
Handling multiple connections
|Chapters 4.1+4.3 + 3.1 (Section 3)
||Remote Procedure Calls
Stub code; marshalling; binding
||Kinds of names; name spaces
The Domain Name System
||HW2MS1 due (on 10/12)
||Last day to drop
||First midterm exam
Distributed mutual exclusion
||Distributed mutual exclusion
Bully algorithm; token ring
||HW2MS2+3 due (on 10/23)
FIFO, causal and total ordering
||Algorithms for FIFO, causal and total ordering
Sequential and causal consistency
||Bigtable and Project
||Bigtable case study
||2PC and 3PC
Logging and recovery
|Chapters 8.5+8.6; [Chandy-Lamport]
||Last day to withdraw
The Consensus problem
|Chapters 8.1+8.2; [Paxos]
||Non-crash Fault Tolerance
||The Byzantine Generals problem
||Distributed file systems
|Chapter 2.4.2; [Coda]
||Thanksgiving break - no class (Friday schedule)
||Google File System
||Google cluster architecture
Reading and writing in GFS
Consistency and fault tolerance
||MapReduce programming model
||Differences from MapReduce
Case study: PageRank
||DHTs and Dynamo
||Distributed hash tables
The CAP dilemma
||Second midterm exam
|Dec 11–Dec 14
|Dec 15–Dec 22
||Project demos (via Zoom) and reports