CIS 455 / 555: Internet and Web Systems (Spring 2021)

Instructor

Zack Ives
Office hours: Mondays 3:00pm-4:00pm, Wednesdays 9:00am-10:00am or by appointment
Location: 305 Levine Hall in Penn CIS' Gather.town (link posted in Piazza)

Time and location

This class will be asynchronous, with prerecorded video modules. Occasional synchronous meetings may be held on Mondays 1:30pm – 3:00pm Eastern time.
To keep everyone engaged, you are expected to complete a brief review quiz after viewing each video; these quizzes count toward your participation grade (as do contributions to Piazza). We'll try to post videos ahead of each Monday and Wednesday. Synchronous meetings will be recorded for those who can't attend.

Teaching assistants

TA office hours will be held in the TA Office Hours Gather.town space (link posted in Piazza). To hold your place in line, please sign up in OHQ.

  • Xinyi Chen, cxinyic@seas
  • Ameya Gadkari, ameyaga@seas
  • Phillip Hilliard, pdh@seas
  • Adam Khakhar, akhakhar@wharton
  • Matt Lebermann, mleb@seas
  • Karthik Macherla, kmacher@seas
  • Vikas Shankarathota, vikasmsh@seas
  • Wesley Yee, wesyee@seas
  • Karen Zheng, karenzxy@seas
  • Ke Zhong, kezhong@seas
Michael Abelar will oversee the autograding infrastructure.

Course description

This course focuses on the issues encountered in building Internet and web systems: scalability, interoperability (of data and code), atomicity and consistency models, replication, and location of resources, services, and data. Note that "Internet" in this context refers to the ecosystem of applications that you can use to access and share information across the world. For details on the systems that underpin these applications, see CIS 553.

On a similar note, this course is not about building database-backed or PHP/JSP/Node-based web sites (for that, see CIS 450/550 or NETS 212), nor about using "big data analytics" platforms such as MapReduce, Apache Storm, and Spark (for that, see CIS 545). While this course touches on these topics, here we will learn how such systems are actually built!

We will examine how XML standards enable information exchange; how web services support cross-platform interoperability (and what their limitations are); how "cloud computing" services work; how to do replication and Akamai-like content distribution; and how application servers provide transaction support in distributed environments. We will study techniques for locating machines, resources, and data (including directory systems, information retrieval indexing and ranking, web search, and publish/subscribe systems); we will discuss collaborative filtering and mining the Web for patterns; we will investigate how different architectures support scalability and distributed coordination (and the issues they face). We will also examine the ideas that have been proposed for tomorrow's Web, and see some of the challenges, research directions, and potential pitfalls.

An important goal of the course is not simply to discuss issues and solutions, but to provide hands-on experience with a substantial implementation project. This final project will be an implementation of a Google-style search engine, including distributed, scalable crawling; indexing with ranking; stream processing; and even PageRank on your own MapReduce-style platform!

As a side effect of this course's material, you will learn about some aspects of large-scale software development: assimilating large APIs, thinking about modularity, reading other people's code, managing versions, debugging, and so on.

CIS555 is now a core course for the MSE degree as well as an option for the WPE I requirement for PhDs. The Daily Pennsylvanian published a nice article about CIS455/555.

Format

Each week will include two lecture modules, occasional synchronous meetings, and assigned readings from handouts. There will be regular homework assignments and a substantial implementation project with experimental validation and a report. There will also be two take-home midterms.

Links

  • Piazza for course-related discussions; please sign up here.
  • Canvas for access to videos, to submit some homeworks, and to take review quizzes.
  • Gradescope for some homework submission, grading, and exams. You will automatically be added once you are in Canvas.
  • Gather.town (link on Piazza) and OHQ for getting help during TA office hours.
  • Assignments list for assignment handouts.

Prerequisites

This course expects familiarity with threads and concurrency, as well as strong Java programming skills. Those highly proficient in another programming language, such as C++ or C#, should be able to translate their skills easily. The course will require a considerable amount of programming, as well as the ability to work with your classmates in teams.

Texts and readings

Distributed Systems: Principles and Paradigms, 3rd edition, by Tanenbaum and van Steen, Prentice Hall (ISBN 978-1530281756).
You can buy a physical copy (e.g., for $35 on Amazon) or download a free digital copy here.

Additional materials will be provided as handouts or in the form of light technical papers. A reading list is available.

Assignments

The homework assignments will be available here. You can submit directly in Canvas.

Grading

Homework 38%, first midterm 12%, second midterm 12%, project 33%, participation 5%.

Final project

Wondering what you will be able to do at the end of this class? Here is an example from Spring 2017.
(Photos: the PennCH3 team, Hung, Hitali, Chirag, and Harsh; example results from PennCH3; searching with Alexa.)
The 2017 Google Award for the best final project went to Hung Nguyen, Chirag Shah, Hitali Sheth, and Harsh Verma for their "PennCH3" search engine. PennCH3 not only searches the web but also displays information from a variety of sources, including soccer scores, stock quotes, weather forecasts, and shopping results. Users can also submit searches using their Alexa-enabled devices. Under the hood, the system is highly scalable and uses replication for fault tolerance; the team (boldly) demonstrated this by killing some of the nodes during a live demonstration. Google generously donated four Google Home devices as prizes, and each member of the PennCH3 team received one.

An honorable mention went to the project "Q+A" (Mani Mahesh, Archith Shivanagere, Rishisingh Solanki, and Sanidhya Tiwari), whose solution featured location-specific results and used reinforcement learning to improve its results based on user feedback.

You can read more about previous Google Award winners and their projects in the CIS455/555 Hall of Fame.

Schedule

The schedule will change over time. All videos and quizzes will be linked via the schedule.