CIS 455 / 555: Internet and Web Systems (Spring 2018)
Location: 560 Levine Hall
Office hour: Mondays 1:00-2:00pm
|Time and location||Location: Berger Auditorium. Mondays + Wednesdays 10:30am – noon|
Hengchu Zhang, firstname.lastname@example.org
Office hour: Mondays noon-1:00pm (GRW 5th floor bump space)
Yingjie Luan, email@example.com
Sahana Vijaya Prasad, firstname.lastname@example.org
Rishabh Gupta, email@example.com
Kejin Fan, firstname.lastname@example.org
Urja Nadibail, email@example.com
Jane Lee, firstname.lastname@example.org
Linyan Dai, email@example.com
Victoria Xiao, firstname.lastname@example.org
Animesh Shah, email@example.com
This course focuses on the issues encountered in building Internet and web systems:
scalability, interoperability (of data and code), atomicity and consistency models,
replication, and location of resources, services, and data. Note that it is not
about building database-backed or PHP/JSP/Node-based web sites (for this, see
CIS 450/550 or
Here, we will learn how a web server itself is built!
Similarly, the course covers stream processors and "big data analytics" platforms like MapReduce, Apache Storm, and Spark from the perspective of how they work. For details on using such systems, see CIS 545. Here you'll actually build such systems!
We will examine how XML standards enable information exchange; how web services support cross-platform interoperability (and what their limitations are); how "cloud computing" services work; how to do replication and Akamai-like content distribution; and how application servers provide transaction support in distributed environments.
We will study techniques for locating machines, resources, and data (including directory systems, information retrieval indexing and ranking, web search, and publish/subscribe systems); we will discuss collaborative filtering and mining the Web for patterns; and we will investigate how different architectures support scalability and distributed coordination (and the issues they face). We will also examine the ideas that have been proposed for tomorrow's Web, and see some of the challenges, research directions, and potential pitfalls.
An important goal of the course is not simply to discuss issues and solutions, but to provide hands-on experience with a substantial implementation project. This semester's project will be a peer-to-peer implementation of a Google-style search engine, including distributed, scalable crawling; indexing with ranking; stream processing; and even PageRank on your own MapReduce-style implementation!
As a side effect of the material of this course, you will learn about some aspects of large-scale software development: assimilating large APIs, thinking about modularity, reading other people's code, managing versions, debugging, and so on.
|Format||The format will be two 1.5-hour lectures per week, plus assigned readings from handouts. There will be regular homework assignments and a substantial implementation project with experimental validation and a report. There will also be two in-class midterms.|
|Prerequisites||This course expects familiarity with threads and concurrency, as well as strong Java programming skills. Those highly proficient in another programming language, such as C++ or C#, should be able to translate their skills easily. The course will require a considerable amount of programming, as well as the ability to work with your classmates in teams.|
|Texts and readings||
Distributed Systems: Principles and Paradigms, 3rd edition, by Tanenbaum and van Steen, Prentice Hall (ISBN 978-1530281756)
You can buy a physical copy (e.g., for $35 on Amazon) or download a free digital copy here.
Additional materials will be provided as handouts or in the form of light technical papers.
|Grading||Homework 32%, first midterm 15%, second midterm 15%, project 33%, participation 5%.|
|Other resources||We will be using Piazza for course-related discussions; please sign up here. A reading list is also available.|
|Assignments||The homework assignments will be available here. You can submit your solutions online (requires PennKey login).|
Wondering what you will be able to do at the end of this class? Here is an example from Spring 2017:
An honorable mention went to project "Q+A" (Mani Mahesh, Archith Shivanagere, Rishisingh Solanki, and Sanidhya Tiwari), whose solution featured location-specific results and used reinforcement learning to improve the results based on user feedback.
You can read more about previous Google Award winners and their projects in the CIS455/555 Hall of Fame.