Instructor
Time and location
Teaching assistants
Course description
This course focuses on the issues encountered in building Internet and web systems: scalability, interoperability (of data and code), atomicity and consistency models, replication, and location of resources, services, and data. Note that "Internet" in this context refers to the ecosystem of applications that you can use to access and share information across the world. (For details on the system that underpins these applications, see CIS 553.)
On a similar note, this course is not about building database-backed or PHP/JSP/Node-based web sites (for this, see CIS 450/550 or NETS 212) or about the use of "big data analytics" platforms like MapReduce, Apache Storm, Spark, etc. (for that, see CIS 545). While this course touches on the above topics, here we will learn how these systems are actually built!
We will examine how XML standards enable information exchange; how web services support cross-platform interoperability (and what their limitations are); how "cloud computing" services work; how to do replication and Akamai-like content distribution; and how application servers provide transaction support in distributed environments. We will study techniques for locating machines, resources, and data (including directory systems, information retrieval indexing and ranking, web search, and publish/subscribe systems); we will discuss collaborative filtering and mining the Web for patterns; we will investigate how different architectures support scalability and distributed coordination (and the issues they face). We will also examine the ideas that have been proposed for tomorrow's Web, and see some of the challenges, research directions, and potential pitfalls.
An important goal of the course is not simply to discuss issues and solutions, but to provide hands-on experience with a substantial implementation project. This final project will be an implementation of a Google-style search engine, including distributed, scalable crawling; indexing with ranking; stream processing; and even PageRank on your own MapReduce-style platform!
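To give a flavor of the PageRank component mentioned above, here is a minimal single-machine sketch in Python that structures each iteration as a map step followed by a reduce step. It is an illustration only, not the project's required design or API: the function names (map_phase, reduce_phase, pagerank), the damping factor of 0.85, the iteration count, and the toy graph are all assumptions, and the sketch assumes every page has at least one outgoing link and that every link target is itself a key in the graph.

```python
from collections import defaultdict


def map_phase(graph, ranks):
    """Map step: each page splits its current rank evenly among its out-links."""
    for page, links in graph.items():
        for target in links:
            yield target, ranks[page] / len(links)


def reduce_phase(pairs, pages, damping=0.85):
    """Reduce step: sum contributions per page, then apply the damping factor."""
    sums = defaultdict(float)
    for target, contribution in pairs:
        sums[target] += contribution
    n = len(pages)
    return {p: (1 - damping) / n + damping * sums[p] for p in pages}


def pagerank(graph, iterations=50, damping=0.85):
    """Run a fixed number of map/reduce rounds from a uniform starting rank.

    Assumes no dangling pages (every page has at least one out-link).
    """
    pages = list(graph)
    ranks = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iterations):
        ranks = reduce_phase(map_phase(graph, ranks), pages, damping)
    return ranks


# Hypothetical three-page web: a links to b and c, b links to c, c links to a.
toy_graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = pagerank(toy_graph)
```

In a distributed setting, the pairs emitted by the map step would be shuffled by key across workers before the reduce step; here that grouping is simulated with an in-memory dictionary.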
As a side effect of the material of this course, you will learn about some aspects of large-scale software development: assimilating large APIs, thinking about modularity, reading other people's code, managing versions, debugging, and so on.
CIS 555 is now a core course for the MSE degree as well as an option for the WPE I requirement for PhDs. The Daily Pennsylvanian published a nice article about CIS 455/555.
Format
Links
Prerequisites
Texts and readings
Assignments
Grading
Final project
Schedule