CIS 455 / 555: Internet and Web Systems (Fall 2015)

Instructor Zachary Ives
Location: 576 Levine Hall
Office hour: Wednesdays 1:30-2:30pm
Time and location Location: 100 Towne (Heilmeier Hall)
101 Levine (Wu and Chen Auditorium)
Monday/Wednesday 10:30am - noon
Teaching assistants As you know, there is a space challenge in SEAS. Unless announced in advance, all office hours are in the Levine 6th Floor "bump space" near the elevator. If the TA isn't there, please look on the board for an alternate space, or email us. We appreciate your patience.

Pedro Samora, Fridays, 4:30-6:30pm
Parul Bhalla, Tuesdays, 2:00-3:00pm
Sagarika Rayudu, Thursdays, 6:00-7:00pm
Course description This course focuses on the issues encountered in building Internet and web systems: scalability, interoperability (of data and code), atomicity and consistency models, replication, and location of resources, services, and data. Note that it is not about building database-backed or PHP/JSP/Servlet-based web sites (for this, see CIS 450/550 or NETS 212). Here, we will learn how a Servlet server itself is built!

We will examine how XML standards enable information exchange; how web services support cross-platform interoperability (and what their limitations are); how "cloud computing" services work; how to do replication and Akamai-like content distribution; and how application servers provide transaction support in distributed environments. We will study techniques for locating machines, resources, and data (including directory systems, information retrieval indexing and ranking, web search, and publish/subscribe systems); we will discuss collaborative filtering and mining the Web for patterns; we will investigate how different architectures support scalability (and the issues they face). We will also examine the ideas that have been proposed for tomorrow's Web, including the "Semantic Web", and see some of the challenges, research directions, and potential pitfalls.

An important goal of the course is not simply to discuss issues and solutions, but to provide hands-on experience with a substantial implementation project. This semester's project will be a peer-to-peer implementation of a Google-style search engine, including distributed, scalable crawling; indexing with ranking; and even PageRank. We will also incorporate the use of topic-specific recognizers and mash-ups.

As a side effect of the material of this course, you will learn about some aspects of large-scale software development: assimilating large APIs, thinking about modularity, reading other people's code, managing versions, debugging, and so on.

CIS 555 is now a core course for the MSE degree; for details, please see the MSE requirements. The Daily Pennsylvanian recently published a nice article about CIS 455/555.

Format The format will be two 1.5-hour lectures per week, plus assigned readings from handouts. There will be regular homework assignments and a substantial implementation project with experimental validation and a report. There will also be a midterm and a final exam.
Prerequisites This course expects familiarity with threads and concurrency, as well as strong Java programming skills. Those highly proficient in another programming language, such as C++ or C#, should be able to translate their skills easily. The course will require a considerable amount of programming, as well as the ability to work with your classmates in teams.
Texts and readings Distributed Systems: Principles and Paradigms, 2nd ed, by Tanenbaum and van Steen, Prentice Hall
Additional materials will be provided as handouts or in the form of light technical papers.
Grading Homework 32%, midterm 15%, final exam 15%, project 33%, participation 5%.
Other resources We will be using Piazza for course-related discussions; please sign up here.
Assignments The homework assignments will be available here.
Final project Wondering what you will be able to do at the end of this class? Here is an example from Spring 2013:
70! result with preview
Brandon, Edward, Steven, and Mitchell
70! result with video
Brandon Krieger, Steven Krouse, Mitchell Stern, and Edward Wadsworth built "70!", a cloud-based search engine. 70! consists of 1) a scalable distributed crawler that runs on Amazon EC2 instances and uses FreePastry for coordination; 2) an indexer and a PageRank engine that are based on Elastic MapReduce; and 3) a web frontend. 70! can show previews of the results it returns, and it supports voice-controlled search; as a special feature, users can navigate the results by 'swiping' with their hands in front of a webcam.
Previous versions (some taught by Prof. Haeberlen) Spring 2015 |  Spring 2014 |  Spring 2013 |  Spring 2012 |  Spring 2011