Reading list for CIS 455/555
This list contains a selection of interesting papers related to the course. Only the papers
in bold are mandatory reading; however, you may want to read some of the other papers on
a particular subject if you are curious. I will be updating this list frequently; feel free to
suggest additional papers!
Servers and Server Architectures
- Web Protocols and Practice, Chapter 4: Web Servers, B. Krishnamurthy and J. Rexford.
A good introduction to web server design and internals.
- Building Secure High-Performance Web Services with OKWS,
M. Krohn, USENIX ATC 2004
Describes the design of a high-performance web server.
- SEDA: An Architecture for Well-Conditioned, Scalable Internet Services,
M. Welsh, D. Culler, and E. Brewer, SOSP 2001
This paper introduced the staged event-driven architecture (SEDA) we briefly discussed in class.
- Flash: An Efficient and Portable Web Server,
V. Pai, P. Druschel, and W. Zwaenepoel, USENIX ATC 1999
Describes a high-performance, event-driven server architecture.
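The staged design from the SEDA paper above can be pictured as a chain of stages, each with its own incoming event queue drained by a small thread pool, so that stages are decoupled by queues rather than by direct calls. The following is only a minimal sketch under simplifying assumptions: the `Stage` class, its parameters, and the two toy "parse" and "send" handlers are hypothetical, and it leaves out SEDA's dynamic resource controllers (which tune thread-pool sizes and batching at run time).

```python
import queue
import threading
from typing import Callable, Optional


class Stage:
    """Hypothetical SEDA-style stage: an event queue drained by a fixed-size thread pool."""

    def __init__(self, name: str, handler: Callable[[object], object],
                 next_stage: Optional["Stage"] = None, threads: int = 2) -> None:
        self.name = name
        self.handler = handler          # handles one event, returns the event for the next stage
        self.next_stage = next_stage    # None means this is the final stage
        self.events: "queue.Queue[object]" = queue.Queue()
        for _ in range(threads):
            threading.Thread(target=self._run, daemon=True).start()

    def enqueue(self, event: object) -> None:
        self.events.put(event)

    def _run(self) -> None:
        while True:
            event = self.events.get()
            try:
                result = self.handler(event)
            except Exception as exc:  # a real server might route failures to an error stage
                print(f"[{self.name}] dropped event: {exc!r}")
            else:
                if self.next_stage is not None:
                    self.next_stage.enqueue(result)
            finally:
                self.events.task_done()


# Usage: a toy two-stage pipeline (parse a request line, then "send" a response).
results: "queue.Queue[str]" = queue.Queue()
send = Stage("send", lambda resp: results.put(resp))
parse = Stage("parse", lambda line: f"200 OK for {line.split()[1]}", next_stage=send)

for path in ["/a", "/b", "/c"]:
    parse.enqueue(f"GET {path} HTTP/1.0")

parse.events.join()  # every event has left stage 1 (and was forwarded to stage 2)
send.events.join()   # stage 2 has finished everything it received
responses = sorted(results.get() for _ in range(results.qsize()))
print(responses)
```

Because each stage only sees its own queue, an overloaded stage shows up as a growing queue that can be observed and controlled separately, which is the property the paper builds its load conditioning on.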
Naming and Locating Resources
- High-Performance Web Crawling,
M. Najork and A. Heydon, SRC Research Report 173
- Mercator: A Scalable, Extensible Web Crawler,
A. Heydon and M. Najork, WWW 1999
More crawling basics.
- The Anatomy of a Large-Scale Hypertextual Web Search Engine,
S. Brin and L. Page, WWW 1998
The paper that describes the original Google.
- Detecting Near-Duplicates for Web Crawling,
G. Manku, A. Jain, and A. Das Sarma, WWW 2007
More details on duplicate elimination.
- Search Engine Optimization (SEO)
Part of Google's Webmaster Tools documentation; links to current SEO practices.
- Globally Distributed Content Delivery,
J. Dilley, B. Maggs, J. Parikh, H. Prokop, R. Sitaraman, and B. Weihl, IEEE Internet Computing, Sep/Oct 2002
A description of Akamai's CDN, written by its designers.
- Measuring and Evaluating Large-Scale CDNs,
C. Huang, A. Wang, J. Li, and K. Ross, IMC 2008
A measurement study of two CDNs, Akamai and Limelight, with more technical details and some interesting numbers.
- The Ubiquitous B-Tree,
D. Comer, ACM Computing Surveys Vol. 11 No. 2, June 1979
Survey paper on B-Trees and their major variations, including B*-trees and B+-trees.
- BATON: A Balanced Tree Structure for Peer-to-Peer Networks,
H. Jagadish, B. Ooi, Q. Vu, VLDB 2005
Describes a P2P overlay that is structured as a balanced tree rather than as a ring, hypercube, etc.
- Introduction to Information Retrieval,
C. Manning, P. Raghavan, and H. Schütze, Cambridge University Press
Additional material on ranking, stemming, lemmatization, etc.; highly
recommended for the 'ranking expert' on your final project team.
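The near-duplicate detection paper by Manku et al. listed above builds on Charikar's simhash: each document is reduced to a 64-bit fingerprint, and two documents count as near-duplicates when their fingerprints differ in at most k bit positions (the paper uses k = 3). The sketch below covers only the fingerprint and the comparison, not the paper's permuted-table scheme for searching billions of fingerprints quickly. It assumes whitespace tokenization, term frequency as the feature weight, and the first 8 bytes of MD5 as the 64-bit feature hash; these choices and the function names are illustrative, not the paper's.

```python
import hashlib
from collections import Counter


def feature_hash(token: str) -> int:
    """64-bit hash of one feature (first 8 bytes of MD5; an illustrative choice)."""
    return int.from_bytes(hashlib.md5(token.encode("utf-8")).digest()[:8], "big")


def simhash(text: str, bits: int = 64) -> int:
    """Charikar-style simhash: a weighted vote per bit position over all feature hashes."""
    weights = Counter(text.lower().split())  # assumption: term frequency as weight
    votes = [0] * bits
    for token, weight in weights.items():
        h = feature_hash(token)
        for i in range(bits):
            votes[i] += weight if (h >> i) & 1 else -weight
    # Fingerprint bit i is 1 exactly when the weighted vote for position i is positive.
    return sum(1 << i for i in range(bits) if votes[i] > 0)


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


def near_duplicates(a: str, b: str, k: int = 3) -> bool:
    """Treat two documents as near-duplicates if their fingerprints are within k bits."""
    return hamming(simhash(a), simhash(b)) <= k


doc = "web crawlers fetch many pages that differ only in ads or timestamps"
print(near_duplicates(doc, doc))  # identical inputs have distance 0, so this prints True
```

Because small edits to a long document change only a few of the weighted votes, most fingerprint bits stay the same, which is why a small Hamming-distance threshold works as a near-duplicate test.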