Term Project Description

A one-page description of your project should be emailed to susan@cis.upenn.edu by Oct. 12, 2004.

Students enrolled in CIS650 are expected to complete a term project, due on the last day of class. The project can be either an implementation project or a paper on some topic related to the course.


Two excellent sources for finding research papers, searchable by author or title, are DBLP and CiteSeer.


Implementation project. Pick an idea, paper, or technique that interests you and implement or extend it (some examples are below, some more challenging than others). The project should not only be demonstrated and evaluated but also documented: you should provide a short (5-page) paper describing what you have done and the tradeoffs involved.


Critical analysis paper. A critical analysis of some problem area based on recent papers in the database research literature. The problem area may be one studied in class but explored in greater detail, or a new topic area that interests you. The analysis should attempt to answer the following questions: Why is the problem important? What solutions have been proposed? What assumptions have been made, are they realistic, and why were they made? How can results in this area be applied in practice? What are the directions for future research in this area?


Novel solution paper. A paper presenting your own solution to a problem. The paper should clearly state the problem you are solving, the assumptions you are making, your solution, and a brief survey of related literature. Any claims should be substantiated.

All work should be done in groups of one or two people. If you have a team of two, you must be able to partition the work so that there is clear responsibility.

Non-implementation projects will involve a paper roughly 20 pages long (double-spaced). In addition to the substance of your project, it should contain an abstract (a terse 100-200 word summary of the work presented), an introduction (statement of the problem and structure of the paper), a conclusion (summary of major contributions and work remaining to be done), and references to related research. The bibliography should be in ACM or IEEE journal format.


Sample topics:

Active XML: Download the ActiveXML implementation. Create an application using it, or show how the concept can be used, for example: (1) to capture issues of security and privacy, or (2) to optimize a (non-active) page refresh by smart buffering, or (3) to implement replication and/or a distributed directory structure.


A second, more research-oriented problem is the issue of designing Active XML documents based on some analysis of how a static document changes, analogous to designing a relational database given some knowledge of functional dependencies. Come talk to me if you are interested; I have some initial ideas along these lines.


A third problem, both research-oriented and pragmatic, is that of determining substitute or “equivalent” web services when some service call in an Active XML document becomes “stale” or is simply too slow to respond. What types of fault tolerance or recovery strategies should be adopted?


SQL with ranked keyword search: Define a set of query language extensions/operations to XQuery or SQL that support keywords. Build a middleware layer over a relational database (Oracle, DB2, PostgreSQL) that takes queries in this language, builds inverted indices or other structures, and uses these structures to answer queries. The implementation should rank results using TF/IDF or some other common ranked-results metric.
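As a starting point, here is a minimal sketch of the core ranking logic such a middleware layer might contain, kept in memory for clarity. It assumes a simple tokenizer (lowercase, split on non-alphanumeric characters), raw term counts for tf, and idf = log(N / df); all function names are hypothetical. In the actual project the postings would live in relational tables rather than Python dictionaries.

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text):
    """Lowercase and split on non-alphanumeric characters (a simplifying assumption)."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]

def build_inverted_index(docs):
    """Map each term to a postings dict {doc_id: term frequency}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, tf in Counter(tokenize(text)).items():
            index[term][doc_id] = tf
    return index

def rank(query, index, num_docs):
    """Score documents by summing tf * idf over query terms, with idf = log(N / df)."""
    scores = defaultdict(float)
    for term in set(tokenize(query)):
        postings = index.get(term, {})
        if not postings:
            continue  # term does not occur in any document
        idf = math.log(num_docs / len(postings))
        for doc_id, tf in postings.items():
            scores[doc_id] += tf * idf
    # Highest score first; ties broken by doc_id for deterministic output.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))

docs = {
    1: "XML keyword search over relational data",
    2: "ranked retrieval with inverted indices",
    3: "keyword search keyword ranking",
}
index = build_inverted_index(docs)
result = rank("keyword ranking", index, len(docs))
print(result)  # doc 3 ranks first, then doc 1; doc 2 does not match
```

Note that this naive tokenizer does no stemming, so "ranked" in document 2 does not match "ranking"; choosing tokenization, stemming, and normalization rules is part of the design space for the project.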


Two-person project: As above, but also design and build a web crawler for the keyword engine that indexes not only documents but also meta-information (source web site, document date, document type, etc.).


Some suggested reading (there are more recent papers, you should search the web for them):

Florescu et al. Integrating Keyword Search into XML Query Processing. In Proc. WWW9, 2000.

Christos Faloutsos and Douglas W. Oard. A Survey of Information Retrieval and Filtering Methods. Technical Report, University of Maryland, 1995.

Text extensions to SQL, e.g., those from SQL Server


Query language for linguistic applications: Linguists commonly annotate written text with some form of linguistic analysis, for example grammatical structure. The resulting “parse tree” may then be queried and/or updated by annotators. Because of the tree-like structure, it is natural to consider tree languages (XPath?) as a query language. However, since there is a large amount of this annotated text, efficiency is imperative. Balancing the two concerns of expressiveness and efficiency, propose a query language for linguistic applications (I have additional information here; contact me if interested).
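To make the idea concrete, the sketch below encodes a small parse tree as XML, one element per constituent label (a hypothetical encoding using Penn Treebank-style labels such as S, NP, VP, NN), and queries it with the limited XPath subset supported by Python's standard `xml.etree.ElementTree` module.

```python
import xml.etree.ElementTree as ET

# Hypothetical encoding: one element per constituent, words as leaf text.
PARSE = """
<S>
  <NP><DT>the</DT><NN>linguist</NN></NP>
  <VP><VBD>annotated</VBD>
    <NP><DT>the</DT><JJ>long</JJ><NN>sentence</NN></NP>
  </VP>
</S>
"""

root = ET.fromstring(PARSE)

# All noun phrases anywhere in the tree that have a direct NN child.
noun_phrases = root.findall(".//NP[NN]")
heads = [phrase.find("NN").text for phrase in noun_phrases]
print(heads)  # ['linguist', 'sentence']

# Noun phrases that are direct children of a VP (candidate objects).
objects = root.findall("./VP/NP")
object_words = [" ".join(word.text for word in phrase) for phrase in objects]
print(object_words)  # ['the long sentence']
```

Queries over parent/child and ancestor/descendant structure are easy to express here, but ElementTree's XPath subset omits axes such as following-sibling, so relations like precedence between constituents are not directly expressible. Where a linguistic query language should sit between such a restricted subset and full XPath is exactly the expressiveness/efficiency tradeoff the project asks about.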


Updating XML views: The last paper of the semester has been partially implemented, but some interesting components are missing, in particular with respect to insertion updates. There is also the interesting question of how to incorporate user suggestions to enable updates even when they are ambiguous. Send email to me if you are interested.

Braganholo et al. From XML View Updates to Relational View Updates: Old Solutions to a New Problem. In Proc. International Conference on Very Large Data Bases (VLDB), 2004.