"Efficient Query Processing for Data Integration"

Zachary Ives
Computer Science Department
University of Washington

Today, virtually any organization or collaboration of size (enterprise, academic department, coalition, research lab, etc.) has a need to inspect and query its data to get a better understanding of its internal processes or its domain of interest. However, this data is typically stored across a variety of heterogeneous data management applications with different terminologies. Data integration abstracts these sources into a single virtual database that the user queries. The technology enabling data integration has matured in recent years, except in one key area: providing good performance when processing queries.

Data integration query processing poses two key challenges. First, query processors rely on statistics about the data when choosing a "plan" for executing a query, yet very little is known about the data sources. Second, data integration applications need to process XML, since it has become the standard format for data interchange. Current techniques for processing XML do not suffice for data integration because they do not produce initial answers quickly enough, particularly for data being streamed across a network.

To address the first problem, I have developed convergent query processing, which establishes a feedback loop between query execution and optimization: the system monitors actual query plan performance and uses this new knowledge to re-estimate the cost of alternative query plans. At any point, execution can be stopped and a more promising plan can be started in mid-stream. Convergent query processing not only addresses the problem of limited knowledge in data integration, but it can also benefit traditional databases. To address the second data integration challenge -- returning initial answers quickly for XML queries -- I have developed an XML query processing architecture that incrementally provides results as data is read across the network. Combined with convergent query processing, this XML architecture provides good performance for both initial and final answers.
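
The feedback loop described above can be illustrated with a toy sketch. This is not the speaker's system: the `Plan` class, the `run_convergent` function, the plan names, the batch size, and all cost figures are hypothetical, chosen only to show execution being monitored, estimates being revised, and a more promising plan taking over in mid-stream.

```python
from dataclasses import dataclass


@dataclass
class Plan:
    name: str
    estimated_cost: float  # optimizer's current guess (cost units per tuple)
    true_cost: float       # actual cost, only learned by running the plan


def run_convergent(plans, total_tuples, batch_size=100):
    """Process tuples in batches, re-choosing the cheapest-looking plan each time."""
    history = []
    remaining = total_tuples
    while remaining > 0:
        # Optimization step: pick the plan with the lowest current estimate.
        current = min(plans, key=lambda p: p.estimated_cost)
        n = min(batch_size, remaining)
        # Monitoring step: running the batch reveals the plan's real cost.
        observed = current.true_cost
        # Feedback step: replace the guess with the measurement.
        current.estimated_cost = observed
        history.append((current.name, n))
        remaining -= n
    return history


# Hypothetical plans: the first looks cheap up front but is actually expensive.
plans = [
    Plan("hash-join-first", estimated_cost=1.0, true_cost=5.0),
    Plan("nested-loop-first", estimated_cost=2.0, true_cost=1.5),
]
history = run_convergent(plans, total_tuples=300)
print(history)
```

In this sketch the initially preferred plan is abandoned after one batch, once its measured cost exceeds the estimate for the alternative. A real system would also have to combine the partial results produced by the different plans, which this toy example leaves out.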


Tuesday, January 8, 2002
Moore School Bldg. - Room #216
3:00 - 4:30 p.m.