Past Research Projects

Tukwila: Data Integration, XML, and Adaptive Query Processing

The Tukwila data integration system was my Ph.D. thesis project. Our focus has been on developing a query processor for data integration that performs well even when little is known in advance about the sources. In data integration, we pose queries across a variety of heterogeneous, autonomous data sources scattered throughout the intranet and Internet (but mapped into a single, unified "mediated schema"). These sources will generally be able to export their data in XML over HTTP, but we typically have very limited knowledge of network performance and few statistics about the data within the sources. As a result, it is difficult to "optimize" the plan for combining the data from the sources. My thesis proposes a combination of three techniques to address these challenges: (1) operators with flexible scheduling policies, to mask latencies; (2) overlapping of query operations using pipelining, even for XML data, via the x-scan operator; and (3) convergent query processing, which allows the system to choose a query plan, execute it for some amount of time, then revise statistics and cost estimates and generate an improved plan -- all in mid-stream.
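
To make the idea of pipelined, latency-masking operators concrete, here is a minimal sketch of a symmetric hash join in Python. This is not Tukwila's code: the text above does not name a particular join algorithm, so a standard symmetric hash join is swapped in as an illustration, and the function name, the key-extractor parameters, and the round-robin interleaving (a stand-in for "whichever source has data ready") are all assumptions.

```python
from itertools import zip_longest

_DONE = object()  # sentinel marking an exhausted input


def symmetric_hash_join(left, right, left_key, right_key):
    """Yield (left_tuple, right_tuple) pairs as soon as both halves have arrived.

    Hypothetical illustration only: each input is hashed as it streams in and
    immediately probed against the other side's table, so results flow out
    without waiting for either source to finish.
    """
    left_table = {}
    right_table = {}
    # Round-robin over both inputs; a real engine would instead read from
    # whichever network source currently has data available.
    for l, r in zip_longest(left, right, fillvalue=_DONE):
        if l is not _DONE:
            k = left_key(l)
            left_table.setdefault(k, []).append(l)
            for match in right_table.get(k, ()):
                yield (l, match)
        if r is not _DONE:
            k = right_key(r)
            right_table.setdefault(k, []).append(r)
            for match in left_table.get(k, ()):
                yield (match, r)


if __name__ == "__main__":
    authors = [(1, "a"), (2, "b")]
    papers = [(2, "x"), (1, "y"), (2, "z")]
    for pair in symmetric_hash_join(authors, papers,
                                    lambda t: t[0], lambda t: t[0]):
        print(pair)
```

Because each arriving tuple is both inserted and probed, the first results can appear before either input is complete, which is the property that lets a slow source be overlapped with useful work on a faster one.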

I expect that we will use Tukwila as a foundation for numerous other research projects: currently it serves as the engine behind the Sagres, Piazza, and Revere systems at the University of Washington. There are many research directions that can still be explored in the space of adaptive query processing, especially once storage is considered. I am also very interested in the possibility of fleshing it out in a few directions and releasing it as an open-source codebase.

Zachary G. Ives, Marc Friedman, Daniela Florescu, Alon Levy, Daniel S. Weld. An Adaptive Query Execution System for Data Integration, SIGMOD 1999, Philadelphia, PA.
Zachary G. Ives, Alon Y. Levy, Daniel S. Weld. Efficient Evaluation of Regular Path Expressions on Streaming XML Data. Technical Report UW-CSE-2000-05-02, University of Washington.
Zachary G. Ives, Alon Y. Levy, Daniel S. Weld, Daniela Florescu, Marc Friedman. Adaptive Query Processing for Internet Applications. IEEE Data Engineering Bulletin, Vol. 23 No. 2, June 2000.
Zachary G. Ives, Alon Y. Halevy, Daniel S. Weld. Integrating Network-Bound XML Data. IEEE Data Engineering Bulletin, June 2001.
Zachary G. Ives, Alon Y. Halevy, Daniel S. Weld. An XML Query Engine for Network-Bound Data. Submitted for publication, 2002.
Zachary G. Ives, Alon Y. Halevy, Daniel S. Weld. Convergent Query Processing. Submitted for publication, 2002.
Zachary G. Ives. Ph.D. Dissertation: Efficient Query Processing for Data Integration, August 2002.

Sagres: An Initial Look at Managing Data Sharing Among Devices

The Sagres project built a series of "active rules" (event-based triggers) on top of the Tukwila core, and we showed how the basic ideas of data integration could be used to manage interactions between devices in a ubiquitous computing environment.

Zachary Ives, Alon Levy, Jayant Madhavan, Rachel Pottinger, Stefan Saroiu, Igor Tatarinov, Shiori Betzler, Qiong Chen, Ewa Jaslikowska, Jing Su, W.T. Theodora Yeung. Demonstration: Self-Organizing Data Sharing Communities with SAGRES, SIGMOD 2000, Dallas, TX.

Piazza: Semantically Rich Data Sharing among Peers

In building Sagres, we came to understand that many of the issues in Sagres were not unique to ubiquitous computing -- and in fact, many of them (e.g., replication, propagation of updates, and data migration) also appeared in peer-to-peer systems. The Piazza peer data management system examines these and other problems, and it also generalizes the basic ideas of data integration. Instead of having a single mediated schema, peer data management allows us to have a different mediated schema at each peer. Schemas at different peers are related via a set of mappings, so all of the data sources within a peer data management system can be related by evaluating the transitive closure of the mappings between peers.
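
The transitive-closure idea can be sketched as plain graph reachability. The following Python fragment is only an illustration under simplifying assumptions: it treats each mapping as a directed edge between two named peers and ignores the actual mapping language and query reformulation; the function name and the peer labels are hypothetical.

```python
from collections import deque


def reachable_peers(mappings, start):
    """Return the set of peers whose schemas can be reached from `start`
    by chaining mappings (breadth-first search over the mapping graph)."""
    graph = {}
    for src, dst in mappings:
        graph.setdefault(src, set()).add(dst)

    seen = {start}
    queue = deque([start])
    while queue:
        peer = queue.popleft()
        for nxt in graph.get(peer, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


if __name__ == "__main__":
    # Hypothetical peers: A maps to B, B maps to C, D maps to A.
    mappings = [("A", "B"), ("B", "C"), ("D", "A")]
    print(sorted(reachable_peers(mappings, "A")))
    print(sorted(reachable_peers(mappings, "D")))
```

In this toy setting a query posed at peer A can draw on data at B and C even though no mapping connects A to C directly.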

Piazza also serves as an interesting "bridge" between traditional data integration and the so-called "semantic web" advocated by Tim Berners-Lee and the World-Wide Web Consortium.

Steven Gribble, Alon Halevy, Zachary Ives, Maya Rodrig, Dan Suciu. What Can Databases Do for Peer-to-Peer? WebDB Workshop on Databases and the Web, 2001.
Alon Y. Halevy, Zachary G. Ives, Dan Suciu, Igor Tatarinov. Schema Mediation in Peer Data Management Systems. To appear, International Conference on Data Engineering, 2003.
Oren Etzioni, Alon Halevy, AnHai Doan, Zachary Ives, Jayant Madhavan, Luke McDowell, Igor Tatarinov. Crossing the Structure Chasm. To appear, CIDR 2003.

Xperanto: Processing XML Queries over DB2

I spent a summer at IBM Almaden Research Center, and was one of the initiators of the Xperanto middleware layer, which exports relational data into XML. IBM is now commercializing the technology as part of their effort to XML-ify DB2.
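
As a rough illustration of what "exporting relational data into XML" means, the sketch below turns a list of rows into a simple XML document using Python's standard library. It does not reflect Xperanto's interface or its XML encoding; the function name, the `row` element, and the sample table are assumptions chosen for the example.

```python
import xml.etree.ElementTree as ET


def rows_to_xml(table_name, columns, rows):
    """Serialize relational rows as XML: one <row> element per tuple and
    one child element per column (a hypothetical default encoding)."""
    root = ET.Element(table_name)
    for row in rows:
        row_el = ET.SubElement(root, "row")
        for col, value in zip(columns, row):
            ET.SubElement(row_el, col).text = str(value)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(rows_to_xml("customer", ["id", "name"], [(1, "Ann"), (2, "Bob")]))
```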

Michael Carey, Daniela Florescu, Zachary Ives, Ying Lu, Jayavel Shanmugasundaram, Eugene Shekita, Subbu Subramanian. XPERANTO: Publishing Object-Relational Data as XML. Third International Workshop on the Web and Databases, Dallas, TX.

XQuery: Querying and Updating XML Data

I have provided a number of suggestions to the W3C's XQuery Working Group, which is developing a standard query language for XML (the "SQL for XML," if you will). Since the focus of the working group has been limited to querying data, I recently co-authored a paper looking at the next step -- the semantics for updating XML data. Our update language may serve as a useful foundation for developing distributed data management systems for collaboration.

Zachary G. Ives, Ying Lu. XML Query Languages in Practice: An Evaluation. Web Age Information Management 2000, Shanghai, China.
Igor Tatarinov, Zachary G. Ives, Alon Y. Halevy, Daniel S. Weld. Updating XML. SIGMOD 2001, Santa Barbara, CA.

Zack Ives

Last modified: Fri May 21 14:16:01 EDT 2004