Managing the Collaborative Sharing of Evolving
Data
One of the most prevalent problems today is the need to map data
from one database to another, where the databases may have
different schemas and interfaces. Examples include everything from
bibliographic citation databases to course grade sheets to the ACM Digital
Library. Once data is mapped, it is frequently modified in multiple
places at once, and the challenge lies in "synchronizing" or
reconciling the modifications.
Project Overview
The ORCHESTRA project focuses on the challenges
of such data sharing scenarios in the sciences -- specifically addressing
the challenges in bioinformatics. In this domain, there are a great many
"standardized" databases with overlapping information, similar but not
identical data, differing levels of data quality/confidence, and a
variety of different target audiences. In general, each database owner
would like to store a "live" view of all relevant knowledge in its domain --
however, each site is being independently extended, corrected, and
analyzed. Moreover, individual biologists would like to be able to
download and maintain local "live snapshots" of data in order to run
their own experiments. Unfortunately, there is often no consensus on
what the best data is -- certain data items will always be disputed or
revised. Our focus in the ORCHESTRA collaborative data sharing
system (CDSS) is on how to support reconciliation across different
schemas among users who disagree. Each participant in the system
specifies whom it trusts, and the system uses these trust
specifications to resolve conflicts locally for that participant.
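As a rough illustration of trust-based local reconciliation, the sketch below lets one participant pick, for each data item, the value proposed by the site it trusts most. The names `Update` and `reconcile`, the numeric trust priorities, and the rule that equally trusted conflicting values are set aside are all assumptions made for this example; they are not taken from the ORCHESTRA implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Update:
    origin: str  # site that proposed the value (hypothetical field)
    key: str     # identifier of the data item being set
    value: str   # proposed value


def reconcile(updates, trust):
    """Resolve conflicts from one participant's point of view.

    `trust` maps an origin site to a priority (higher = more trusted).
    Updates from sites missing in `trust` are ignored. If the most
    trusted candidates for a key disagree, the key is deferred rather
    than resolved arbitrarily (an assumption of this sketch).
    """
    by_key = {}
    for u in updates:
        if u.origin in trust:
            by_key.setdefault(u.key, []).append(u)

    accepted, deferred = {}, {}
    for key, candidates in by_key.items():
        top = max(trust[u.origin] for u in candidates)
        values = {u.value for u in candidates if trust[u.origin] == top}
        if len(values) == 1:
            accepted[key] = values.pop()
        else:
            deferred[key] = sorted(values)
    return accepted, deferred
```

Because each participant supplies its own `trust` table, two participants can look at the same set of updates and reach different local states. This matches the idea that there is no global consensus on which data is best.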
Basic Process
The figure to the right illustrates the basic functionality of ORCHESTRA. The system coordinates among a set of participating
sites, each of which manages a database. Schema mappings describe
how the data at these sites relates. Trust conditions specify which
sites trust which data (and how much). The system allows all of the sites
to be continuously updated, and on demand, it will propagate these updates
across sites, according to the specified schema mappings and trust.
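The process described above can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not the ORCHESTRA design: `Site`, `Mapping`, and `propagate` are hypothetical names, a schema mapping is reduced to a function that rewrites one row, a trust condition is reduced to a predicate over the origin site and the translated row, and only insertions travelling a single hop are handled.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Site:
    name: str
    rows: list = field(default_factory=list)     # local database instance
    pending: list = field(default_factory=list)  # local inserts not yet shared

    def insert(self, row):
        """Apply an update locally; it is shared only on the next exchange."""
        self.rows.append(row)
        self.pending.append(row)


@dataclass
class Mapping:
    source: str
    target: str
    translate: Callable[[dict], dict]  # source-schema row -> target-schema row


def propagate(sites, mappings, trusts):
    """Run one on-demand exchange round.

    Each site's pending inserts are translated across every mapping that
    leaves the site, and the target keeps only rows its trust condition
    accepts. Targets without a trust condition accept everything.
    """
    published = {name: site.pending for name, site in sites.items()}
    for site in sites.values():
        site.pending = []
    for m in mappings:
        accept = trusts.get(m.target, lambda origin, row: True)
        for row in published[m.source]:
            translated = m.translate(row)
            if accept(m.source, translated) and translated not in sites[m.target].rows:
                sites[m.target].rows.append(translated)


# Usage: site A shares gene annotations with site B under a different schema.
a, b = Site("A"), Site("B")
sites = {"A": a, "B": b}
a_to_b = Mapping("A", "B", lambda r: {"gene_id": r["gene"], "function": r["role"]})
trusts = {"B": lambda origin, row: origin == "A" and row["function"] != "unknown"}

a.insert({"gene": "g1", "role": "kinase"})
a.insert({"gene": "g2", "role": "unknown"})
propagate(sites, [a_to_b], trusts)
```

After the call, site B holds only the translated `g1` row, because its trust condition rejects rows whose function is `unknown`. Deletions, multi-hop propagation, and graded trust levels are omitted to keep the sketch short.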
Research Topics
The ORCHESTRA project touches on a number of important database-related
topics, including update translation across mappings or
views; conditional information; peer-to-peer data sharing; data provenance;
and more.
This project takes our past work on the Piazza system one step further
in supporting decentralization. See the list of publications below
for further details.
System Implementation
We are planning on an open source release of the prototype ORCHESTRA system in early to mid-2008. Currently we are
happy to arrange for demonstrations and trial deployments here at Penn.
New: we gave a demonstration of the prototype
ORCHESTRA system at SIGMOD 2007 and DILS 2007.
Here are some screen shots:
This is the main ORCHESTRA screen, showing a series
of biological databases (ellipse nodes) and mappings among them (arcs with
"Mx" labels). The PCBI PlasmoDB database has been highlighted.
This is the ORCHESTRA provenance viewer, which
shows how a given data value (the tuple selected from the list on the right
side of the screen) was produced. In this case, the tuple is highlighted
graphically in green, and the arrows going into it represent sources from
which it was derived. This tuple was derived from Mapping M5, which combined
three tuples, which were in turn direct user insertions (the
"+"s in the diamond vertices). In general, derivations can be significantly
more complex.
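The derivation described for the provenance viewer can be modeled as a small graph. The sketch below records which mapping produced a tuple and from which source tuples, then walks the graph back to the user insertions a tuple depends on. The names `Derivation` and `base_insertions` and the tuple identifiers are hypothetical. The traversal is a plain lineage computation chosen for illustration, not the provenance model used by ORCHESTRA.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Derivation:
    mapping: str    # label of the mapping that produced the tuple, e.g. "M5"
    sources: tuple  # ids of the tuples that the mapping combined


def base_insertions(tuple_id, derivations, inserted):
    """Return the user-inserted tuples that `tuple_id` ultimately depends on.

    `derivations` maps a tuple id to the list of derivations that produce it;
    `inserted` is the set of tuple ids that were entered directly by users.
    """
    found, seen, stack = set(), set(), [tuple_id]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        if current in inserted:
            found.add(current)
        for d in derivations.get(current, []):
            stack.extend(d.sources)
    return found


# The screenshot's example: one tuple derived by M5 from three user insertions.
derivations = {"t": [Derivation("M5", ("a", "b", "c"))]}
inserted = {"a", "b", "c"}
lineage = base_insertions("t", derivations, inserted)
```

Here `lineage` contains the three inserted tuples. Taking the union over all derivations discards which alternative derivation produced a tuple, so a richer representation would be needed to tell such alternatives apart.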
Related Publications
- Todd J. Green, Zachary G. Ives, Grigoris Karvounarakis, Val
Tannen. Update Exchange with Mappings and Provenance, to appear, VLDB 2007.
- Todd J. Green, Grigoris Karvounarakis, Nicholas E. Taylor, Olivier Biton,
Zachary G. Ives, Val Tannen. ORCHESTRA: Facilitating Collaborative Data
Sharing. Demonstration description, SIGMOD 2007.
- Todd J. Green, Grigoris Karvounarakis, Val Tannen. Provenance Semirings.
PODS 2007.
- Nicholas Taylor, Zachary Ives. Reconciling Changes while Tolerating
Disagreement in Collaborative Data Sharing. SIGMOD 2006,
Chicago, IL.
- Zachary G. Ives, Nitin Khandelwal, Aneesh Kapur, Murat Cakir.
ORCHESTRA: Rapid, Collaborative Sharing of
Dynamic Data. Conference on Innovative
Database Systems Research (CIDR), Asilomar, CA, 2005.
Team Members
- TJ Green
- Grigoris Karvounarakis
- Nick Taylor
- Soeren Auer
- Prof. Zachary Ives
- Prof. Val Tannen
Team Alumni
- Olivier Biton
- Murat Cakir
- Charuta Joshi
- Aneesh Kapur
- Ivan Terziev
- Mike Wittie
- Nitin Khandelwal
Sponsorship
This research is funded by NSF CAREER grant award #IIS-0477972,
awarded to Zachary G. Ives at the University of Pennsylvania, and NSF SEIII
grant #IIS-0513778.
Any opinions, findings, and conclusions or recommendations expressed in this
material are those of the author(s) and do not necessarily reflect the views of
the National Science Foundation.
Last modified: Wed Jul 18 12:03:16 EST 2007