 CIS Research Seminar Series, 2008 

 

Tuesday, September 16, 2008


Zack Ives

Computer Science Department

University of Pennsylvania


"Orchestra: Sharing Inconsistent Data in a Consistent Way"


Abstract:

One of the most pressing needs in business, government, and science is to bring together structured data from a variety of systems, formats, and terminologies. For instance, the emerging field of systems biology seeks to unify biological data to get a big-picture view of the processes within living organisms. Many organizations have set up databases designed to be "clearing houses" for specific types of information: each is separately maintained, cleaned, and curated, and has its own schema and terminology. Updates are constantly made as hypothesized relationships are confirmed or refuted, or new discoveries are made. The different databases contain complementary information that must be integrated to get a complete picture - and each database may have data of different quality or relevance to a domain. However, there is often no consensus on what the definitive answers are - each site may have different beliefs.

The Orchestra project focuses on how to support exchange of data (and updates) among collaborators with evolving databases, in a way that accommodates disagreement, different schemas, and different levels of authority and quality. Orchestra considers collaborators' databases to be logical *peers* into which data can be imported and then locally modified. It allows for a network of *schema mappings* that interrelate peers, annotated with *trust policies* specifying the conditions under which a peer is willing to import data. As a data item is mapped from site to site in the system, its *provenance* is recorded; a peer's trust policies use this provenance (and the values of the data) to assign a score to each incoming data item (based on perceived quality or relevance), and the peer then uses this score to reconcile conflicts and compute a consistent data instance, whose contents may be unique to the peer. The scores assigned to the individual sources can even be *learned* based on user feedback about query answers.
The end result is a system that allows each database to selectively diverge from the others as appropriate, but to remain "in sync" in all other cases.

Joint work with Todd J. Green, Grigoris Karvounarakis, Nicholas Taylor, Val Tannen, Partha Pratim Talukdar, Marie Jacob, Muhammad Salman Mehmood, Koby Crammer, Fernando Pereira, and Sudipto Guha.
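
To make the scoring-and-reconciliation idea in the abstract concrete, here is a minimal Python sketch. It is not Orchestra's actual code, API, or algorithm: the names `Item`, `TrustRule`, `score`, and `reconcile`, the "highest matching rule wins" scoring, the "score 0 means untrusted" convention, and the sample peers (`labA`, `labB`, `labC`, `hub`) are all assumptions made for illustration. It only shows how two peers with different trust policies could compute different, internally consistent instances from the same incoming data.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Item:
    """A hypothetical incoming data item (not an Orchestra type)."""
    key: str                     # identifies the fact being asserted
    value: str                   # the asserted value; may conflict across peers
    provenance: Tuple[str, ...]  # peers the item passed through, origin first


# Hypothetical trust rule: (condition on an item, score if the condition holds).
TrustRule = Tuple[Callable[[Item], bool], int]


def score(item: Item, policy: List[TrustRule]) -> int:
    """Highest score among matching rules; 0 (assumed 'untrusted') if none match."""
    return max((s for cond, s in policy if cond(item)), default=0)


def reconcile(incoming: List[Item], policy: List[TrustRule]) -> Dict[str, str]:
    """Per key, keep the value of the best-scoring trusted item.

    Ties are broken in favor of the item seen first (an arbitrary choice).
    """
    best: Dict[str, Tuple[int, str]] = {}
    for item in incoming:
        s = score(item, policy)
        if s <= 0:
            continue  # the peer is unwilling to import this item
        if item.key not in best or s > best[item.key][0]:
            best[item.key] = (s, item.value)
    return {k: v for k, (_, v) in best.items()}


incoming = [
    Item("gene42.function", "kinase", ("labA",)),
    Item("gene42.function", "phosphatase", ("labB", "hub")),
    Item("gene7.locus", "chr3", ("labC",)),
]

# Two peers with different beliefs about which origins to trust.
policy_p = [(lambda it: it.provenance[0] == "labA", 2),
            (lambda it: it.provenance[0] == "labB", 1)]
policy_q = [(lambda it: it.provenance[0] == "labB", 3),
            (lambda it: it.provenance[0] == "labC", 1)]

instance_p = reconcile(incoming, policy_p)
instance_q = reconcile(incoming, policy_q)
```

In this toy run the two peers disagree on `gene42.function` because each trusts a different origin more, and only the second peer imports the `labC` item at all. Learning the scores from user feedback, as the abstract mentions, is outside the scope of this sketch.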


 

Tuesday, September 16, 2008
3:00 - 4:15
Wu & Chen
101 Levine Hall

