Research
My research interests lie in the areas of databases and distributed systems,
especially as they relate to the Web, Web-scale information sharing, and
distributed networks of devices (e.g., sensors, actuators). I am a member of
the database, wireless/mobile systems, and
systems research groups at Penn.
My research projects relate to making it easier to exchange, locate, and analyze networked information.
- ORCHESTRA focuses on the problem of
collaborative data sharing: exchanging data and updates among loose confederations of
databases, when the different database owners have different schemas and different ideas of what is the "right"
content. We have developed techniques to map data and updates among different
sites, maintain data provenance, and use that provenance as the
basis for assessing trust and, ultimately, resolving conflicts. We
specifically target biological data sharing applications.
See here for an overview paper. Funded by NSF
CAREER #IIS-0477972.
- The Q query system addresses the
challenges of querying in a system like Orchestra, when one does not
know a priori where to find the most relevant data. Q takes as input a
keyword query, which it matches against schema elements to produce potential
data integration queries. The system returns answers from the most
promising queries and takes user feedback on the results. This
feedback is used to learn which sources are most relevant to the
information need that motivated the query. Funded by NSF CAREER #IIS-0477972
and SEIII #IIS-0513778.
- Aspen addresses the problem of programming and
integrating large-scale and complex sensor networks. The system focuses on a
setting in which large numbers of distributed sensors, with varying
capabilities, must be coordinated in order to manage and reason about
collections of physical entities and phenomena. My focus is on sensor
data integration, i.e., integration of data streams from multiple sensor
(and other) sources. A target application is data center monitoring for energy,
temperature, load, and other factors. Different aspects of the research are funded by NSF III
#IIS-0713267, NOSS
#CNS-0721541, and a University Research Initiative grant from Lockheed Martin.
- CopyCat, in collaboration with USC Information Sciences
Institute (led by Craig Knoblock) and Fetch
Technologies (led by Steve Minton), considers how to make it easy for users to
author, use, and debug mappings for one-time integration tasks. The system
presents a spreadsheet-like workspace, into which the user may paste columns
and rows of data from source applications. The system attempts
to learn what data is being extracted and what queries are being
asked, and it makes auto-complete suggestions that generalize the
user's work. The user provides feedback (either explicitly or by
pasting more data) and the system refines its suggestions accordingly.
Provenance information is used to explain and debug results, and it is also a
foundation for the learning process. See
here for an overview paper.
CopyCat was funded in part by a DARPA IPTO seedling in the area of "best
effort data integration," and is also funded in part by DARPA DSO through the CSSG program.
I also participate in several projects that are led by my
colleagues at Penn:
- pPOD
(led by Val Tannen) focuses on the modeling and management of information related to phylogenetic trees. pPOD leverages the Orchestra engine.
- PIRIS (led by Doug Wiebe) focuses on integrating data
records relating to gunshot wound cases in Philadelphia, in order to help
support intervention. Funded by the State of Pennsylvania.
Acknowledgments: I have also received grants from
DARPA CSSG (#HR0011-06-1-0016), Penn
ISTAR, the State of Pennsylvania, and Lockheed Martin, and software donations from MarkLogic, Electric Software, and IBM Corp.
Selected recent courses and seminars:
Detailed information is here.
Publications
To appear / accepted for publication:
- Dynamic Join Optimization for Multi-Hop Wireless Sensor Networks, with Svilen Mihaylov, Marie Jacob, Sudipto Guha. Accepted for publication, Proc. VLDB Endowment, Vol 3(1) and VLDB 2010.
- Reliable Storage and Querying for Collaborative Data Sharing Systems, with Nicholas Taylor. To appear, full paper, ICDE 2010.
- Maintaining Recursive Views of Regions and Connectivity in Networks, with Mengmeng Liu, Nicholas Taylor, Wenchao Zhou, and Boon Thau Loo. Accepted for publication, TKDE.
Selected recent publications:
- SmartCIS: Integrating Digital and
Physical Environments, with Mengmeng Liu, Svilen Mihaylov,
Zhuowei Bao, Marie Jacob, Boon Thau Loo, Sudipto Guha. Demonstration
description, SIGMOD 2009.
- Reconciling Differences, with TJ Green and Val Tannen. ICDT 2009.
- Recursive Computation of Regions and Connectivity in Networks, with
Mengmeng Liu, Nicholas E. Taylor, Wenchao Zhou, and Boon Thau Loo. ICDE 2009.
- Interactive Data Integration through Smart Copy and Paste, with
Craig Knoblock, Steve Minton, Marie Jacob, Partha Talukdar, Rattapoom
Tuchinda, Jose Luis Ambite, Maria Muslea, Cenk Gazen. CIDR 2009.
- The Orchestra Collaborative Data Sharing
System, with Todd J. Green, Grigoris Karvounarakis, Nicholas
E. Taylor, Val Tannen, Partha Pratim Talukdar, Marie Jacob, Fernando
Pereira. ACM SIGMOD Record, September 2008.
- A Substrate for In-Network Sensor Data Integration, with Svilen
Mihaylov, Marie Jacob, and Sudipto Guha. DMSN 2008. Extended version
available as Technical Report MS-CIS-08-26.
- Learning to Create Data-Integrating Queries, with Partha Pratim
Talukdar, Marie Jacob, M. Salman Mehmood, Koby Crammer, Fernando Pereira,
and Sudipto Guha. VLDB 2008.
- Bidirectional Mappings for Data and Update Exchange, with Grigoris
Karvounarakis. WebDB 2008.
- Sideways Information Passing for Push-Style Query Processing, with
Nicholas Taylor. ICDE 2008, Cancun, Mexico.
- DBpedia: a Nucleus for a Web of Open Data, with Soeren Auer,
Christian Bizer, Georgi Kobilarov, Jens Lehmann, Richard Cyganiak.
ISWC/ASWC In-Use Track, 2007.
- Update Exchange with Mappings and Provenance, with Todd J. Green,
Grigoris Karvounarakis, and Val Tannen. VLDB 2007.
- Adaptive Query Processing, with Amol
Deshpande and Vijayshankar Raman. Foundations and Trends in
Databases, Vol. 1 No. 1, 2007. Hardcopy available at a discount from Now
Publishers; see here.
- ORCHESTRA: Facilitating Collaborative Data
Sharing, with TJ Green, Nick Taylor, Grigoris Karvounarakis, Olivier Biton,
Val Tannen. Demonstration description, SIGMOD 2007.
- Reconciling while Tolerating Disagreement in Collaborative Data
Sharing, with Nick Taylor. SIGMOD 2006.
A complete list is here.
PhD Advisees
Collaborators
- Val Tannen, Penn CIS
- Insup Lee, Penn CIS
- Sudipto Guha, Penn CIS (currently on sabbatical at Google, Inc.)
- Matt Blaze, Penn CIS
- Fernando Pereira, Penn CIS (currently on leave at Google, Inc.)
- Lyle Ungar, Penn CIS
- Boon Thau Loo, Penn CIS
- Chris Stoeckert, Penn Center for Bioinformatics
- Pete White, Children's Hospital of Philadelphia
- Steve Minton, Fetch Technologies
- Craig Knoblock, USC ISI
Graduated Students
Tips on Interviewing
Finishing your PhD and going on the job market? I have previously
compiled a list of references on interviewing, which you can find
here.