Statements and CV
Here are my research and teaching statements and CV.
Research
My research interests lie in the areas of databases and distributed systems,
especially as they relate to the Web, Web-scale information sharing, and
distributed networks of devices (e.g., sensors, actuators). I am a member of
the database, wireless/mobile systems, and
systems research groups at Penn.
My research projects relate to making it easier to exchange, locate, and analyze networked information.
- ORCHESTRA focuses on the problem of
collaborative data sharing: exchanging data and updates among loose confederations of
databases, when the different database owners have different schemas and different ideas of what is the "right"
content. We have developed techniques to map data and updates among different
sites, maintain data provenance, and use that provenance as the
basis for assessing trust and, ultimately, resolving conflicts. We
specifically target biological data sharing applications. Funded by NSF
CAREER #IIS-0477972.
- The Q query system addresses the
challenges of querying in a system like Orchestra, when one does not
know a priori where to find the most relevant data. Q takes as input a
keyword query, which it matches against schema elements to produce potential
data integration queries. The system returns answers from the most
promising queries and takes user feedback on the results. This
feedback is used to learn which sources are most relevant to the
information need that motivated the query. Funded by NSF CAREER #IIS-0477972
and SEIII #IIS-0513778.
- Aspen addresses the problem of programming and
integrating large-scale and complex sensor networks. The system focuses on a
setting in which large numbers of distributed sensors, with varying
capabilities, must be coordinated in order to manage and reason about
collections of physical entities and phenomena. My focus is on sensor
data integration, i.e., integration of data streams from multiple sensor
(and other) sources. Different aspects of the research are funded by NSF III
#IIS-0713267 and NOSS #CNS-0721541.
I also participate in several projects that are led by my
colleagues at Penn:
- SHARQ (led by Susan Davidson) is a joint effort with the Penn Center for Bioinformatics. It leverages the core Orchestra engine
and the Q system, plus a portal (SHARQ Guide) that offers both keyword search and
browse access to data sources, schemas, and queries. Funded by NSF
SEIII #IIS-0513778.
- pPOD (led by Val Tannen) focuses on the modeling and management of information related to phylogenetic trees. pPOD leverages the Orchestra engine.
- PIRIS (led by Doug Wiebe) focuses on integrating data
records relating to gunshot wound cases in Philadelphia, in order to help
support intervention. Funded by the State of Pennsylvania.
Acknowledgments: I have also received grants from
DARPA CSSG (#HR0011-06-1-0016), Penn
ISTAR, the State of Pennsylvania, and Lockheed Martin, and software donations from MarkLogic, Electric Software, and IBM Corp.
Selected recent courses and seminars:
Detailed information is here.
Publications
To appear:
- Recursive Computation of Regions and Connectivity in Networks, with
Mengmeng Liu, Nicholas E. Taylor, Wenchao Zhou, and Boon Thau Loo. Accepted
for publication, ICDE 2009.
- The Orchestra Collaborative Data Sharing
System, with Todd J. Green, Grigoris Karvounarakis, Nicholas
E. Taylor, Val Tannen, Partha Pratim Talukdar, Marie Jacob, Fernando
Pereira. To appear, ACM SIGMOD Record, September 2008.
- Invited entries on Adaptive stream
processing, Updates in P2P systems,
and XML publishing for the upcoming
Encyclopedia of Database Systems, edited by Ling Liu and M. Tamer Ozsu, soon to
be available from Springer.
Selected recent publications:
- A Substrate for In-Network Sensor Data Integration, with Svilen
Mihaylov, Marie Jacob, and Sudipto Guha. DMSN 2008.
- Learning to Create Data-Integrating Queries, with Partha Pratim
Talukdar, Marie Jacob, M. Salman Mehmood, Koby Crammer, Fernando Pereira,
and Sudipto Guha, VLDB 2008.
- Bidirectional Mappings for Data and Update Exchange, with Grigoris
Karvounarakis, WebDB 2008.
- Sideways Information Passing for Push-Style Query Processing, with
Nicholas Taylor. ICDE 2008, Cancun, Mexico.
- DBpedia: a Nucleus for a Web of Open Data, with Soeren Auer,
Christian Bizer, Georgi Kobilarov, Jens Lehmann, Richard Cyganiak.
ISWC/ASWC In-Use Track, 2007.
- Adaptive Query Processing, with Amol Deshpande, Vijayshankar Raman. Tutorial, VLDB 2007. Slides
- Update Exchange with Mappings and Provenance, with Todd J. Green,
Grigoris Karvounarakis, and Val Tannen. VLDB 2007.
- The Case for a Unified Extensible Data-centric Mobility
Infrastructure, with Yun Mao, Boon Thau Loo, Jonathan M. Smith.
MobiArch 2007.
- Adaptive Query Processing, with Amol
Deshpande and Vijayshankar Raman. Foundations and Trends in
Databases, Vol. 1 No. 1, 2007. Hardcopy available at a discount from Now
Publishers; see here.
- ORCHESTRA: Facilitating Collaborative Data
Sharing, with TJ Green, Nick Taylor, Grigoris Karvounarakis, Olivier Biton,
Val Tannen. Demonstration description, SIGMOD 2007.
- Reconciling while Tolerating Disagreement in Collaborative Data
Sharing, with Nick Taylor. SIGMOD 2006.
- ORCHESTRA: Rapid, Collaborative Sharing of Dynamic Data,
with Nitin Khandelwal, Aneesh Kapur, Murat Cakir. CIDR, January 2005,
Asilomar, CA.
- Adapting to Data Integration Source
Properties, with Alon Halevy and Dan Weld.
ACM SIGMOD Conference on Management of Data, June 2004, Paris, France.
A complete list is here.
PhD Student Collaborators
Frequent Faculty Collaborators
- Steve Minton, Fetch Technologies
- Craig Knoblock, USC ISI
- Val Tannen, Penn CIS
- Insup Lee, Penn CIS
- Sudipto Guha, Penn CIS
- Matt Blaze, Penn CIS
- Fernando Pereira, Penn CIS
- Lyle Ungar, Penn CIS
- Boon Thau Loo, Penn CIS
- Chris Stoeckert, Penn Center for Bioinformatics
- Pete White, Children's Hospital of Philadelphia
A complete list of advisees is here.