
Zachary G. Ives

Professor and Markowitz Faculty Fellow
Computer & Information Science Department
University of Pennsylvania

Member, Warren Center for Network and Data Science
Member, Center for Neuroengineering and Therapeutics
Distinguished Research Fellow, Annenberg Center for Public Policy
Undergraduate Chair, Singh Program on Networked & Social Systems Engineering

 

Contact Information

576 Levine Hall North
Computer and Information Science Department
University of Pennsylvania
3330 Walnut Street
Philadelphia, PA 19104-6389
zives@cis.upenn.edu
(215) 746-2789    Fax: (215) 898-0587

Teaching NETS 212, office hours Wed 11:00-12:00

Biographical Sketch

Zachary Ives is a Professor and Markowitz Faculty Fellow at the University of Pennsylvania. He received his PhD from the University of Washington. His research interests include data integration and sharing, managing "big data," sensor networks, and data provenance and authoritativeness. He is a recipient of the NSF CAREER award, and an alumnus of the DARPA Computer Science Study Panel and Information Science and Technology advisory panel.  He has also been awarded the Christian R. and Mary F. Lindback Foundation Award for Distinguished Teaching. He serves as the undergraduate curriculum chair for Penn's Singh Program in Networked and Social Systems Engineering, and he is a Penn Engineering Fellow. He is a co-author of the textbook Principles of Data Integration, and received an ICDE 2013 ten-year Most Influential Paper award. He has been an Associate Editor for Proceedings of the VLDB Endowment (2014) and a Program Co-Chair for SIGMOD (2015).

Research

How do we tie together the world's data to answer fundamental scientific or policy questions? How do we facilitate and foster large-scale collaborative projects? My research interests lie in the areas of databases and distributed systems, especially as they relate to the Web, Web-scale information sharing, and distributed networks of devices (e.g., sensors, actuators). I am a member of the database and systems research groups, and the Warren Center for Network and Data Science at Penn. My research projects relate to making it easier to exchange, locate, and analyze networked information.

The Q query system addresses the challenges of querying in a system like ORCHESTRA, when one does not know a priori where to find the most relevant data. Q takes as input a keyword query, which it matches against schema elements to produce potential data integration queries. The system returns answers from the most promising queries and takes user feedback on the results. This feedback is used to learn which sources are most relevant to the information need that motivated the query. Funded by NSF CAREER #IIS-0477972, SEIII #IIS-0513778, and grants from Google.
The IEEG Web Portal, in collaboration with Prof. Brian Litt of Bioengineering and Neurology, and Prof. Greg Worrell at Mayo Clinic, seeks to enable community-scale data integration and cloud-hosted science for epileptic seizure prediction (and beyond). Beyond its scientific applications, IEEG serves as a testbed for technologies from the Q System and other data integration research. As of Oct 2014 we have over 1200 datasets and 450 users. We have also hosted competitions for epileptic seizure detection and epileptic seizure prediction. Funded by NIH as well as grants from Amazon.
These two projects, along with significant infrastructure for "pay as you go" data integration, are being combined into a platform we call Habitat. More details will be forthcoming as this project moves forward.
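
To give a rough feel for the keyword-matching and feedback loop described for Q above, here is a minimal Python sketch. It is not the Q implementation: the Source class, the substring-based match score, the multiplicative weight update, and the example source names are all assumptions made purely for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Source:
    """Hypothetical data source: a name, its schema element names, and a learned weight."""
    name: str
    schema_elements: List[str]
    weight: float = 1.0  # relevance weight; starts neutral (assumption)


def match_score(keywords: List[str], source: Source) -> float:
    """Fraction of keywords found as substrings of some schema element, scaled by the source weight."""
    hits = sum(
        any(kw.lower() in elem.lower() for elem in source.schema_elements)
        for kw in keywords
    )
    return source.weight * hits / len(keywords)


def rank_sources(keywords: List[str], sources: List[Source]) -> List[Source]:
    """Return the sources that match at least one keyword, most promising first."""
    scored = [(match_score(keywords, s), s) for s in sources]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [s for score, s in scored if score > 0]


def apply_feedback(source: Source, relevant: bool, rate: float = 0.5) -> None:
    """Illustrative update rule: boost a source on positive feedback, shrink it on negative."""
    source.weight *= (1 + rate) if relevant else (1 - rate)


# Hypothetical sources and query.
sources = [
    Source("genes_db", ["gene_name", "protein_id"]),
    Source("pubs_db", ["gene_mention", "title"]),
]
query = ["gene", "protein"]

ranked = rank_sources(query, sources)    # genes_db matches both keywords, pubs_db one

apply_feedback(sources[0], relevant=False)  # user marks genes_db answers as unhelpful
apply_feedback(sources[1], relevant=True)   # user marks pubs_db answers as helpful

reranked = rank_sources(query, sources)  # pubs_db now ranks ahead of genes_db
```

In this toy version, feedback only rescales a per-source weight; a real system would produce and rank full integration queries over many sources rather than ranking sources directly.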

Several prior projects have resulted in building blocks towards our ongoing work in supporting large-scale data integration and analysis. These projects are no longer directly active, but their core ideas (and code) are part of our more recent projects:

ORCHESTRA focuses on the problem of collaborative data sharing: exchanging data and updates among loose confederations of databases, when the different database owners have different schemas and different ideas of what is the "right" content. We have developed techniques to map data and updates among different sites, maintain data provenance, and use that provenance as the basis for assessing trust and, ultimately, resolving conflicts. We specifically target biological data sharing applications. See here for an overview paper. Funded by NSF CAREER #IIS-0477972.
Aspen addresses the problem of programming and integrating large-scale and complex sensor networks. The system focuses on a setting in which large numbers of distributed sensors, with varying capabilities, must be coordinated in order to manage and reason about collections of physical entities and phenomena. My focus is on sensor data integration, i.e., integration of data streams from multiple sensor (and other) sources. A target application is data center monitoring for energy, temperature, load, and other factors. Different aspects of the research are funded by NSF III #IIS-0713267, NOSS #CNS-0721541, and a University Research Initiative grant from Lockheed Martin.

Acknowledgments: I have also received grants from DARPA CSSG (#HRO011-06-1-0016 and HRO1107-1-0029), Penn ISTAR, the State of Pennsylvania, Amazon, Google, and Lockheed Martin, and software donations from MarkLogic, Electric Software, and IBM Corp.

Teaching

I am the Undergraduate Curriculum Chair for Penn's Singh Program on Networked and Social Systems Engineering, NETS, which was formerly known as MKSE. This Internet-centered degree program looks at how people and systems interact over networks. It combines computer science (algorithms, distributed systems) with sociology, incentives (game theory), and dynamic systems. The overall program is directed by Ali Jadbabaie. New NETS courses I co-developed include NETS (MKSE) 212 "Scalable and Cloud Computing" and NETS (MKSE) 150 "Market and Social Systems on the Internet".

Current course:

Selected recent courses and seminars:

Detailed information is here.

Textbooks and Monographs

Principles of Data Integration, with AnHai Doan and Alon Halevy. This textbook gives a comprehensive academic treatment of the wide range of topics related to research in data integration: mappings and data transformations, query rewriting, adaptive query processing, XML and streaming data, probabilistic mappings, keyword search, data provenance, and much more. We also describe research challenges, real systems, and implementation techniques. Lecture slides are available from Elsevier. Available from Amazon in hardcopy or Kindle form; from Google Play store in e-book form; from Barnes & Noble in hardcopy or Nook form.
Adaptive Query Processing, with Amol Deshpande and Vijayshankar Raman. Foundations and Trends in Databases, Vol. 1 No. 1, 2007. Hardcopy available at a discount from Now Publishers; see here.

Selected Publications

A complete list is here.

Current Postdoc, PhD, and MS Research Advisees

Alumni — Students and Postdocs

Frequent Collaborators

Tips on Interviewing

Finishing your PhD and going on the job market? I have previously compiled a list of references on interviewing, which you can find here.