Challenges in Integrating Biological Data Sources

S.B. Davidson, C. Overton and P. Buneman

J. Computational Biology 2 (1995), pp 557-572.

Panel position paper for MIMBD95 (Cambridge, England), July 1995.

In the 1985 National Academy of Sciences report, ``Models for Biomedical Research: A New Perspective,'' Morowitz et al. argued that biological research had reached a point where ``new generalizations and higher order biological laws are being approached but may be obscured by the simple mass of data.'' The authors went on to propose the creation of a ``Matrix of Biological Knowledge'' (now called the Biomatrix) in which data, information and knowledge are structured and stored to provide an integrated view of biology.

Since then, various strategies for integrating data for the biological research enterprise have appeared throughout the biological informatics community. In this report, we examine the technical challenges to integration, critique the available tools and resources, and compare the costs and advantages of various methodologies. We begin by analyzing the basic steps in strict and complete integration. We then look at the solution space of integration strategies as defined by two axes, the ``tightness'' of federation and the ``degree'' of instantiation, discuss where various solutions fall on this plane, and examine their costs, advantages, and disadvantages. Finally, we examine technical challenges that are not adequately addressed by these approaches but are essential elements of a long-term solution: managing optimizations to provide timely response, and dealing with updates at both the instance and schema level.


