Semistructured Data

Peter Buneman

Tutorial, PODS '97

In semistructured data, the information that is normally associated with a schema is contained within the data itself, which is therefore sometimes called "self-describing". In some forms of semistructured data there is no separate schema; in others a schema exists but places only loose constraints on the data. Semistructured data has recently emerged as an important topic of study for several reasons. First, there are data sources, such as the Web, that we would like to treat as databases but that cannot be constrained by a schema. Second, it may be desirable to have an extremely flexible format for exchanging data between disparate databases. Third, even when dealing with structured data, it may be helpful to view it as semistructured for the purposes of browsing. This tutorial will cover a number of issues surrounding such data: finding a concise formulation, building a sufficiently expressive language for querying and transformation, and the optimization problems that arise.
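To make "self-describing" concrete, here is a minimal sketch. It assumes a hypothetical encoding in which every node is either an atomic value or a list of labeled edges, so the labels that a schema would normally supply travel with the data. The three `person` records differ in structure: one has a flat `name`, one has a nested `name` and an `age`, and one has two `email` edges. A simple path query still works over all of them without any separate schema. The names `db` and `select` are illustrative only and are not taken from the tutorial.

```python
from typing import Any, List, Tuple, Union

# Hypothetical encoding: a node is either an atomic value (str/int/float)
# or a list of (label, node) edges. Labels travel with the data, so no
# separate schema is needed to interpret it.
Node = Union[str, int, float, List[Tuple[str, Any]]]

# Three "person" records with irregular structure (illustrative data).
db: Node = [
    ("person", [("name", "Alice"), ("email", "alice@example.org")]),
    ("person", [("name", [("first", "Bob"), ("last", "Smith")]),
                ("age", 42)]),
    ("person", [("name", "Carol"), ("email", "carol@example.org"),
                ("email", "c@example.net")]),
]


def select(node: Node, path: List[str]) -> List[Node]:
    """Return every node reachable from `node` by following edges whose
    labels match `path` in order; missing edges simply yield nothing."""
    if not path:
        return [node]
    if not isinstance(node, list):  # atomic values have no outgoing edges
        return []
    head, rest = path[0], path[1:]
    results: List[Node] = []
    for label, child in node:
        if label == head:
            results.extend(select(child, rest))
    return results


print(select(db, ["person", "email"]))
print(select(db, ["person", "name", "last"]))
```

Records that lack a requested edge contribute nothing rather than raising an error, and repeated labels (Carol's two `email` edges) yield all matches. This tolerance of irregular structure is what distinguishes querying semistructured data from querying data that conforms to a fixed schema.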

The accompanying paper, as well as the slides and notes for this tutorial, are available separately.

