With the increasing complexity of models and tasks to which databases are being applied in recent years, research in database theory and applications has moved away from the simple, flat-relational data-model and towards data-models and paradigms, such as semantic or object oriented databases, which support more complex data-structures.
The purpose of this work is to study databases involving complex, potentially recursive data-structures and how such databases can be queried and manipulated. A major insight can be gained by considering the analogy between the relationship between values and types in programming languages and that between instances and schemas in databases. Though the precise interpretations of these terms vary widely, broadly speaking a type describes a collection of values with similar structure, while a schema describes a set of admissible instances for a database. However a schema will normally describe a number of constraints on database instances which go beyond the type of the instance (considered as an unusually complex data value). The constraints expressable are dependent on the particular data-model and DBMS being consiidered, but perhaps the most fundamental of these constraints, common to any database system, is that a schema describes a number of finite sets, representing the data stored in the database, and the relationships between these sets. For example a relational database schema will describe a number of relations (finite sets of tuples), while an object-oriented schema would describe a number of classes (finite sets of objects).
The basic premise of this work is that, when working with a database, all values originate from one of these finite _extents_, and that even recursive data-values will have finite representations in terms of the elements of these extents. This knowledge provides us with the ability to implement efficient transformations between databases, and to evaluate recursive function definitions on databases which would not be decidable in a more general programming environment.
The type system we develop is basically the same as that of [Abiteboul/Kannelakis], and includes type constructors for records, varients and sets, as well as "Classes" which are associated with finite extents. A schema then associates a type with each class in some distinguished finite set of classes, this being the type of objects in that class.
As with [Abiteboul/Kannelakis] we consider both a "value-based" model, in which data-values are regular trees, and an object-identity based model, and examine the correspondences between them. However we do this from the perspective of observational distinguishability of instances in the presence of various predicates on values of class type.
We also look at the idea of keys as a means of distinguishing or refering to objects in a database. I believe that these are extremely important for the purpose of practical programming and querying with databases involving such complex data-structures, and also for languages allowing the creation of new object identities.
A logical language for expressing database transformations and constraints on these models, similar to the one already developed by myself and others for more limited data-models, will be constructed. Here transformations are considered to be simple structural manipulations of data-values, mapping instances of one schema to instances of another. Such transformations can arrise from a number of distinct database tasks: schema evolutions, data-entry, user views of databases, importing and exporting data from heterogenous databases and so on. Our collaborative work with the Human Genome Center at the Univerity of Pennsylvania indicates that such transformations are extremely important and useful.
We show that a class of "non-recursive" transformations, in which a source database instance can be converted to a target instance in a single pass, are of particular interest. For such transformations our logic can be considered to be a declarative language for programming the transformations. Compiling such transformation programs consists of rewriting the programs into a "normal form" in which clauses completely describe how a particular class is populated. These normal form programs can then easily be converted into an appropriate database programming language.
It is apparent that the simple and somewhat ad hoc collection of constraints normally available in a relational database model are not adequate to express many of the constraints arrising in the more complex datamodels currently gaining in popularity, or in the tasks to which such databases are being applied. We also believe that there is an important interaaction between database transformations and constraints, with database transformations implying various constraints on the source and target databases, and constraints playing a part in determining transformations. We therefore use the same logic as a constraint language as for determining transformations, allowing us to express a very general family of constraints, and to reason about transformations, constraints and the interactions between the two in a single framework.
In order to query databases involving values which are recursive or have unbounded depth it is desirable to be able to write recursive function definitions over the data. However, in a general programming environment such recursive functions may not terminate, while traditionally, when dealing with database queries, we expect all queries to terminate and to be efficiently computable.
Further a recursive function definition may admit a number of undesired solutions in addition to the intended solution. We propose criteria for determining the desirable solutions for a function equation, representing those that can be computed in a "constructive" manner. We develop an algorithm for finding these desired solutions. The algorithm makes use of the known finite extents of values to ensure termination. We go on to show that these solutions can also be computed (though perhaps not very efficiently) without using general recursion at all, but instead using only structural induction on sets.
See here for the paper.