The Essence of Objects

``The Essence of Objects'' is an ongoing program of research in programming languages, type systems, and distributed programming at the University of Pennsylvania. Papers on many of the topics discussed in this outline can be found via http://www.cis.upenn.edu/~bcpierce/papers.

Personnel

Principal Investigator
Benjamin Pierce
Postdocs
Jerome Vouillon
Students
Michael Levin
Vladimir Gapeyev
Haruo Hosoya (now at Kyoto University)
Atsushi Igarashi (now at University of Tokyo)
Collaborators
Sundar Balasubramaniam, Vidam Communications
Kim Bruce, Williams College
Peter Buneman, University of Pennsylvania
Luca Cardelli, Microsoft
Giorgio Ghelli, University of Pisa
Martin Hofmann, University of Edinburgh
Trevor Jim, University of Pennsylvania
Naoki Kobayashi, University of Tokyo
Insup Lee, University of Pennsylvania
Uwe Nestmann, BRICS, Aarhus University
Didier Remy, INRIA Roquencourt
Davide Sangiorgi, INRIA Sophia-Antipolis
Peter Sewell, Cambridge
David Turner, An-Teallach Ltd.
Philip Wadler, Bell Labs, Lucent

Background

Despite decades of concern in both industry and academia, expensive failures of large software projects are common. Proposed approaches to improving software quality include---among other ideas---a broad spectrum of techniques for helping ensure that a software system behaves correctly with respect to some specification, implicit or explicit, of its desired behavior. On one end of this spectrum are powerful frameworks such as algebraic specification languages, modal logics, and denotational semantics; these can be used to express very general correctness properties but are cumbersome to use and demand significant involvement by the programmer not only in the application domain but also in the formal subtleties of the framework itself. At the other end are techniques of much more limited power---so limited that they can be built into compilers and thus ``applied'' even by programmers unfamiliar with the underlying theories. Such methods often take the form of type systems or, in the context of large-scale programming, module systems. My research focuses on type and module systems and their applications to object-oriented programming, concurrent and distributed systems, and mobile applications.

Type systems are usually formulated as collections of rules for checking the ``shape consistency'' of programs. This kind of checking exposes not only trivial mental slips, but also deeper conceptual errors, which frequently manifest as type errors. At a larger scale, module systems formalize the shapes of the interfaces between components of a large software project and automatically verify their consistency. By making interfaces explicit, they also help document the structure of the system and guide the design process. In conventional programming languages such as Ada, the module system provides straightforward mechanisms for packaging components and abstracting away from internal representations of data. Newer languages like Standard ML go further, supporting the explicit construction and type-checking of more general procedures for constructing large-scale program components from others. On the other hand, object-oriented languages such as C++ use objects and classes for both small- and medium-scale program structuring, combining aspects of type and module systems in a single mechanism. Newer object-oriented languages such as Java also provide rudimentary support for larger-scale packaging of groups of classes.

Research Themes

My own research in type and module systems began with an exploration of the connections between object-oriented abstraction and the data abstraction supported by conventional module systems. Much of the technical work (by myself and others) involved developing an appropriate mathematical framework for modeling both objects and basic abstract data types. My current interest is in enriching this framework so that other aspects of module systems, such as the control of name spaces and shared substructures, can also be expressed. An important parallel effort is the development of practical type inference techniques for the rich type systems used in these models. The long-term goal is the design of new type and module systems for practical programming languages and environments, in which objects and classes can coexist with a variety of other mechanisms for structuring very large-scale programs.

Another focus in my recent work has been the use of static type systems in concurrent programming based on message passing, in particular concurrent object-oriented programming. My previous research in this area (in collaboration with Sangiorgi, Kobayashi, and Turner, among others) has shown how type systems with features such as subtyping and linearity information can be ``transplanted'' from sequential languages, yielding useful tools for static detection of common errors. More surprisingly, these type systems also give rise to new techniques for reasoning about concurrent programs. My current interests concern the role of parametric polymorphism in modular reasoning about concurrent systems. By capturing formally the intuition that a generic function or server process must behave uniformly in the type of the data it manipulates, we should be able to reduce significantly the number of states that must be considered when reasoning about a concurrent system. The most important application area for such improvements is in model checking, where the size of state spaces is a crucial concern. In the longer term, I am interested in type systems capable of ensuring various forms of noninterference between concurrent activities. The ultimate aim is to assemble these fragments into a rich type discipline and an associated ``logical toolbox'' of more general manual and semi-automatic reasoning techniques.

An underlying theme in my work is the interweaving of theoretical modeling and pragmatic experimentation. I have used a variety of tiny programming calculi for this purpose---in particular, Church's lambda-calculus forms a framework for type systems of sequential languages, while Milner's pi-calculus plays a similar role in my work on concurrent languages. The value of these calculi as research tools lies in the fact that they are simultaneously media in which to experiment with useful programming idioms and styles, intermediate languages for compilers, and tractable mathematical objects whose properties can be investigated rigorously.

An important experimental vehicle for my research has been the language Pict, conceived in 1993 (with Turner) as a testbed for statically typed concurrent programming using objects. Pict combines a clean operational semantics based directly on the pi-calculus with an ambitious type system extending my earlier theoretical work on type systems for objects and classes. The pi-calculus also serves as the intermediate language for an optimizing compiler. Current work on Pict centers on its use as the core of a family of experimental languages ("nomadic pict") supporting various forms of code mobility---transmission of computational agents between the nodes on a distributed system or across the internet. Basing these experiments on a language with both a firm theoretical grounding and a full-scale implementation will enable a systematic and pragmatically grounded exploration of some of the huge space of ``languages for programming the world wide web,'' ranging from the simple client-server architectures typical of present-day web languages like Java to much richer programming models incorporating migration of running agents, support for fault tolerance, and flexible security policies. A number of interesting typing issues arise in this context---for example, the problem of checking static consistency of interfaces in a long-lived, distributed, and persistent environment.

My recent move to Penn has led to a number of opportunities for collaborative projects in novel areas for me. In one recent line of work, Peter Buneman and I have defined a novel type analysis for ``semistructured database query languages'' based on an unusual notion of untagged union types. In another, Insup Lee, Trevor Jim, and I are working on refining and generalizing a file synchronization system that has been a side project of my own for the past two years. We have the system running now on both Windows and Unix (it is the first such tool capable of handling both platforms), and are considering how to extend it to the domain of synchronizing replicated information on the Web.

Another recent project, dubbed TinkerType, concerns modular presentation of collections of type systems. The goal is to construct a ``map'' of a large variety of existing type systems, showing their family relations and the ways in which some can be derived by incremental modification -- ``inheritance'' -- from others. Over the past few months, we have developed a formal language for describing fragments of type systems and built a tool for assembling type systems (both TeX descriptions of inference rules and running typecheckers) from their constituent fragments. The first product of this project will be a graduate-level textbook on type systems. In the longer term, our ambition is to construct not just type systems but also proofs of standard properties in an incremental manner.