Technical Report Abstracts



A Formal Framework for Linguistic Annotation

Steven Bird and Mark Liberman

`Linguistic annotation' is a term covering any transcription, translation or annotation of textual data or recorded linguistic signals. While there are several ongoing efforts to provide formats and tools for such annotations and to publish annotated linguistic databases, the lack of widely accepted standards is becoming a critical problem. Proposed standards, to the extent they exist, have focussed on file formats. This paper focuses instead on the logical structure of linguistic annotations. We survey a wide variety of annotation formats and demonstrate a common conceptual core. This provides the foundation for an algebraic framework which encompasses the representation, archiving and query of linguistic annotations, while remaining consistent with many alternative file formats.


Spectral Gradient: A Surface Reflectance Measurement Invarian to Geometry and Incident Illumination

Elli Angelopoulou, Sang W. Lee and Ruzena Bajcsy

Although photometric data is a readily available dense source of information in intensity images, it is not widely used in computer vision. A major drawback is its dependence on viewpoint and incident illumination. A novel methodology is presented which extracts reflectivity information of the various materials in the scene independent of incident light and scene geometry. A scene is captured under three different narrow-band color filters and the spectral derivatives of the scene are computed. The resulting spectral derivatives form a spectral gradient at each pixel. This spectral gradient is a surface reflectance descriptor which is invariant to scene geometry and incident illumination for smooth diffuse surfaces. The invariant properties of the spectral gradients make them a particularly appealing tool in many diverse areas of computer vision such as color constancy, tracking, scene classification, material classification, stereo correspondence, even re-illumination of a scene.


Policy and Mechanism in Adaptive Protocols

Ilija Hadzic, William S. Marcus, Jonathan M. Smith

Adaptive protocols are protocols which automatically adjust their behavior to runtime phenomena such as traffic or link characteristics. For such protocols, the behavioral adjustment is accomplished with some mechanism; the decision as to how much (if any) adjustment is needed is made under control of a it policy. Design and implementation of policies has often proven more challenging than that of mechanisms.

We make three contributions in this paper. First, we develop a rule of thumb for policy/mechanism separation and lay out a general set of challenges in policy module design. Second, as an illustration, we have analyzed a Forward-Error Correcting Code (FEC) for ATM supporting TCP/IP, and analytically identified a small robust set of "tunable parameters" to delineate regions where the code should be applied. Third, we exploited the analytic results and policy/mechanism separation to implement an adaptive protocol for network errors using programmable hardware to obtain high performance.

Measurements of TCP/IP traffic over a 155 Mbps ATM link augmented with two cooperating Programmable Protocol Processing Pipelines (P4s) were made with ttcp. The new policy module improves throughput by up to 50% over an unaugmented TCP/IP in the face of increasing bit error rates, and continue operation into a range of bit error rates where an unmodified TCP practically ceases to function.


The Price of Safety in an Active Network

D. Scott Alexander, Kostas G. Anagnostakis, William A. Arbaugh, Angelos D. Keromytis, Jonathan M. Smith

Lack of security is a major threat to ``Active Networking,'' as programmability creates numerous opportunities for mischief. The point at which programmability is exposed, e.g. through the loading of code into network elements, must therefore be carefully crafted to ensure security.

This paper makes two contributions. First, it describes the implementation of a solution, the Secure Active Network Environment (SANE), which is intended to operate on an active network router. The SANE architecture provides a secure bootstrap process, which includes cryptographic certificate exchange and results in execution of a module loader for introducing new code, as well as a packet execution environment. SANE thus permits a direct comparison of security implications of active packets (such as ``capsules'') with active extensions (used for ``flows'' of packets).

The second contribution of the paper is a performance study using a combination of execution traces and end-to-end throughput measurements. The example code performs an ``active ping'' and allows us to break down costs into categories such as authentication. In our SANE implementation on 533 Mhz Alpha PCs, securing active packets effectively increases the time required to process a packet by a third. This result implies that the majority of packets must remain unauthenticated in high performance active networking solutions. We discuss some solutions which preserve security.


Secure Quality of Service Handling (SQoSH)

D. Scott Alexander, William A. Arbaugh, Angelos D. Keromytis, Steve Muir and Jonathan M. Smith

Proposals for programmable network infrastructures, such as Active Networks and Open Signaling, provide programmers with access to network resources and data structures. The motivation for providing these interfaces is accelerated introduction of new services, but exposure of the interfaces introduces many new security risks. The risks can be reduced or eliminated via appropriate restrictions on the exported interfaces, as we demonstrate in our Secure Active Network Environment (SANE). SANE restricts the actions that loaded modules (including ``capsules'') can perform by restricting the resources that can be named; this model is extended to remote invocation by means of cryptographic certificates.

We have extended SANE to support restricted control of Quality of Service in a programmable network element. The Piglet lightweight device kernel provides a ``virtual clock''-like scheduling discipline for network traffic, and exports several tuning knobs with which the clock can be adjusted. The ALIEN active loader provides safe access to these knobs to modules which operate on the network element. Thus, the SQoSH architecture is able to provide safe, secure access to network resources, while allowing these resources to be managed by end users needing customized networking services. A desirable consequence of SQoSH's integration of access control and resource control is that a large class of denial of service attacks, unaddressed solely with access control and cryptographic protocols, can now be prevented. The ALIEN active loader provides safe access to these knobs to modules which operate on the network element. Thus, the SQoSH architecture is able to provide safe, secure access to network resources, while allowing these resources to be managed by end users needing customized networking services. A desirable consequence of SQoSH's integration of access control and resource control is that a large class of denial of service attacks, unaddressed solely with access control and cryptographic protocols, can now be prevented.

We provide some performance measurements to illustrate the cost of security, and demonstrate that these costs are minor in the context of managing a multimedia stream.


FPGA Viruses

Ilija Hadzic, Sanjay Udani and Jonathan M. Smith
Programmable logic is widely used, for applications ranging from field-upgradable subsystems to advanced uses such as reconfigurable computing platforms which are modifiable at run-time. Users can thus implement algorithms which are largely executed by a general-purpose CPU, but may be selectively accelerated with special purpose hardware. In this paper, we show that programmable logic devices unfortunately open another avenue for malicious users to implement the hardware analogue of a computer virus.

We begin the paper with an outline of the general properties of FPGAs that create risks. We then explain how to exploit these risks, and demonstrate through directed experiments that they are exploitable even in the absence of detailed layout information. We prove our point by demonstrating the first known FPGA virus and its effect on the current absorbed by the device, namely that the device is destroyed. We close by outlining possible methods of defense and point out the similarities and differences between FPGA and software viruses.


Policy-Directed Certificate Retrieval

Carl A. Gunter and Trevor Jim

Any large scale security architecture that uses certificates to provide security in a distributed system will need some automated support for moving certificates around in the network. We believe that for efficiency, this automated support should be tied closely to the consumer of the certificates: the policy verifier. As a proof of concept, we have built QCM, a prototype policy language and verifier that can direct a retrieval mechanism to obtain certificates from the network. Like previous verifiers, QCM takes a policy and certificates supplied by a requester and determines whether the policy is satisfied. Unlike previous verifiers, QCM can take further action if the policy is not satisfied: QCM can examine the policy to decide what certificates might help satisfy it and obtain them from remote servers on behalf of the requester. This takes place automatically, without intervention by the requester; there is no additional burden placed on the requester or the policy writer for the retrieval service we provide. We present examples that show how our technique greatly simplifies certificate-based secure applications ranging from key distribution to ratings systems, and that QCM policies are simple to write. We describe our implementation, and illustrate the operation of the prototype.


VENUS: A Virtual Environment Network Using Satellites

Sanjay Udani, Jord Sonneveld, Jonathan Smith, David Farber

Virtual environment (VE) designs have evolved from text-based to immersive graphical systems. The next logical step of this evolution is to have a fully immersive environment in which thousands of widely distributed users will be able to move around and interact. This requires a VE architecture that can scale well for a large number of participants while providing the necessary support for quality of service, security and flexibility. Current VE architectures are unable to fully meet these requirements and a new network/protocol architecture is needed.

The VENUS approach addresses these problems by creating a network architecture which is scalable and flexible. We define a new architecture consisting of a transmit-only satellite/server and bi-directional links which will be capable of sustaining a wide-area virtual environment. We then offer the preliminary results of our experiments.


Union Types for Semistructured Data

Peter Buneman, Benjamin Pierce
Semistructured databases are treated as dynamically typed: they come equipped with no independent schema or type system to constrain the data. Query languages that are designed for semistructured data, even when used with structured data, typically ignore any type information that may be present. The consequences of this are what one would expect from using a dynamic type system with complex data: fewer guarantees on the correctness of applications. For example, a query that would cause a type error in a statically typed query language will return the empty set when applied to a semistructured representation of the same data. Much semistructured data originates in structured data. A semistructured representation is useful when one wants to add data that does not conform to the original type or when one wants to combine sources of different types. However, the deviations from the prescribed types are often minor, and we believe that a better strategy than throwing away all type information is to preserve as much of it as possible. We describe a system of untagged 'union types' that can accommodate variations in structure while still allowing a degree of static type checking. A novelty of this system is that it involves non-trivial equivalences among types, arising from a law of distributivity for records and unions: a value may be introduced with one type (e.g., a record containing a union) and used at another type (a union of records). We describe programming and query language constructs for dealing with such types, prove the soundness of the type system, and develop algorithms for subtyping and typechecking.


Behavioral Equivalence in the Polymorphic Pi-Calculus*

Benjamin Pierce, Davide Sangiorgi


Updates and Non-Determinism in Object-Oriented Databases

Hartmut Liefke and Susan B. Davidson

While the topic of object-oriented databases is by now fairly mature -- a standard for data definition and query languages has been developed -- the question of updates has largely been ignored. Updates on object-oriented databases are relegated to methods, which are written in third-generation languages such as C++ and therefore appear as ``black boxes'' to the query processor. This poses two problems: First, it is difficult to reason about whether an update is non-deterministic. Non-determinism arises due to iteration over sets and potential sharing between elements of the set. Second, these updates are system-specific and cannot be optimized. We propose an optimizable language for updating object-oriented databases called OQL+. The look and feel of OQL+ is similar to OQL, with the addition of insert, update and delete primitives. OQL+ is expressive enough to capture most updates that users might want to perform on an object-oriented database while maintaining the simplicity of extensions for updates as in SQL. We then discuss the problem of non-deterministic updates in the language, and give a proof technique for detecting potential non-determinism.


Three-dimensional Motion Reconstruction and Analysis of the Right Ventricle Using Tagged MRI

Idith Haber, Dimitris N. Metaxas and Leon Axel

Right ventricular (RV) dysfunction can serve as an indicator of heart and lung disease and can adversely affect the left venticle (LV). However, normal RV function must be characterized before abnormal states can be detected. We can describe a method for reconstructing the 3D motion of the RV images by fitting of a deformable model to extracted tag and contour data from multiview tagged magnetic resonance images(MRI). The deformable model is a biventricular finite element mesh built directly from the contours. Our approach accommodates the geometrically complex RV by using the entire lengths of the tags, localized degrees of freedom (DOFs), and finite elements for geometric modeling. We convert the results of the reconstruction into potentially useful motion variables, such as strains and displacements. The fitting technique is applied to synthetic data, two normal hearts, and a heart with right ventricular hypertrophy (RVH). The results in this paper are limited to the RV free wall and septum. We find noticeable differences between the motion variables calculated for the normal volunteers and the RVH patient.


Methods for Modeling and Predicting Mechanical Deformations of the Breast during Interventional Procedures

Fred S. Azar, Dimitri N. Metaxas, Reid T. Miller, Mitchell D. Schnall

When doing high field (1.5T) magnetic resonance breast imaging, the use of a compression plate during imaging after a contrast-agent injection may critically change the enhancement characteristics of the tumor, making the tracking of its boundaries very difficult. A new method for clinical breast biopsy is presented, based on a deformable finite element model of the breast. The geometry of the model is constructed from MR data, and its mechanical properties are based on a non-linear material model. This method allows imaging the breast without compression before the procedure, then compressing the breast and using the finite element model to predict the tumor's position. The axial breast contours and the segmented slices are ported to a custom-written MR-image contour analysis program, which generates a finite element model (FEM) input file readable by a commercial FEM software. A deformable silicone gel phantom was built to study the movement of an inclusion inside a deformable environment. The hyperelastic properties of the phantom materials were evaluated on an Instron Model 1331 mechanical testing machine. The phantom was placed in a custom-built pressure device, where a pressure plate caused a 14% (9.8mm) compression. The phantom was imaged in a 1.5T magnet (axial and coronal), in the undeformed and deformed states. An FEM of the phantom was built using the custom-written software from the MR data, and another FEM of the phantom was built using a commercial pre-processor from the phantom's directly measured dimensions. The displacements of the inclusion center and its boundaries were calculated, both from the experimental and FEM results. The calculated displacements from both models are within 0.5mm of each other, and agree within 1.0mm with the experimental results. This difference is within the imaging error.


A Secure Plan

Michael Hicks and Angelos D. Keromytis

Active Networks promise greater flexibility than current networks, but threaten safety and security by virtue of their programmability. In this paper, we describe the design and implementation of a security architecture for the active network PLANet. Security is obtained with a two-level architecture that combines a functionally restricted packet language, PLAN, with an environment of general-purpose service routines governed by trust management. In particular, we employ a technique which expands or contracts a packet's service environment based on its level of privilege, termed namespace-based security. As an application of our security architecture, we present the design and implementation of an active-network firewall. We find that the addition of the firewall imposes an approximately 34% latency overhead and as little as a 6.7% space overhead to incoming packets.


Chunks in PLAN: Language Support for Programs as Packets

Jonathan T. Moore, Michael Hicks, and Scott M. Nettles

Chunks are a programming construct in PLAN, the Packet Language for Active Networks, comprised of a code segment and a suspended function call. In PLAN, chunks provide support for encapsulation and other packet programming techniques. This paper begins by explaining the semantics and implementation of chunks. We proceed, using several PLAN source code examples, to demonstrate the usefulness of chunks for micro-protocols, asynchronous adaptation, and as units of authentication granularity.


Laser Scanner Technology

Elli Angelopoulou and John R. Wright, Jr.

This paper addresses the basic principles, performance measures and applications associated with laser scanner technologies. The objective of this report is to communicate and disseminate pertinent information related to state-of-the-art laser measurement systems that are currently available through commercial and research means. This paper should serve two-fold: (1) as a basic tutorial to laser scanning technology and (2) as a guide to current manufacturers and researchers of this technology.


How Good is Local Type Inference?

Haruo Hosoya and Benjamin Pierce

A partial type inference technique should come with a simple and precise specification, so that users predict its behavior and understand the error messages it produces. Local type inference techniques attain this simplicity by inferring missing type information only from the types of adjacent syntax nodes, without using global mechanisms such as unification variables. The paper reports on our experience with programming in a full-featured programming language including higher-order polymorphism, subtyping, parametric datatypes, and local type inference. On the positive side, our experiments on several nontrivial examples confirm previous hopes for the practicality of the type inference method. On the negative side, some proposed extensions mitigating known expressiveness problems turn out to be unsatisfactory on close examination.


Reconstructing Articulated Objects from Point Correspondences in a Single Uncalibrated Image

Camillo J. Taylor

This paper investigates the problem of recovering information about the configuration of an articulated object, such as a human figure, from point correspondences in a single image. Unlike previous approaches, the proposed reconstruction method does not assume that the imagery was acquired with a calibrated camera. An analysis is presented which demonstrates that there are a family of solutions to this reconstruction problem parameterized by a single variable. A simple and effective algorithm is proposed for recovering the entire set of solutions by considering the foreshortening of the segments of the model in the image. Results obtained by applying this algorithm to real images are presented.


TinkerType: A Language for Playing with Formal Systems

Michael Y. Levin and Benjamin C. Pierce

TinkerType is a framework for compact and modular description of formal systems (type systems, operational semantics, logics, etc.). A family of related systems is broken down into a set of CLAUSES---individual inference rules---and a set of FEATURES controlling the inclusion of clauses in particular systems. Dependency relations on both clauses and features and a simple form of ``judgement signatures'' are used to check the consistency of the generated systems.

As an application, we present a substantial repository of typed lambda-calculi, including systems with subtyping, polymorphism, type operators and kinding, computational effects, and dependent and recursive types. The repository describes both declarative and algorithmic aspects of the systems, and can be used with our tool, the TinkerType Assembler, to generate calculi either in the form of typeset collections of inference rules or as executable ML typecheckers.


Extraction Of Web Information Using W4F Wrapper Factory and XML-QL Query Language

Deepali Bhandari

In many ways, the Web has become the largest knowledge base known to us. The problem facing the user now is not that the information he seeks is not available, but that it is not easy for him to extract exactly what he needs from what is available. It is also becoming clear that a top down approach of gathering all the information, and structuring it will not work, except in some special cases. Indeed, most of the information is present in HTML documents structured only for visual content. Instead, new tools are being developed that attack this problem from a different angle. XML is a language that allows the publisher of the data to structure it using markup tags. These mark-up tags clarify not only the visual structure of the document, but also the semantic structure. Additionally, one can make use of a query language XML-QL to query XML pages for information, and to merge information from disparate XML sources. However, most of the content of the web is published in HTML. The W4F system allows us to construct wrappers that retrieve web pages, extract desired information using the HTML structure and regular expression search and map it automatically to XML with its XML-Gateway feature. In this thesis, we investigate the W4F/XML-QL paradigm to query the web. Two examples are presented. The first is the Internet Movie Database, and we query it with the idea of understanding the power of these systems. The second is the NCBI BLAST server, which is a suite of programs for biomolecular sequence analysis. We demonstrate that there are real life instances where this paradigm promises to be extremely useful.


Visual Servoing with Dynamics: Control of an Unmanned Blimp

Hong Zhang, James P. Ostrowski
Image-based control of free-flying mechanical systems is addressed, with application to an indoor, vision-guided blimp. A methodology is developed for performing visual servoing by incorporating the physical parameters of a mechanical system into the image plane dynamics. It is noted that under suitable conditions, namely the existence of a diffeomorphism between image features and the robot pose, many of the tools from mechanical systems control theory can be used directly when applied in the image plane. It is also shown that when a suitable Jacobian map exists between the pose and image feature spaces, dynamics for vision-based control of aerial and underwater vehicles can easily be formulated. Experimental results demonstrate initial success at spatial tracking of simple objects.


Steering For A Class Of Dynamic Nonholonomic Systems

Jim Ostrowski


On Inner Classes

Atsushi Igarashi and Benjamin C. Pierce

Inner classes in object-oriented languages play a role similar to nested function definitions in functional languages, allowing an object to export other objects with direct access to its own methods and instance variables. However, the similarity is deceptive: a close look at inner classes reveals significant subtleties arising from their interactions with inheritance.

The goal of this work is a precise understanding of the essential features of inner classes; our object of study is a fragment of Java with inner classes and inheritance (and almost nothing else). We begin by giving a direct reduction semantics for this language. We then give an alternative semantics by translation into a yet smaller language with only top-level classes, closely following Java's Inner Classes Specification. We prove that the two semantics coincide, in the sense that translation commutes with reduction, and that both are type-safe.


Safe and Efficient Active Packets

Jonathan T. Moore

We present a new scheme for _active_, or programmable, packets based upon a new packet language, SNAP (Safe Networking with Active Packets). SNAP's semantics permit us to prove that all SNAP programs are safe with respect to network resource usage and evaluation isolation. Furthermore, we describe an implementation of a SNAP interpreter, _snapd_, which achieves high performance for standard networking tasks. This work represents the first active packet system that is demonstrated to be _both_ safe and efficient.


Featherweight Java: A Minimal Core Calculus for Java and GJ

Atsushi Igarashi, Benjamin Pierce, and Philip Wadler

Several recent studies have introduced lightweight versions of Java: reduced languages in which complex features like threads and reflection are dropped to enable rigorous arguments about key properties such as type safety. We carry this process a step further, omitting almost all features of the full language (including interfaces and even assignment) to obtain a small calculus, Featherweight Java, for which rigorous proofs are not only possible but easy.

Featherweight Java bears a similar relation to full Java as the lambda-calculus does to languages such as ML and Haskell. It offers a similar computational ``feel,'' providing classes, methods, fields, inheritance, and dynamic typecasts, with a semantics closely following Java's. A proof of type safety for Featherweight Java thus illustrates many of the interesting features of a safety proof for the full language, while remaining pleasingly compact. The syntax, type rules, and operational semantics of Featherweight Java fit on one page, making it easier to understand the consequences of extensions and variations.

As an illustration of its utility in this regard, we extend Featherweight Java with GENERIC CLASSES in the style of GJ (Bracha, Odersky, Stoutamire, and Wadler) and give a detailed proof of type safety. The extended system formalizes for the first time some of the key features of GJ.


XMill: an Efficient Compressor for XML Data

Hartmut Liefke and Dan Suciu

We describe a tool for compressing XML data, called XMill, that usually achieves about twice the compression ratio of gzip at roughly the same speed. The intended applications are XML data exchange and archiving. XMill does not need schema information (such as a DTD or an XML-Schema), but can exploit hints about such a schema in order to further improve the compression ratio. XMill incorporates and combines existing compressors in order to compress heterogeneous XML data: it uses zlib, the library function for gzip, as well as a collection of datatype specific compressors. XMill can be extended with new specialized compressors: this is useful in applications managing XML data with highly specialized data types, such DNA sequences, images, etc. The paper presents a theoretical justification for the method used, XMill architecture and implementation, a new languages for expression the hints about the XML schema, and a series of experiments validating XMill on several real data sets.


Efficient View Maintenance in XML Data Warehouses

Hartmut Liefke and Susan Davidson

XML has become an important standard for the representation and the exchange of data over the Internet. While several key aspects of XML have been investigated, such as query languages, type systems, and storage models, the issue of incrementally maintaining XML views is largely unstudied. XML views differ from relational views in two essential ways: 1) There is no rigid type system, and 2) The query definition often performs complex restructuring far beyond the typical select-project-join query definition in relational views.

In this paper, we address the problem of incrementally maintaining views over XML data with key constraints. We describe a system called WHAX (Warehouse Architecture for XML) that allows the definition and incremental maintenance of views over existing relational and XML data sources with keys. Our query language supports important operations, such as joins, aggregations, regrouping, and restructuring operations such as flattening. We generalize several well-known results about view maintenance in the relational model based on the notion of "multi-linearity".


A Chase Too Far?

Lucian Popa, Alin Deutsch, Arnaud Sahuguet, Val Tannen

In a previous paper we proposed a novel method for generating alternative query plans that uses chasing (and back-chasing) with logical constraints. The method brings together use of indexes, use of materialized views, semantic optimization and join elimination (minimization). Each of these techniques is known sepa- rately to be bene cial to query optimization. The novelty of our approach is in allowing these techniques to interact systematically, eg. non-trivial use of indexes and materialized views may be enabled only by semantic constraints. We have implemented our method for a variety of schemas and queries. We examine how far we can push the method in term of complexity of both schemas and queries. We propose a technique for reducing the size of the search space by "stratifying" the sets of constraints used in the (back)chase. The experimental results demonstrate that our method is practical (i.e., feasible and worthwhile).


The Reflectance Spectrum of Human Skin

Elli Angelopoulou
The automated detection of human skin has become an important task for machine vision systems. Prior vision and graphics research on this topic has primarily focused on images acquired with conventional color cameras. Formal spectrographic analyses have largely been undertaken in the medical domain and focus on the detection of pathology. We have constructed a facility capable of producing accurate, dense and repeatable reflectance measure-ments. The light reflected from the skin is measured using a high resolution, high accuracy spectrograph under precisely calibrated lighting conditions. This paper presents observations fromthe first body of data gathered at this facility. From the measurements collected thus far, we have observed population-independent factors of skin reflectance. We show how these factors can be exploited in skin recognition. Finally, we provide a biological explanation for the existence of a distinguishing pattern in human skin reflectance.
Send comments/questions to