Self-Checking Active Networks (SCAN)

Jonathan M. Smith, David J. Farber, Carl A. Gunter, Sampath Kannan, Insup Lee and Scott M. Nettles

EXECUTIVE SUMMARY

DARPA-sponsored network security research in the 1970s led to the Overseer[3] idea, where network packets were checked for correctness with respect to a protocol graph. Active Networks make this feasible.

We propose to use Active Network technology to construct survivable network infrastructures based on the idea of ``self-checking'' systems. Programmable network infrastructure provides the opportunity to add network functionality ``on-the-fly''. Among the functions that can be added are modules that detect intrusions, link failures, and failures to meet quality-of-service guarantees. Building verifiable systems requires formal methods, but large-scale operation requires distributed and largely autonomous algorithms, which are presently out of reach of most formal design methods. Our research will address this problem in the context of Active Networks.

Our work takes the following approach. As suggested by early work of Cerf, Farber, and Postel, protocols are represented as graphs that show the protocol's flow of control and message emissions. Distributed systems use such protocols to operate over the network. Traversing paths through the protocol graph results in sequences of valid states, corresponding to valid sequences of messages. These message sequences are examined and validated by the programmable network infrastructure. If an invalid sequence is detected, the infrastructure performs an exceptional action, for example, raising an alarm, or blocking further messages. Different portions of the network observe different sequences of messages. Thus it is natural to design a highly distributed monitor. Such a design has the added advantage of being efficient and scalable.

We propose to use a formal language for high-level protocol specification and to perform static analysis on the protocol to verify that it meets the specification. A central idea in the proposed work is to derive checking criteria from the verified protocol. The graph of this protocol will provide a correct validation model for the message flow at each point in the network. These models can be converted to Active Network programs that behave as a sophisticated packet filter under the direction of the original protocol. This checking of state traversals can be applied broadly in an Active Net. For network security, unusual and unexpected message sequences can be used to detect large classes of intrusions. For real-time systems, timing information from a real-time program can be distilled into a time-checking algorithm for message interarrivals. Many program properties that result in message exchange in a distributed system can be ``self-checked'' by a Self-Checking Active Network (SCAN). An important property of the graph-derived self-checks is the increasing robustness of the self-checking scheme with increases in either the network scale or the complexity of the system being self-checked.

In the DARPA -funded SwitchWare project, Penn and Bellcore are investigating the use of formal methods and modern, strongly typed programming languages to build a robust Active Network node. In the work proposed here, SwitchWare nodes will serve as a programmable network infrastructure which can be loaded with monitors using the protocol graph to determine whether a message sequence is valid. We will evaluate our ideas experimentally by building a small prototype programming system that would translate protocols into packet sequence parsers, which we would inject into a SwitchWare node. Various attacks on the network infrastructure will be performed to test the strength of the relation between message sequences and correct system behavior.

1.0 Introduction: The Basic Approach

Our basic approach is to use high-level specifications to generate Active Network programs that can be used to self-check the operation of the network. Typically, these specifications will be protocol graphs, a common and powerful formal representation of protocols. The programs will run on the nodes of an active network and will monitor message flow for conformance to the protocols of interest.

Graphs have been used to represent programs and protocols since the earliest computing systems. As an example, consider the modules A, B, and C constituting a protocol, and a derived graph:

	A:					  A
         if( conditionA ) then B else C;	 / \
        B:					V   V
         if( conditionB ) then C else STOP;	B<->C
        C: B

If each arc traversal generates a message, then we can test sequences of messages to see if they represent valid state sequences. For example, if we see {A->C, C->B}, we know the system is following a valid path through the graph. On the other hand, if we see {A->B, C->B}, then some failure has occurred, whether it be the loss of a packet resulting in a missing transition, or an invalid transition.

An important property of the path traversal<->message sequence model is that the number of paths in a real protocol is vastly fewer than what is possible for an arbitrary set of modules. Whereas complete static analysis of the protocol will require examining all possible paths between these modules (a number that grows exponentially in the lengths of the paths considered), checking that the protocol's behavior is correct only involves the examination of a number of paths that is proportional to the number of messages sent in the protocol. Furthermore, the requisite checking can be performed in a distributed fashion leading to even greater efficiency.

If these protocol graphs represent the behavior of a networked system, the messages can be observed as they traverse the network, offering the possibility that the network itself can help verify proper operation of a distributed system. Furthermore, if the information available to the sequence-checking algorithm includes timing, then message sequences can be analyzed for correct functioning as well as quality of service guarantees.

Active Networks[1] offer a particularly flexible mechanism for deploying such self-checking systems, since they allow arbitrary programs to execute on the network infrastructure and will also allow such checking to be dynamically deployed. We call a network employing this technology a Self-Checking Active Network (SCAN). SCANs can be considered a distributed analogue of watchdogs[2] and some of the key ideas were sketched out (without the availability of Active Networks) by Pickens and Farber in their Overseer[3] scheme.

2.0 Specific Applications

2.1 Security

SCANs provide the ability to protect the network against many active attacks. For example, Keromytis and Smith[5] have developed a method by which arbitrary cryptographic protocols can be made fail-stop. Rather than release information, fail-stop cryptographic protocols terminate when an active attack is detected. The method uses cryptographic hashes to validate sequences of messages by reflecting message dependencies in the hash values. The paper demonstrates the technique on a version of Netscape's Secure Socket Layer (SSL) and suggests some applications to IPSEC.

This method is easily deployable on SCANs, as it uses a similar model for validity (valid sequences) and checking is easily performed at multiple points in the network, all of which can ``fail-stop'' when attacks are detected. This approach simplifies the task of cryptographic protocol design, as it allows the designer to focus on passive attacks and malicious insiders.

2.2 Real-Time Behavior and Property Specifications

Checking requires that the behavior or properties one desires to ensure at run-time must be specified. Two kinds of specifications are needed. The first are behavioral specifications (possibly graphs) that describe the operational view of network components and their interactions. The second kind describe the required safety and real-time properties.

The behavior of a network and its protocol are naturally described as a set of communicating processes. Real-time process algebras are formalisms commonly used for this purpose. A good example is the Algebra of Communicating Shared Resources (ACSR) that we have been developing for the specification and analysis of concurrent and real-time systems [7]. ACSR supports processes that execute synchronous timed actions and asynchronous instantaneous events. Timed actions are used to represent the usage of resources and to model the passage of time. Events are used to capture synchronization between processes. To specify real systems accurately, ACSR supports the notion of priorities as well as the notion of exceptions and interrupts. We propose to extend these techniques and ACSR in particular to complex, adaptive systems such as Active Networks.

It is impossible to effectively generate self-checking monitors that verify end-to-end QoS or detect a security violation without the ability to specify high-level non-behavioral requirements. We are adding a property specification language to the ACSR framework to express such requirements. Our main design goal is to provide an intuitive way to express commonly encountered properties. The language will allow users to perform static analysis of ACSR specifications using model checking techniques. Furthermore, these property specifications will be used to generate run-time checkers based on the program checking paradigm.

2.3 Large-Scale Systems

With an Internet consisting of more than 10**7 nodes, distributed control algorithms are required for policy to be orchestrated across the network as a whole. For example, in IP each router participates in a distributed solution to the all pairs shortest-path problem, to which an approximation is constantly maintained by each router or higher in a routing hierarchy. This allows dynamic routing, as the updates in the case of failures allow a gradual network-wide effect of rerouting around the failed area.

SCANs can be highly distributed, operating in as many nodes as there are participants in the distributed system and they can carry out computation locally which leads to a global behavior. For example, consider a denial-of-service attack, such as that used against a commercial ISP several months ago, based on repeated attempts to open a TCP connection without subsequent action. A SCAN could specify TCP using state graph in which an open must be followed by further actions. When intermediate SCAN nodes see an invalid state sequence (OPEN, OPEN,...) they can discard subsequent packets. This results in "pushing" the attack back to its entry point into the network, and avoiding wasting the resources of many intermediate nodes. The bigger a network is, the more distributed algorithms such as SCANs pay off.

3.0 Required Research

Conventional networks assume that attacks come from outside the network; the software making up the network infrastructure is trusted. Active networks can no longer make this assumption. Attacks from outside the network inherently require dynamic checking, but some attacks from software down-loaded into the network can be prevented statically. SwitchWare already employees a mixture of static and dynamic error/attack prevention techniques, in which static type checking is combined with dynamic techniques such as garbage collection and array bounds checking. With SCAN, we plan to extend this approach from the individual nodes of the network to the network as a whole.

To achieve this goal, we will pursue the following research issues:

1) the theory of a process algebra for performance and security specification and verification;

2) a model for software checking, self-testing, and self-correction;

3) algorithms for the generation of checkers for components on specific inputs;

4) case studies for empirical evaluation experience of the approach;

5) the impact of these models on assurance and performance;

6) determining an exception-handling model for self-checking nodes;

7) application of the ideas of program checking[6] to graph representations of protocols;

8) how to cope with phenomena that are simple, subtle, and deadly to the network, such as packet ``chain reactions'' where receipt of a packet causes multiple packets to be generated, which then cause further multiple generation. It would be valuable to develop a graph-based protocol model that could guarantee that the protocol will result in a stable network infrastructure.

A major research contribution is expected to be a new understanding of the tradeoffs between static analysis, testing, and run-time checks. Program verification and type-checking are extremely attractive, since the checks are done once, at verification or compile time, and not during normal system operation. However, such schemes are limited in their applicability by two factors. First, not all system properties are verifiable; some may be simply undecidable. Second, the technology is not yet mature enough to verify very large systems, which the Internet certainly is. At the scale of the Internet, testing is effectively out of the question due to the impossibility of testing such a complex system completely, not to mention our inability to test code down-loaded by users.

Recently a new paradigm called program checking has been proposed in [6]. The key idea in program checking is to shift the focus from verifying that a program or a protocol is correct to simply ascertaining that the behavior of a program or a protocol is correct. Note that the program checking approach still requires proofs of the correctness of the system behavior. Thus we can be as confident about system behavior that is certified by a program checker as we can be about systems that have been verified. Program checking can be done more easily than verification for a number of complex systems. In fact, it is possible to extend the idea in some cases and design self-correctors that can correct occasional, aberrant behavior of a mostly-correct program.

Program checking and self-correction have so far been applied to programs that compute mathematically well-defined functions. We propose to extend these paradigms to complex, adaptive systems such as Active Networks.

4.0 Integration with SwitchWare

SwitchWare [4] nodes will be available locally and we will use them as the infrastructure for this research. We believe that our models and techniques are sufficiently general that it will be straightforward to adapt them to other active infrastructures.

Our research in SCANs has the added advantage that it focuses SwitchWare development activities on SCANs, thus providing a concrete non-trivial application for our infrastructure development. For example, the needs of SCANs for a variety of storage, both persistent and volatile, will help to define the storage models supported by SwitchWare . Our interest in using SCANs for real-time applications puts a much higher priority on developing hard real-time garbage collection technology for SwitchWare .

5.0 DARPA relevance

Military networks are perhaps the most complex networks to manage, reflecting the diversity of operating environments encountered in DoD activities. Furthermore, they have security requirements that far exceed those of civilian systems. While the DARPA-created Internet technology is remarkably flexible, the use of a packet format as an interoperability layer slows progress in introducing new services[4], which is a strong motivation for the Active Networks notion.

In the Penn/Bellcore SwitchWare project[4], we are focusing on node-oriented bases for Active Network security, including the use of formal methods, theorem provers, strongly type-checked languages, and encrypted wire-languages as a methodology to produce Active Network nodes with tunable levels of security. These formal methods are being developed concurrently with an experimental component. We have an experimental ``Active Bridge'' operating at Penn, consisting of a CAML (a dialect of ML) bridging program operating on a 4-processor 166Mhz Pentium machine that interconnects several 100 Mbps Ethernets. The Active Bridge is currently operating diverse Spanning Tree Protocols based on the packet type identifier (DEC or IEEE 802.3) found in the packet header, allowing interoperation between diversely bridged Ethernet infrastructures.

As we note below, our work is similar to others in Active Networks in that we are focused on the nodes and their programming. DARPA's research efforts have to set a target beyond the success of one or more of the node research efforts to the point where the node solutions comprise a network. Our Self-Checking Active Networks ideas are applicable across any programmable infrastructure, whether it be MIT's Active IP option , BBN's Smart Packets, or Arizona's Liquid Software . The important thing from a DARPA perspective is that it provides a possible path for moving from a basic Active Network technology to Secure, Survivable and Scalable distributed systems implemented using Active Net technology.

6.0 Comparison with other ongoing research

David Tennenhouses's team at MIT is investigating a number of issues in Active Nets, but much of the focus of the early work appears to be the design of nodes (e.g., support for the Active IP option) and a small programming environment based on TCL.

Peterson's Liquid Software effort at Arizona is a takeoff on the Scout project, which is using compiler technology as an aid to building efficient schedulable groups of packet processing functions in units called "Paths." This technology could be used to optimize implementations of SCANs, or possibly to generate SCANs from the analysis of a protocol graph. In turn, it could be enhanced by the SCAN functionality, as Path behavior could be self-checked by the active network nodes comprising the path.

Partridge's Smart Packets effort at BBN is focused on the efficient construction of the programming system for active networks. Smart Packets, like the capsules proposed in the MIT design, contain code. Partridge's initial effort seems targeted at efficient byte code or even machine code in the packet, with the intent of ensuring that active technologies will be viable in some form on high-performance networks.

Yemini's Netscript work at Columbia provides an early model for network programming, but appears to be less focused on the construction of active network nodes than the MIT, Arizona and BBN efforts. Nonetheless, as with any programmable infrastructure, it could be employed for SCANs.

Each of these initial efforts, as our own SwitchWare effort did, proposes a programming model and a technology direction to be pursued. SwitchWare , for example, was more focused on the design of provably secure nodes than the other projects. None of the projects has (at least publicly) its focus the building of robust distributed systems USING active networks.

The SCANs we propose here can be utilized with any active network technology, although our preferred initial development environment is our own SwitchWare nodes. This is for two reasons. First, quite pragmatically, some of us (Smith, Farber, Gunter and Nettles) are developing the SwitchWare system, and thus SCANs can have considerable influence on the final structure of SwitchWare nodes. Second, it is impossible to build trusted SCANs with untrusted nodes, and we believe that the SwitchWare design is the most likely of the comparable Active Network efforts to deliver a trusted environment.

7. References

[1] D. L. Tennenhouse, J. M. Smith, W. D. Sincoskie, D. J. Wetherall, G. Minden, "A Survey of Active Network Research," IEEE Communications, January, 1997.

[2] A. Mahmood and E. J. McCluskey, "Concurrent Error Detection Using Watchdog Processors - A Survey," in IEEE Transactions on Computers, 37(2), pp. 160-174, February, 1988.

[3] J. R. Pickens and D. J. Farber, "The Overseer: A Powerful Communications Attribute for Debugging and Security in Thin-Wire Connected Control Structures," Proceedings of International Computer Communications Conference, August 1976.

[4] "SwitchWare: Accelerating Network Evolution," U. Penn CIS TR MS-CIS-96-38, 1996, http://www.cis.upenn.edu/~jms/white-paper.ps

[5] A. Keromytis and J. Smith, "Creating Efficient Fail-Stop Cryptographic Protocols," submitted to IEEE Security and Privacy, December 2, 1996.

[6] M. Blum and S. Kannan. Designing Programs that Check their Work. Journal of the ACM, 42, 1995.

[7] I. Lee, P. Bremond-Gregoire, and R. Gerber. A Process Algebraic Approach to the Specification and Analysis of Resource-Bound Real-Time Systems. Proc. of the IEEE, pages 158--171, Jan 1994.

Acknowledgements

The SwitchWare project is funded by DARPA.