Project 2: Intrusion Detection Systems
CIS 551


Due: 15 March, 2007 (11:59pm EST)

Description

By now you've probably come to realize that security is difficult. You've fixed a few security flaws in project 1, and every year thousands more like them are fixed in real-world software, yet fixing flaws does not prove that a system is secure. Moreover, flawed software often remains in use for quite some time, even when its flaws are known and a fixed version is available. Your job for this project is to develop a network intrusion detection system that monitors network packets for exploit attempts and notifies the user of any suspicious activity.

Using Java, create a network intrusion detection system (IDS) that is capable of monitoring traffic to or from a single host on the network. You should use the jpcap library to receive packets, and regular expressions to create your rules. Since you will not have root access to the host in question, you will not monitor live network traffic; instead, you will read in a trace file of captured packets. You are still simulating a program that would process network traffic as it appears, however, and thus you should not need to process the packet trace more than once. Your program should take two command-line arguments: the first is a rule file (whose syntax is defined below), and the second is the pcap trace file, several of which will be provided for your testing. Your IDS should process each packet in the trace and, as rules are matched, print an alert (which should include the name of the matched rule) to standard output. One packet may figure into any number of alerts, so your IDS should not stop processing a packet or stream when a single rule is matched.

There are two types of rules: stream rules and protocol rules. Stream rules require you to reconstruct the send or receive stream of a TCP connection, and then apply a regular expression to the entire stream. Stevens' TCP/IP Illustrated and Unix Network Programming are good references if you are unfamiliar with how TCP reconstruction works. Protocol rules specify an exchange of messages which are assumed to be in single packets (a naive assumption it turns out, but a decent first step). The sub-rules may match the flags, body, or both on each packet. Each sub-rule must be matched in order and with no intermediate packets to/from the same ports and IPs for a protocol rule to match. In particular, once you have ordered the TCP packets (which may arrive out of order) the sub-rules match the packets in TCP sequence order.
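
The reconstruction step for stream rules can be sketched as follows. This is only a minimal sketch under stated assumptions, not a required design: the names StreamReassembler, addSegment, and assemble are hypothetical, sequence numbers are assumed to be passed in as non-negative long values, and the sketch ignores sequence-number wraparound and lost segments, both of which a complete implementation would need to consider. It handles one direction of one connection.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: reassembles ONE direction of ONE TCP connection.
// Assumptions: no sequence-number wraparound, no detection of gaps.
class StreamReassembler {
    // Segments keyed by sequence number; a TreeMap iterates in ascending key order,
    // so packets that arrived out of order are visited in sequence order.
    private final TreeMap<Long, byte[]> segments = new TreeMap<Long, byte[]>();

    // Record a data-carrying segment. A retransmission with the same
    // sequence number simply replaces the earlier copy.
    void addSegment(long seq, byte[] payload) {
        if (payload.length > 0) {
            segments.put(seq, payload);
        }
    }

    // Concatenate the buffered segments in sequence order, skipping any
    // bytes that an earlier, overlapping segment already contributed.
    String assemble() {
        StringBuilder stream = new StringBuilder();
        long next = -1; // sequence number expected next; -1 until the first segment
        for (Map.Entry<Long, byte[]> e : segments.entrySet()) {
            long seq = e.getKey();
            byte[] data = e.getValue();
            int skip = 0;
            if (next >= 0 && seq < next) {
                skip = (int) Math.min(next - seq, data.length);
            }
            for (int i = skip; i < data.length; i++) {
                stream.append((char) (data[i] & 0xFF)); // one char per byte
            }
            next = Math.max(next, seq + data.length);
        }
        return stream.toString();
    }
}
```

With one reassembler per direction of each connection, the string returned by assemble can be handed to the regular expression of a stream rule once the session ends.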

Exception: TCP protocol rules should allow any number of acknowledgments (ACKs) between the packets that match the sub-rules, as long as those ACKs carry no other data. For simplicity, you may skip such packets regardless of their sequence numbers, as long as they fall between two sub-rules of a TCP protocol rule. Note that you may still need to process these packets in other ways for other reasons depending on your implementation strategy; similarly, you should not in general ignore sequence numbers, as they are a crucial part of TCP.

For this project it is acceptable to wait until a TCP session is finished (or there are no more packets to process) before checking for matches to TCP stream and protocol rules. Extra credit will be given to projects that give warnings as soon as they are applicable, as long as the rest of the project is fully implemented. You may use tools like JLex and CUP, if you are familiar with them, to construct your rule file parser, but the grammar has been designed to be easily parsed without such tools.

Your program must work on the eniac.seas.upenn.edu machine pool.

Groups

This project is to be done in groups of two or three, with no exceptions. One member of each group should e-mail the names and e-mail addresses of everyone in their group to vaughan2 @ seas.upenn.edu by Thursday, February 22.

When submitting this project, please only do so from one group member's account. If you submit from multiple accounts you must e-mail the course staff before the submission deadline telling us which one to grade.

Deliverables

Your final submission should consist of a well commented program, a Makefile (see the discussion of makefiles below), and a README giving a brief description of each file in your directory. You should also submit a file documenting the design of your intrusion detection system, and how you evaluated its correctness (e.g. extra rulefiles). Please submit your documentation as text or PDF. Do not submit any object code (i.e. .class files), including either of the library files linked to from this page.

The Java class that contains your main method should be named IDS; you can name other classes however you'd like, but each name should reflect the purpose of its class. Your archive should be named username.tar.gz and expand into a directory that matches your username. For example, if your username is adent and you've been placing your files in ~/cis551/project2, you could run the following commands:

adent@minus:~/cis551/project2$ make clean
adent@minus:~/cis551/project2$ cd ..
adent@minus:~/cis551$ mv project2 adent
adent@minus:~/cis551$ tar czf adent.tar.gz adent

You would then submit the file adent.tar.gz.

Regular Expressions

Regular expressions are a standard tool used for pattern matching; in this project, they will be used to specify patterns either in individual packets or in reconstructed TCP streams which should trigger an alert. Java provides several classes for dealing with regular expressions; an introduction can be found in the Java API entry for the Pattern class. You will need to import the package java.util.regex in order to use the Pattern and Matcher classes. Note that, unlike the examples given in the API entry for Pattern, you may find the method Matcher.find to be more useful than Matcher.matches.
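
The difference between the two methods is illustrated below. This is a small sketch, and the class name RegexDemo, its method names, and the sample payload are made up for illustration: find reports a match anywhere in the input, while matches succeeds only if the whole input matches the pattern.

```java
import java.util.regex.Pattern;

class RegexDemo {
    // True if the pattern occurs anywhere in the data.
    static boolean occursIn(String regex, String data) {
        return Pattern.compile(regex).matcher(data).find();
    }

    // True only if the pattern matches the data from beginning to end.
    static boolean matchesWhole(String regex, String data) {
        return Pattern.compile(regex).matcher(data).matches();
    }

    public static void main(String[] args) {
        String payload = "GET /cgi-bin/phf?Qalias=x HTTP/1.0\r\n"; // invented sample data
        System.out.println(occursIn("/cgi-bin/phf", payload));     // prints "true"
        System.out.println(matchesWhole("/cgi-bin/phf", payload)); // prints "false"
    }
}
```

Keep in mind that, by default, "." in a Java pattern does not match line terminators; compiling with the Pattern.DOTALL flag changes this, which may matter when a pattern has to span several lines of a stream.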

If you are having difficulty understanding regular expressions, you might try reading some of the many tutorials available on the web. Although Java handles regular expressions through objects and methods, the actual regular expression syntax is nearly identical to that used in scripting languages like Perl and Python.

Rule Syntax

A rule file consists of exactly one host entry (which must be first), and arbitrarily many rule entries. The grammar for the configuration file is:

           <file> ::= <host><rule>*

           <host> ::= host=<ip>\n\n
           <rule> ::= name=<string>\n
                      (<tcp_stream_rule>|<protocol_rule>)\n
<tcp_stream_rule> ::= type=tcp_stream\n
                      src_port=(any|<port>)\n
                      dst_port=(any|<port>)\n
                      ip=(any|<ip>)\n
                      (send|recv)=<regexp>\n
  <protocol_rule> ::= type=protocol\n
                      proto=(tcp|udp)\n
                      src_port=(any|<port>)\n
                      dst_port=(any|<port>)\n
                      ip=(any|<ip>)\n
                      <sub_rule>
                      <sub_rule>*
       <sub_rule> ::= (send|recv)=<regexp> (with flags=<flags>)?\n

         <string> ::= alpha-numeric string
             <ip> ::= string of form [0-255].[0-255].[0-255].[0-255]
           <port> ::= string of form [0-65535]
         <regexp> ::= Perl Regular Expression
          <flags> ::= <flag>*
           <flag> ::= S|A|F|R|P|U

Each rule begins with a name. The name does not affect which packets are matched, but it should appear in the alert printed every time that rule is matched by a connection or packet. "ip", "dst_port", and "recv" all refer to the remote side of a connection, so these values cannot be naively matched against similarly named packet fields without first considering the direction in which the packet is traveling. (This means that for protocol rules, just as you have to switch the IP addresses when matching the "recv" subrules, you also have to switch port numbers.) The flags are SYN, ACK, FIN, RST, PSH, and URG (written S, A, F, R, P, and U in the grammar), and they are present only in TCP packets. A "with flags=<flags>" clause in a UDP rule is erroneous, while in a TCP rule it indicates that the matched packet must have exactly those flags, and no others.
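
One way to read a single sub-rule line and apply the "exactly those flags" requirement is sketched below. The class SubRule and its members are hypothetical, and the sketch makes two assumptions that the grammar does not spell out: an optional " with flags=" clause is recognized only at the very end of the line, and the flags of a packet are supplied as a string of the same letters (for example "SA" for SYN+ACK).

```java
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical holder for one parsed sub-rule line (without the trailing newline).
class SubRule {
    // Group 1: direction; group 2: the rule's regular expression;
    // group 3: the flag letters, or null when there is no flags clause.
    private static final Pattern LINE =
        Pattern.compile("^(send|recv)=(.*?)(?: with flags=([SAFRPU]*))?$");

    final boolean isSend;
    final Pattern body;
    final Set<Character> flags; // null when the sub-rule does not constrain flags

    SubRule(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            throw new IllegalArgumentException("Malformed sub-rule: " + line);
        }
        isSend = m.group(1).equals("send");
        body = Pattern.compile(m.group(2));
        flags = (m.group(3) == null) ? null : toSet(m.group(3));
    }

    // Turn a string of flag letters into a set, so that order and
    // repetition of the letters do not matter when comparing.
    static Set<Character> toSet(String letters) {
        Set<Character> set = new TreeSet<Character>();
        for (int i = 0; i < letters.length(); i++) {
            set.add(letters.charAt(i));
        }
        return set;
    }

    // True if the sub-rule has no flags clause, or if the packet carries
    // exactly the listed flags and no others.
    boolean flagsMatch(String packetFlags) {
        return flags == null || flags.equals(toSet(packetFlags));
    }
}
```

Comparing sets rather than strings means a rule written as "with flags=AP" accepts a packet whose flags are reported as "PA", but rejects "A" and "SAP".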

Examples

Pcap files

Samples

All pcap files provided have been sanitized. The host to protect has been given the IP address 192.168.0.1, and other IPs have been changed if needed. To see the contents of these files, you can use tcpdump -r file.pcap

Generating your own

If you are interested in generating your own trace files, we recommend you look into Ethereal (renamed Wireshark in 2006), a tool for examining network traffic under both Windows and most Unix-based systems, including Linux and Mac OS X. Actually capturing network traffic will require root access, so this is best done on your personal machine.

You might also be interested in reading about port scanning techniques (for example at insecure.org) to see how some of the above examples were constructed.

Java 5.0

Java 5.0 (sometimes called Java 1.5) updates the Java collection library with generics, which, from a user's point of view, are somewhat similar to C++ templates (although their technical details are quite a bit different). Generic collections can save you time and help you write better code; for example, if you want an ArrayList (a collection which combines the functionality of arrays and lists in an efficient manner) where every element is a 32-bit integer, you can declare a variable of type ArrayList<Integer>. Similarly, the type Hashtable<String, Boolean> denotes a hash table that always maps strings to booleans. This eliminates the instanceof checks and subsequent typecasts required when using these classes in previous versions of Java.

To use Java 5.0 on eniac and other SEAS machines, you must first add it to your path. Under bash, the default shell, this can be done with the command export PATH=/usr/java/jdk1.5.0/bin:$PATH; under tcsh, use setenv instead of export and replace the '=' with a space. We will test submissions for this project using the Java 5.0 compiler, although you are not required to make use of any new features of the language. We will revert to the previous Java compiler in the unlikely event of a backwards compatibility error.

Java 5.0 has several other enhancements that you might find useful, including automatic conversion between primitive types (int) and object types (Integer), as well as an enhanced "foreach"-style for loop for dealing with collections. You can read a summary of these features here. The latest version of the Java API gives full documentation on the collection library.
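
The short sketch below shows generic collections, automatic boxing and unboxing, and the enhanced for loop together. The class name Java5Demo, the rule name, and the values are invented for illustration only.

```java
import java.util.ArrayList;
import java.util.Hashtable;
import java.util.List;

class Java5Demo {
    // The enhanced for loop visits every element; each Integer is
    // unboxed to an int automatically, so no Iterator or cast is needed.
    static int sum(List<Integer> values) {
        int total = 0;
        for (int v : values) {
            total += v;
        }
        return total;
    }

    public static void main(String[] args) {
        ArrayList<Integer> ports = new ArrayList<Integer>();
        ports.add(21); // autoboxing: int -> Integer
        ports.add(80);

        Hashtable<String, Boolean> alerted = new Hashtable<String, Boolean>();
        alerted.put("exampleRule", true);           // boolean -> Boolean
        boolean fired = alerted.get("exampleRule"); // no cast, unboxed automatically

        System.out.println(sum(ports) + " " + fired); // prints "101 true"
    }
}
```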

Using jpcap

In order to use the jpcap library, you will need to add the file jpcap.jar to your CLASSPATH and the directory containing the file libjpcap.so to your LD_LIBRARY_PATH. If you are using bash, you can accomplish this with the export command, as in:

adent@minus:~/cis551/project2$ export CLASSPATH=~/cis551/lib/jpcap.jar:$CLASSPATH
adent@minus:~/cis551/project2$ export LD_LIBRARY_PATH=~/cis551/lib:$LD_LIBRARY_PATH

You may choose to put these files in the same directory as your project, but do not include them in the archive you submit. We will already have both files in our test environment, along with JLex and CUP.

Documentation for jpcap is available. In order to process a packet capture file, you will want to create a PacketCapture object and initialize it with the openOffline method. You can then call the addPacketListener method to enable a listener you've created - which will perform the work of your intrusion detection system - and finally call capture with a negative argument. Your program will then process packets until a CapturePacketException is thrown; you should catch this exception and terminate gracefully, as it most likely indicates the end of your capture file.

The addPacketListener method will accept any object that implements the PacketListener interface. Your listener class - which may be declared as, for example, class IDSListener implements PacketListener - must provide a method packetArrived with return type void and argument type Packet. This method will be called on each packet in the capture file (in the order in which they appear); from there you can deal with the packet in whatever way you choose.

You may also use the setFilter method to place a global filter on the packets you examine; any packets that do not match this filter will be ignored, and your packetArrived method will never be called on them. You can find the syntax of filter expressions by looking at the tcpdump man page under "expressions". Note that you do not need to use this method, as there is nothing wrong with having your PacketListener examine every packet, so if you're having difficulty constructing a filter that will match all the packets you're interested in, feel free to ignore this feature of jpcap.

Traditionally, network programmers have had to look carefully at flags and match against special constants to separate out different varieties of packets. Luckily for you, the jpcap library puts this distinction into Java's type system. The type hierarchy under Packet reflects the various different protocols used on the Internet; this allows you to use Java's instanceof operator to determine whether an IP packet (of type IPPacket) is in fact a TCP packet (of type TCPPacket). Once you have made this distinction, we highly recommend that you cast the packet to its more specific type and pass it to a method designed for that packet type. You can safely ignore any non-Ethernet packets you encounter; in fact, the current implementation of jpcap has no way of producing any such packets.
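
The dispatch pattern recommended above might look like the sketch below. To keep the example compilable without jpcap on the classpath, it declares empty stand-in classes that only mirror the subclass relationships described in this section; they are not the library's classes, and the Dispatcher class and its handler methods are hypothetical. In your project you would import the real packet types from jpcap instead of declaring these.

```java
// Stand-in types (assumption: for illustration only). The real classes
// come from the jpcap library and provide many more methods.
class Packet {}
class IPPacket extends Packet {}
class TCPPacket extends IPPacket {}
class UDPPacket extends IPPacket {}

class Dispatcher {
    // Test the most specific types first: a TCPPacket is also an IPPacket,
    // so checking for IPPacket first would hide the distinction.
    static String dispatch(Packet p) {
        if (p instanceof TCPPacket) {
            return handleTcp((TCPPacket) p); // cast once, then use typed code
        } else if (p instanceof UDPPacket) {
            return handleUdp((UDPPacket) p);
        }
        return "ignored"; // anything the rules cannot apply to
    }

    // Hypothetical handlers; a real IDS would do its rule matching here.
    static String handleTcp(TCPPacket p) { return "tcp"; }
    static String handleUdp(UDPPacket p) { return "udp"; }
}
```

Keeping one handler per packet type confines each cast to a single place and lets the compiler check that TCP-only operations are applied only to TCP packets.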

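The capture classes mentioned above (PacketCapture, PacketListener, and CapturePacketException) live in the package net.sourceforge.jpcap.capture, which you will probably want to import (import net.sourceforge.jpcap.capture.*;) at the top of any file that refers to them. The packet classes (Packet, IPPacket, TCPPacket, and so on) live in the package net.sourceforge.jpcap.net. Other classes in jpcap may live in other packages; details are given in the documentation.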

General advice

This project will require you to write a good deal more code than was necessary for the first project. Make sure you have a plan before you begin, especially for the reconstruction of TCP streams. Feel free to create as many auxiliary classes as you think are helpful, and divide your functionality into multiple methods along logical lines.

Each packet type defines new methods for giving you information on that variety of packet. Make use of these whenever possible; trying to interpret raw header data directly is much more likely to lead to subtle bugs. All of the packet types also include toString, toColoredString, and (sometimes) toColoredVerboseString methods; these should be very useful in debugging.

In order to match packet data against a regular expression, you will need to convert the packet data (a byte array) to a string. The class String has a constructor that takes a byte array, which should do what you want. If it does not seem to be behaving properly - which may be the case if you are using non-standard locale settings - there is also a constructor that takes, as a second argument, a character set name; try "ISO-8859-1" if you want to prevent any unintended processing of special characters. You may also use the constant 0 as the second argument in order to force correct behavior; this constructor is deprecated, but it is acceptable for use in this project.
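
The conversion with an explicit character set can be sketched as follows; the class name PayloadText and the method toText are hypothetical. ISO-8859-1 maps each byte value from 0 to 255 to the character with the same code, so no byte is lost or altered in the conversion.

```java
import java.io.UnsupportedEncodingException;

class PayloadText {
    // Convert raw packet bytes to a String with exactly one char per byte.
    static String toText(byte[] data) {
        try {
            return new String(data, "ISO-8859-1");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support ISO-8859-1,
            // so reaching this point indicates a broken installation.
            throw new IllegalStateException("ISO-8859-1 is not available", e);
        }
    }
}
```

Because the mapping is one-to-one, a pattern can refer to arbitrary byte values with hexadecimal escapes; for example, the regular expression \x90{4} (written "\\x90{4}" in Java source) finds four consecutive 0x90 bytes in the converted string.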

When constructing TCP streams, you may, especially if you are going for the extra credit, feel constrained by Java's immutable String class. If this is the case, you might want to look at StringBuffer, which provides insertion and deletion methods not present in the immutable String. (Note, however, that it is certainly possible to take a different approach to this part of the project and have no need of this class.) You may also use StringBuilder instead of StringBuffer as long as you do not use StringBuilder objects in a multithreaded way.

There are many corner cases to consider for this project, and the details of exactly what functionality you should provide are up to some interpretation. When you encounter such a situation, make sure you explain the choice your group made in your documentation. Similarly, if you can think of some cases that your IDS does not handle - due to time constraints, for example - document them as well.

Finally, make sure you handle exceptions in a useful way. Liberal use of catch (Exception e) { ... } blocks is not a good use of exceptions, especially if the block serves only to return a trivial value or print an unhelpful error message. It's best to catch only the specific exception types that you're looking for, to throw them when they are better dealt with by the calling function, and to tell the user exactly what sort of error occurred.
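
As an illustration of this advice, the sketch below parses a port field from a rule file. The class PortParser, the method parsePort, and the convention of returning -1 for "any" are assumptions made for the example. Only the one exception that parsing can actually raise is caught, and it is rethrown with a message that tells the user what was wrong, leaving the decision about how to report it to the caller.

```java
class PortParser {
    // Parses "any" or a decimal port number in the range 0-65535.
    // Returns -1 for "any" (a convention chosen for this sketch).
    static int parsePort(String text) {
        if (text.equals("any")) {
            return -1;
        }
        int port;
        try {
            port = Integer.parseInt(text);
        } catch (NumberFormatException e) {
            // Catch only the specific failure and say exactly what went wrong.
            throw new IllegalArgumentException("Port is not a number: " + text, e);
        }
        if (port < 0 || port > 65535) {
            throw new IllegalArgumentException("Port out of range (0-65535): " + text);
        }
        return port;
    }
}
```

The code that reads the rule file can then catch IllegalArgumentException in one place, print the message together with the offending line, and exit cleanly.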

Makefiles

You should also include a Makefile (called Makefile) which can be used to automatically build your project. A Makefile should also be helpful while you're working; if properly set up, you will be able to rebuild any .class file by typing make filename, and make will be certain to rebuild any other .class files that are relevant and that may have changed. This lets you avoid recompiling your entire project while eliminating strange bugs caused by out of date files.

More information on Makefiles can be found in The GNU Make Manual; the following simple example should help you get started. It assumes that we have three source files, Foo.java, Bar.java, and Baz.java. The class Foo makes reference to both Bar and Baz, but neither of those classes references either the other or Foo.

# A good Makefile begins with variable declarations that capture important
# parts of the project.  In this case we have a list of class files that
# comprise the program, as well as the name and command line arguments for
# the Java compiler.

# If more classes are added to the program, we can list them here to ensure
# that they will be compiled.
CLASSES = Foo.class Bar.class Baz.class

# Putting the compiler and compiler options in variables lets us change them
# later and have these changes apply to every file.
JAVAC = javac
JAVAOPTS =

# The remainder of a makefile consists of rules of the form
#
#  target: prerequisites
# 	command
#
# If you run the command "make target", make will first check that all
# the prerequisites are present and up to date, then run the command.
# If prerequisites are missing or out of date, make will look for rules
# with those prerequisites as their targets.
#
# You MUST put a tab before the command; a sequence of spaces will not
# work.

# Some targets do not correspond to actual files; you can declare that
# they are "phony" like this.
.PHONY: all clean

# The first target in the file is selected automatically if you call
# make with no argument.  In this case it will compile all the classes
# that comprise our program.
all: $(CLASSES)

# A "clean" target allows us to quickly delete all automatically generated
# files.  This should be run right before you submit your project, and may
# be helpful any time you would like to quickly get rid of old files.  It
# uses a $(RM) variable that we did not define; this is fine, however, as
# make includes many predefined variables.
clean:
	$(RM) *.class

# This rule tells us that class Foo depends on classes Bar and Baz, and
# thus Bar and Baz should be built first.  Every class depends on its
# source file.  The special variable $< refers to the first prerequisite,
# which is generally the source file.
Foo.class: Foo.java Bar.class Baz.class
	$(JAVAC) $(JAVAOPTS) $<

# Bar and Baz have no other dependencies, and we imagine that this will be
# the case with most other source files that we add to our program.  The
# following target establishes a pattern, saying that any .class file can
# be built from its .java file.  This pattern will apply whenever we do
# not give a specific rule, so we can save those for when we have
# dependencies.
%.class: %.java
	$(JAVAC) $(JAVAOPTS) $<

Updates

Last Revised: 14 Feb. 2007