Tenth Java Assignment: Syntax Coloring
CIT 591, David Matuszek, Fall 2001
Purposes of this assignment:
Idea of the assignment:
I post a lot of Java programs on the web. However, if I want those programs to have any syntax coloring, I have to do it all myself. This is a lot of work, and sometimes I make mistakes. It would be nice to have a program that does it for me.
Your job is to write a Java application that reads in a .java file
and does some minimal syntax coloring on it. Since doing a thorough job could
turn into a very large project, your program should do the most basic things
and, more importantly, you should think about how you can design the program
so that it can be enhanced later.
HTML
Documents for the Web are written in HTML (HyperText Markup Language). HTML
uses tags, enclosed in angle brackets, to tell your browser how to
format things. For example, <B>enclosed text</B> causes
the enclosed text to be made bold. The entire document should be enclosed in
assorted other tags, as follows:
<HTML>
<HEAD>
<TITLE>Syntax coloring assignment by (your name goes here)</TITLE>
</HEAD>
<BODY>
<PRE>
Your document (modified Java program) goes here.
</PRE>
</BODY>
</HTML>
Many modern program editors use syntax coloring: keywords, strings, and comments are set off by putting them in different colors.
Here's what your program should do:
.java file.
.java in the file name with .html;
e.g. if your input file is Sample.java, your output file
should be Sample.html; or
Entities
Since HTML uses the characters < (less than), >
(greater than), and & (ampersand) for its own purposes, you
must replace these in the Java program you are processing with the entities
&lt;, &gt;, and &amp;,
respectively. Notice that the semicolon (;) is part of the entity.
The <, >, and & characters
must always be replaced by their corresponding entities, but these are
not the only characters that might be replaced by entities. Instead of writing
special code for each of these three cases, consider writing more general code
so that it is trivial to add other entities.
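One way to keep the entity handling general is to store the replacements in a table and look each character up in it. The following is only a sketch under that assumption; the class name EntityEscaper and the method name escape are made up for illustration, and it uses collection classes from current versions of Java.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class EntityEscaper {
    // Table of characters and the entities that replace them.
    // Supporting another entity is a one-line addition here.
    private static final Map<Character, String> ENTITIES = new LinkedHashMap<>();
    static {
        ENTITIES.put('<', "&lt;");
        ENTITIES.put('>', "&gt;");
        ENTITIES.put('&', "&amp;");
    }

    // Returns the entity for ch, or ch itself if the table has no entry for it.
    static String escape(char ch) {
        String entity = ENTITIES.get(ch);
        return entity != null ? entity : String.valueOf(ch);
    }

    // Escapes a whole string, one character at a time.
    static String escape(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            out.append(escape(text.charAt(i)));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // prints: if (a &lt; b &amp;&amp; c &gt; d)
        System.out.println(escape("if (a < b && c > d)"));
    }
}
```

Because the lookup works on one character at a time, it fits naturally into a program that processes its input character by character.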
Syntax coloring
Various editors perform various kinds of syntax coloring. There are two sets
of options to keep in mind: (1) the kind of things that are recognized, and
(2) how they are colored.
Some things the program might recognize:
Some ways it might "color" them:
Just about any way you might "color" a part of the program, you can
do by wrapping HTML tags around the text. You should put these tags in your
program as named String constants (for example, START_KEYWORD and
END_KEYWORD) instead of putting them directly in the code. That
is,
    String START_KEYWORD = "<B>";
    String END_KEYWORD = "</B>";
    ...
    System.out.print(START_KEYWORD + keyword + END_KEYWORD);
is much more flexible (and therefore better style) than
    System.out.print("<B>" + keyword + "</B>");
Your syntax coloring assignment
Do syntax coloring for the following:
/* ... */ comments
// comments
Use a different color or style for each of these.
This program is neither long nor complicated, but it is very different from most of the programs you have written. You do not need to create a lot of new classes for this assignment--probably just one class and a couple of methods is enough.
Programming notes
A comment is not a comment if it is embedded in a quoted string. Similarly,
there are no strings inside comments. Keywords, such as for, while,
and if are not keywords if they occur in a comment or quoted string,
or as part of an identifier (such as form or knife). Singly-quoted
characters, also, must be treated specially.
These complications can best be handled by a state machine. The main states that you need to implement are:
/* ... */ comment: After /*, no
other symbols are treated as special except for */, which causes
a transfer to the "normal" state. Note that javadoc
comments, /** ... */, do not need to be treated differently,
although you could do so if you wanted to. Note also that /*/ is
not a complete comment (the second / does not end the comment),
but that /**/ is complete. In addition, **/ also
ends a comment.
// comment: After //, no
other symbols are treated as special except for \n, the end-of-line
character, which causes a transfer to the "normal" state.
Quoted string: Entered by a " (double quote)
symbol, and exited (to the normal state) by another ". Inside
this state, the one character immediately following a \ (backslash)
must be ignored. This requires a bit of care: \" by itself
does not end the state, but \\" does (it's the backslash
that's quoted, not the double-quote). Similarly, \\\" does
not end the state, but \\\\" does. Finally, strings cannot
extend over more than one line, so an end-of-line character ends the string
(and is an error).
Quoted character: Entered by a ' (single quote) symbol,
and exited (to the normal state) by another '. Inside this state,
the one character immediately following a \ (backslash) must
not be treated as special.
However, these are just the "main" states that your state machine
might have. For example, if you encounter a / in the "normal"
state, you might go into a "just saw a slash" state. From this state,
a second / could put you in the "// comment" state, while
a star could put you into the "/* comment" state; any other character
would return you to the "normal" state. The same sort of trick can
be used elsewhere, so that you only need to process a single character at a
time.
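To make the "just saw a slash" idea concrete, here is a rough sketch that marks only comments. It is one possible arrangement, not the required design: the class name CommentMarker, the STAR state (for "just saw a star inside a /* comment"), and the choice of italics are all assumptions for illustration, and the sketch deliberately ignores quoted strings, quoted characters, and entities.

```java
class CommentMarker {
    static final String START_COMMENT = "<I>";
    static final String END_COMMENT = "</I>";

    // SLASH = just saw a slash in normal text; STAR = just saw a star in a /* comment
    enum State { NORMAL, SLASH, LINE_COMMENT, BLOCK_COMMENT, STAR }

    static String mark(String source) {
        StringBuilder out = new StringBuilder();
        State state = State.NORMAL;
        for (int i = 0; i < source.length(); i++) {
            char ch = source.charAt(i);
            switch (state) {
                case NORMAL:
                    if (ch == '/') state = State.SLASH;   // hold the slash back for now
                    else out.append(ch);
                    break;
                case SLASH:
                    if (ch == '/') {
                        out.append(START_COMMENT).append("//");
                        state = State.LINE_COMMENT;
                    } else if (ch == '*') {
                        out.append(START_COMMENT).append("/*");
                        state = State.BLOCK_COMMENT;
                    } else {
                        out.append('/').append(ch);       // the slash was ordinary text
                        state = State.NORMAL;
                    }
                    break;
                case LINE_COMMENT:
                    if (ch == '\n') {
                        out.append(END_COMMENT).append(ch);
                        state = State.NORMAL;
                    } else {
                        out.append(ch);
                    }
                    break;
                case BLOCK_COMMENT:
                    out.append(ch);
                    if (ch == '*') state = State.STAR;
                    break;
                case STAR:
                    out.append(ch);
                    if (ch == '/') {
                        out.append(END_COMMENT);
                        state = State.NORMAL;
                    } else if (ch != '*') {
                        state = State.BLOCK_COMMENT;      // a further star keeps us in STAR
                    }
                    break;
            }
        }
        // Finish whatever was pending when the input ran out.
        if (state == State.SLASH) out.append('/');
        else if (state != State.NORMAL) out.append(END_COMMENT);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(mark("a = b / c; // half\nx = /**/ 1;"));
    }
}
```

Each case looks at exactly one character, so /*/ is left open while /**/ and **/ both close the comment.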
If your state machine does not end in the "normal" state, you should close whatever HTML tag is currently open.
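As an illustration of closing an open tag at the end of the input, together with the backslash rule for quoted strings described above, here is a small sketch. The class name StringMarker, the ESCAPE state ("just saw a backslash inside a string"), and the use of underlining are assumptions made for the example; comments and quoted characters are left out to keep it short.

```java
class StringMarker {
    static final String START_STRING = "<U>";
    static final String END_STRING = "</U>";

    static final int NORMAL = 1;
    static final int STRING = 2;
    static final int ESCAPE = 3;   // just saw a backslash inside a string

    static String mark(String source) {
        StringBuilder out = new StringBuilder();
        int state = NORMAL;
        for (int i = 0; i < source.length(); i++) {
            char ch = source.charAt(i);
            switch (state) {
                case NORMAL:
                    if (ch == '"') {
                        out.append(START_STRING).append(ch);
                        state = STRING;
                    } else {
                        out.append(ch);
                    }
                    break;
                case STRING:
                    if (ch == '\\') {
                        out.append(ch);
                        state = ESCAPE;
                    } else if (ch == '"') {
                        out.append(ch).append(END_STRING);
                        state = NORMAL;
                    } else if (ch == '\n') {
                        // Unterminated string: close the tag before the line ends.
                        out.append(END_STRING).append(ch);
                        state = NORMAL;
                    } else {
                        out.append(ch);
                    }
                    break;
                case ESCAPE:
                    if (ch == '\n') {
                        out.append(END_STRING).append(ch);
                        state = NORMAL;
                    } else {
                        out.append(ch);   // whatever follows the backslash is not special
                        state = STRING;
                    }
                    break;
            }
        }
        // Input ended inside a string: close the tag that is still open.
        if (state != NORMAL) out.append(END_STRING);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(mark("s = \"a\\\"b\" + x"));
    }
}
```

Giving the backslash its own state means no character ever has to be examined twice, and an even or odd number of backslashes before a double quote works out without counting.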
How to implement a state machine
The best way to implement a state machine is with a switch statement
inside a while loop. The switch statement chooses
a block of code based on the current state, and the while loop
exits when the state machine is done.
The following example marks every sequence of digits as bold. Since the length
of the string is known beforehand, a for loop is used instead of
a while loop.
public class StateMachine {
    String testString = "Testing...1...2...3974...end test 1";
    StringBuffer result = new StringBuffer();
    final int NORMAL = 1;   // outside a sequence of digits
    final int NUMBER = 2;   // inside a sequence of digits

    public static void main(String[] args) {
        StateMachine machine = new StateMachine();
        machine.run();
    }

    void run() {
        int state = NORMAL;
        for (int i = 0; i < testString.length(); i++) {
            char ch = testString.charAt(i);
            switch (state) {
                case NORMAL:
                    if (Character.isDigit(ch)) {
                        result.append("<b>" + ch);    // first digit: open the tag
                        state = NUMBER;
                    }
                    else {
                        result.append(ch);
                    }
                    break;
                case NUMBER:
                    if (!Character.isDigit(ch)) {
                        result.append("</b>" + ch);   // first non-digit: close the tag
                        state = NORMAL;
                    }
                    else {
                        result.append(ch);
                    }
                    break;
            }
        }
        // The input may end while still inside a number; close the open tag.
        if (state == NUMBER) result.append("</b>");
        System.out.println(testString);
        System.out.println(result);
    }
}
If you use a test string such as the one in the above program, remember that certain
characters must be backslashed in string literals. For example, the string
backslash is "\" must be written as "backslash is \"\\\"".
Possible extra credit
For 20 points extra credit, make all keywords bold (including the ones we haven't talked about in class).
While it is possible to use a state machine to recognize keywords, doing so will result in a very large number of states. Here is a better solution: when you encounter a letter, go into a state that collects all the letters of the word; then check whether it is a keyword.
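The word-collecting approach might look something like the sketch below. The class name KeywordMarker, the WORD state, and the helper flush are invented for illustration, and the keyword set holds only a small sample rather than the full list. The sketch collects identifier characters (letters, digits, underscores) rather than letters alone, an assumption made so that a name such as if2 is not mistaken for a keyword; it ignores comments and strings.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class KeywordMarker {
    static final String START_KEYWORD = "<B>";
    static final String END_KEYWORD = "</B>";

    // Only a small sample; a complete program would list every Java keyword.
    static final Set<String> KEYWORDS =
        new HashSet<>(Arrays.asList("for", "while", "if", "else", "int", "return"));

    static final int NORMAL = 1;
    static final int WORD = 2;   // collecting the characters of a word

    static String mark(String source) {
        StringBuilder out = new StringBuilder();
        StringBuilder word = new StringBuilder();
        int state = NORMAL;
        for (int i = 0; i < source.length(); i++) {
            char ch = source.charAt(i);
            switch (state) {
                case NORMAL:
                    if (Character.isJavaIdentifierStart(ch)) {
                        word.append(ch);
                        state = WORD;
                    } else {
                        out.append(ch);
                    }
                    break;
                case WORD:
                    if (Character.isJavaIdentifierPart(ch)) {
                        word.append(ch);
                    } else {
                        flush(word, out);   // the word is complete; decide how to emit it
                        out.append(ch);
                        state = NORMAL;
                    }
                    break;
            }
        }
        if (state == WORD) flush(word, out);
        return out.toString();
    }

    // Emits the collected word, wrapped in keyword tags if it is a keyword.
    static void flush(StringBuilder word, StringBuilder out) {
        String text = word.toString();
        if (KEYWORDS.contains(text)) {
            out.append(START_KEYWORD).append(text).append(END_KEYWORD);
        } else {
            out.append(text);
        }
        word.setLength(0);
    }

    public static void main(String[] args) {
        // prints: <B>if</B> (form) knife = 1; <B>for</B>
        System.out.println(mark("if (form) knife = 1; for"));
    }
}
```

Since the whole word is checked at once, form and knife are left alone even though they contain for and if.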
Due date
Please turn in your program via Blackboard by midnight, December 10.