CIT 591 Assignment 9: More Text Translator Notes
Fall 2008, David Matuszek

Text files

There is a difference between plain text files, which contain only ASCII (or occasionally Unicode) characters, and formatted text files, which contain various codes to indicate boldface, italic, fonts, etc. Your text translator is expected to work only for plain ASCII text files.

Plain text files often have the extension .txt. Java files, ending in .java, are also plain text. The extensions .doc and .rtf indicate two different kinds of formatted files.

You cannot change the content of files merely by changing their extension.
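The extension says nothing reliable about what a file contains, so one quick sanity check is to look at the bytes themselves. The sketch below is a hypothetical helper, not part of the assignment (the class and method names are my own). It reports whether every byte is in the 7-bit ASCII range 0 to 127. It will reject a .doc file, which is binary, but it will not flag an .rtf file: RTF markup is itself usually written in plain 7-bit ASCII.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

public class AsciiCheck {
    /** True if every byte is in the 7-bit ASCII range 0 to 127. */
    static boolean isPlainAscii(byte[] bytes) {
        for (byte b : bytes) {
            // Java bytes are signed, so values 128 to 255 show up as negative numbers.
            if (b < 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        // Usage (hypothetical): java AsciiCheck file1 file2 ...
        for (String name : args) {
            byte[] contents = Files.readAllBytes(Paths.get(name));
            System.out.println(name + ": "
                + (isPlainAscii(contents) ? "plain ASCII" : "contains non-ASCII bytes"));
        }
    }
}
```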

By default, Windows hides file extensions to spare naive users from having to know about them. As a programmer, you will find it helpful to see them. In Windows XP, open a Windows Explorer window, go to Tools -> Folder Options -> View, and uncheck "Hide extensions for known file types". The setting is probably in a similar place in Vista.


JUnit testing

You only need to test the four translate methods (the ones in the four classes that implement TranslatorInterface).

The getName and getDescription methods are almost too simple to bother with, and the methods in the Translator class are all GUI handling, which is very difficult to JUnit test.


Gremlin characters

For testing the Zap Gremlins translator, you can easily create a String containing "gremlin" characters.

Octal: In a quoted string you can write \d, \dd, or \ddd, where each d is an octal digit (0 to 7). For example, "abc\07xyz" contains the character with code 7. Octal values from 0 to 377 (decimal 255) are valid.

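Hexadecimal: In a quoted string you can write \uDDDD, where the Ds are exactly four hexadecimal digits. For example, "abc\u007Fxyz". One caution: Java translates \u escapes before it compiles anything else, so \u000A or \u000D inside a string literal ends the line and causes a compile error. Use \n and \r (or octal) for those two characters.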
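To check that a Zap Gremlins translator really removes such characters, strings like the two above can be fed to a method like the one sketched here. Which characters count as gremlins is my assumption (ASCII control characters other than tab, newline, and carriage return, plus DEL, code 127); use whatever definition the assignment actually gives. The names GremlinDemo, isGremlin, and zapGremlins are hypothetical.

```java
public class GremlinDemo {
    // ASSUMPTION: a "gremlin" is an ASCII control character other than tab,
    // newline, and carriage return, or DEL (code 127). Use the assignment's rule.
    static boolean isGremlin(char c) {
        if (c == '\t' || c == '\n' || c == '\r') {
            return false;
        }
        return c < 32 || c == 127;
    }

    /** Hypothetical Zap Gremlins core: returns text with every gremlin removed. */
    static String zapGremlins(String text) {
        StringBuilder result = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (!isGremlin(c)) {
                result.append(c);
            }
        }
        return result.toString();
    }

    public static void main(String[] args) {
        String octal = "abc\07xyz";    // octal escape: character code 7 (BEL)
        String hex = "abc\u007Fxyz";   // Unicode escape: character code 127 (DEL)
        System.out.println(octal.length());            // 7
        System.out.println((int) octal.charAt(3));     // 7
        System.out.println(zapGremlins(octal));        // abcxyz
        System.out.println(zapGremlins(hex));          // abcxyz
    }
}
```

In a JUnit test, the same idea becomes a comparison of the translator's output against "abcxyz".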


Line wrapping

The assignment asks you to break lines at a space. You do not have to worry about breaking lines at a tab character (\t).

Don't forget that you must keep the indentation of the original line: if the line begins with five spaces, each continuation line should also begin with five spaces. Also remember that lines may be indented with tabs as well as with spaces.
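One way to meet both requirements is to copy the line's leading whitespace (spaces and tabs alike) and put it in front of every continuation line. The sketch below assumes a hypothetical maxWidth limit counted in characters (a tab counts as one character here, which is a simplification), breaks at the last space that fits, and leaves a word longer than the limit whole rather than splitting it. The names are my own.

```java
import java.util.ArrayList;
import java.util.List;

public class WrapDemo {
    /**
     * Splits one line at spaces so that pieces fit in maxWidth characters where
     * possible. Every continuation line starts with the original line's leading
     * whitespace. A single word longer than the limit is left whole.
     */
    static List<String> wrapLine(String line, int maxWidth) {
        // Drop trailing spaces so a break never leaves an empty continuation line.
        int end = line.length();
        while (end > 0 && line.charAt(end - 1) == ' ') {
            end--;
        }
        line = line.substring(0, end);

        // Copy the leading indentation, which may mix spaces and tabs.
        int start = 0;
        while (start < line.length()
                && (line.charAt(start) == ' ' || line.charAt(start) == '\t')) {
            start++;
        }
        String indent = line.substring(0, start);

        List<String> pieces = new ArrayList<>();
        String rest = line;
        while (rest.length() > maxWidth) {
            // Prefer the last space that keeps this piece within maxWidth.
            int breakAt = rest.lastIndexOf(' ', maxWidth);
            if (breakAt <= indent.length()) {
                // No usable space in range: break at the first space after the indentation.
                breakAt = rest.indexOf(' ', indent.length());
                if (breakAt < 0) {
                    break; // one long word; nothing to break at
                }
            }
            pieces.add(rest.substring(0, breakAt));
            // Skip the space(s) at the break and re-apply the indentation.
            int next = breakAt;
            while (rest.charAt(next) == ' ') {
                next++;
            }
            rest = indent + rest.substring(next);
        }
        pieces.add(rest);
        return pieces;
    }

    public static void main(String[] args) {
        for (String piece : wrapLine("    one two three four", 14)) {
            System.out.println("[" + piece + "]");
        }
        // [    one two]
        // [    three four]
    }
}
```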


Identity

There are three ways of indicating a "new line". Macintosh OS 9 and earlier used the carriage return character, \015; Mac OS X and other Unix systems use the linefeed character, \012; and Windows systems use the two-character sequence carriage return/linefeed, \015\012. You don't have to worry about these differences if you are using a decent text editor to view your files.

A program that reads and writes lines may, unless you are very careful, change the type of newline used. Microsoft's Notepad recognizes only the Windows carriage return/linefeed sequence, so it may show the result of your "Identity" transformation as a single line of output. Technically, this is a fault in your Java program, but I don't think it's worth the bother of fixing it (I seldom use Notepad).
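If you do want to fix it, one approach (a sketch, not a required part of the assignment) is to notice which convention the input uses and write that same sequence back out. BufferedReader.readLine() accepts all three conventions and strips the terminator, which is exactly why a naive line-by-line copy loses it. The names detectNewline and identity are hypothetical; the sketch assumes the whole input is available as a String, keeps only the first convention it finds (mixed files get normalized), and always ends the output with a newline, even if the input did not.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;

public class NewlineDemo {
    /** Returns CR LF, CR, or LF for the first line break in text; LF if there is none. */
    static String detectNewline(String text) {
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '\r') { // carriage return, octal 015
                boolean crlf = i + 1 < text.length() && text.charAt(i + 1) == '\n';
                return crlf ? "\r\n" : "\r";
            }
            if (c == '\n') { // linefeed, octal 012
                return "\n";
            }
        }
        return "\n";
    }

    /** Identity translation that writes lines back with the input's own newline convention. */
    static String identity(String text) {
        String newline = detectNewline(text);
        StringBuilder out = new StringBuilder(text.length() + newline.length());
        // readLine() accepts LF, CR, or CR LF, and strips the terminator it found.
        try (BufferedReader in = new BufferedReader(new StringReader(text))) {
            String line;
            while ((line = in.readLine()) != null) {
                out.append(line).append(newline);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected when reading from a String
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String windows = "first\r\nsecond\r\n";
        System.out.println(identity(windows).equals(windows)); // true
    }
}
```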

If you are still using Notepad, consider getting a "real" text editor. Good ones are easy to use and have a variety of useful features. I recommend one of these free programs: Notepad++ for Windows, TextWrangler or Smultron for Macintosh, or jEdit for any system.


Fixing indentation

I did not give a very clear idea of what this is supposed to do. The intent is to repair indentation the way the Source -> Correct Indentation (Ctrl-I) command works in Eclipse.

Here is an example of correct indentation:

if (x < y) {
    x = 0; { y = 0; }
} else { 
    x = y;
}

When you select lines and correct indentation in Eclipse, you can select any lines; you do not have to make sure you select lines that have a balanced set of opening and closing braces. Your program should not assume this, either.

All indentation should be done with spaces, not with tabs. You can assume the input contains no tabs. Tab stops can be set anywhere, depending on the program, so there is literally no correct way to do indentation with tabs.
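A much-simplified version of this can be sketched by tracking brace depth: strip each line's existing indentation, indent it four spaces per unclosed {, and place a line that begins with } one level out. To cope with a selection whose braces don't balance, the depth is never allowed to go below zero. The four-space unit and all names are assumptions, and unlike Eclipse this sketch does not skip braces inside string literals, character literals, or comments.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class IndentDemo {
    // ASSUMPTION: one indentation level is four spaces.
    static final String UNIT = "    ";

    static String indentation(int level) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < level; i++) {
            sb.append(UNIT);
        }
        return sb.toString();
    }

    /**
     * Re-indents lines by brace depth. A line starting with '}' goes one level out.
     * Depth never drops below zero, so an unbalanced selection is still handled.
     * Simplification: braces in strings, char literals, and comments are counted too.
     */
    static List<String> fixIndentation(List<String> lines) {
        List<String> result = new ArrayList<>();
        int depth = 0;
        for (String line : lines) {
            String text = line.trim(); // discard the old indentation (and trailing blanks)
            int level = text.startsWith("}") ? Math.max(0, depth - 1) : depth;
            result.add(text.isEmpty() ? "" : indentation(level) + text);
            for (char c : text.toCharArray()) {
                if (c == '{') {
                    depth++;
                } else if (c == '}') {
                    depth = Math.max(0, depth - 1);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> messy = Arrays.asList(
            "      if (x < y) {", "x = 0; { y = 0; }", "  } else {", "x = y;", "   }");
        for (String line : fixIndentation(messy)) {
            System.out.println(line);
        }
    }
}
```

Run on the scrambled lines in main, it reproduces the correctly indented example shown earlier in this section.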