CIT 591 Language Translation Clarifications
David Matuszek, Fall 2002

Files to read:

You should read in two files: one containing the dictionary to use, and a second containing the text passage to translate. You don't write out any files; output is to your GUI.


The GUI:

Your results are displayed in the GUI (along with your name). The only real requirement is to keep it simple enough so that we can figure out how to use your program without too much effort.


Translating multiple files:

After reading and translating a text passage, your program should be able to do it again, without quitting and restarting. This is not stated explicitly in the requirements, but is implied by the sentence that reads:

"Each time you read a new file, you should replace the old translation in the TextArea; don't just keep adding to the end."

Whether you read in a new dictionary (or the same dictionary again), or just keep using the same dictionary throughout the program execution, is up to you. It would be nice to give the user a choice, but I didn't say anything about that, so I don't require it.


Capitalization:

Correct capitalization is somewhat difficult, and not what this assignment is supposed to be about. Hence I am simplifying the problem somewhat, and asking you to follow these three rules:

You are welcome to attempt better capitalization than the above rules require.

Just to be perfectly clear, output in ALL CAPS is not acceptable.

It may help if you use a boolean flag, named something like capitalizeNextWord, to help keep track of when capitalization needs to be done. This is just a suggestion, not a requirement.
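
Here is a minimal sketch of how such a flag might be used. The rule it applies (capitalize the first word of the passage and the first word after a period, exclamation point, or question mark) is only an assumption chosen for illustration; substitute whatever the assignment's rules actually require. The class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CapitalizationSketch {
    // Group 1 matches a run of ASCII letters (a word);
    // the alternative matches any single non-letter character.
    private static final Pattern TOKEN = Pattern.compile("([A-Za-z]+)|[^A-Za-z]");

    // Assumed rule (illustration only): capitalize the first word of the
    // text and the first word following '.', '!' or '?'.
    static String capitalizeSentences(String text) {
        StringBuilder result = new StringBuilder();
        boolean capitalizeNextWord = true;
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            String token = m.group();
            if (m.group(1) != null) {                 // token is a word
                if (capitalizeNextWord) {
                    token = token.substring(0, 1).toUpperCase() + token.substring(1);
                    capitalizeNextWord = false;
                }
            } else if (".!?".indexOf(token.charAt(0)) >= 0) {
                capitalizeNextWord = true;            // a sentence just ended
            }
            result.append(token);
        }
        return result.toString();
    }
}
```

The flag is set when a sentence ends and cleared as soon as the next word has been capitalized, so punctuation and spaces in between need no special handling.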

I suggest (but don't require) that you convert all words to lowercase for storage in the dictionary, and convert words to lowercase in order to look them up in the dictionary. This will avoid problems with not finding a word because it is or isn't capitalized. Conversion to lowercase is easily done with a method in the String class.
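
The lowercase suggestion might look something like the following sketch. The class and method names are hypothetical, and a HashMap is just one possible way to store the dictionary.

```java
import java.util.HashMap;
import java.util.Map;

class LowercaseDictionarySketch {
    private final Map<String, String> entries = new HashMap<>();

    // Store both words in lowercase, following the "all words" suggestion.
    void put(String sourceWord, String translation) {
        entries.put(sourceWord.toLowerCase(), translation.toLowerCase());
    }

    // Lowercase the word before looking it up, so "Haus", "HAUS" and
    // "haus" all find the same entry. Returns null if there is no entry.
    String lookup(String word) {
        return entries.get(word.toLowerCase());
    }
}
```

For example, after put("Haus", "House"), both lookup("HAUS") and lookup("haus") return "house".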

Just as an aside: You may have noticed that nouns and proper names are always capitalized in German, whereas English only capitalizes proper names. This is why I didn't simply say "retain existing capitalization," which would certainly have been simpler.


Non-English characters:

It is more difficult than I realized to write regular expressions that recognize all "word" characters, including non-English ones. Hence, the only additional characters you need to be able to recognize as "word" characters are the umlauted ones that appear in the sample test file.
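
One way to write such a regular expression is sketched below. It assumes the umlauted characters in question are ä, ö, ü and their uppercase forms; check the sample file to see which ones you actually need. The characters are written here as Unicode escapes, which Java accepts in string literals. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UmlautWordSketch {
    // ASCII letters plus ä, ö, ü, Ä, Ö, Ü (assumed set), as Unicode escapes.
    private static final Pattern WORD =
        Pattern.compile("[A-Za-z\u00E4\u00F6\u00FC\u00C4\u00D6\u00DC]+");

    // Returns every maximal run of "word" characters found in the text.
    static List<String> words(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = WORD.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }
}
```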

You can insert these characters into your own Java program most easily by copying them from the dictionary file and pasting them. BlueJ, WordPad, and even Notepad have no trouble with these characters. They may display differently (have different glyphs) under Linux, but I haven't checked, and it doesn't matter.


StringTokenizer:

Some students have discovered the very useful StringTokenizer class. You are welcome to use this class, so long as you also use regular expressions. The following requirement is quoted from the assignment:

"You should use methods both from the String class and from the java.util.regex package. Use them in whatever proportions seem most useful, but use both."

My purpose is to teach you as much Java as possible, so I certainly don't want to discourage you from learning and using anything helpful. The only (apparent) exception to this rule is that I don't want you to use an IDE that creates your GUI for you, such as Forté, JBuilder, or VisualAge--this is because it is important for you to learn how to build your own GUIs, without software assistance.
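
Returning to StringTokenizer: one possible way to combine it with regular expressions is sketched below. StringTokenizer splits a line on whitespace, and a regular expression then separates the letters of each token from any surrounding punctuation. This is only an illustration with hypothetical names, not a required design; it handles ASCII letters only, to keep the pattern short.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TokenizerPlusRegexSketch {
    // Group 1: leading non-letters, group 2: a run of letters,
    // group 3: whatever follows (trailing punctuation and so on).
    private static final Pattern PARTS =
        Pattern.compile("([^A-Za-z]*)([A-Za-z]*)(.*)");

    // Returns the letters-only core of each whitespace-separated token.
    static List<String> coreWords(String line) {
        List<String> words = new ArrayList<>();
        StringTokenizer tokens = new StringTokenizer(line); // splits on whitespace
        while (tokens.hasMoreTokens()) {
            Matcher m = PARTS.matcher(tokens.nextToken());
            if (m.matches() && !m.group(2).isEmpty()) {
                words.add(m.group(2));
            }
        }
        return words;
    }
}
```

In a real translator you would probably keep groups 1 and 3 as well, so that the punctuation can be put back around the translated word.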


Untranslated words:

If a word doesn't get translated, maybe it isn't in the dictionary.

Almost all the words in the first paragraph are in the dictionary. Only a few of the words in the last paragraph are in the dictionary. This is intentional--it's to make it easy to test whether you handle unknown words correctly.

One of the words in the dictionary is misspelled--possibly more than one. This was not intentional, but it is not your problem. If a word isn't in the dictionary, whether deliberately or through an accidental misspelling, use the original untranslated word, with the original capitalization, in your result.
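
The fallback for unknown words can be very short. The sketch below assumes the dictionary is a Map whose keys are stored in lowercase; the class and method names are hypothetical.

```java
import java.util.Map;

class UnknownWordSketch {
    // Looks the word up in lowercase (assumes lowercase keys). If there is
    // no entry, returns the original word unchanged, capitalization included.
    static String translateWord(String word, Map<String, String> dictionary) {
        String translation = dictionary.get(word.toLowerCase());
        return (translation != null) ? translation : word;
    }
}
```

Note that the lowercase copy is used only for the lookup; the original word is kept so that it can be returned as-is.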


Testing:

Please do not submit test files with your program. We will use our own test files for grading. Your program is expected to work with any reasonable files, not just one particular file that you create.

It should go without saying, but your program should not crash if there are problems in the test files, such as a malformed dictionary line. Programs should never crash just because of bad input. We won't intentionally test how your program handles bad data (this time, anyway), but if your program crashes for any reason, you will definitely lose points for it.
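
As an illustration of tolerating bad input, here is a sketch that skips malformed dictionary lines instead of failing on them. The line format it assumes (exactly two words separated by whitespace) is purely hypothetical; adapt the check to the real dictionary format. The class and method names are also hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class TolerantDictionaryParserSketch {
    // Assumed format (illustration only): "word translation" on each line.
    // Lines that do not have exactly two fields are silently skipped.
    static Map<String, String> parse(List<String> lines) {
        Map<String, String> dictionary = new HashMap<>();
        for (String line : lines) {
            String[] fields = line.trim().split("\\s+");
            if (fields.length != 2) {
                continue; // blank or malformed line: ignore it and carry on
            }
            dictionary.put(fields[0].toLowerCase(), fields[1].toLowerCase());
        }
        return dictionary;
    }
}
```

The same idea applies to file handling: catch the exception from a missing or unreadable file and report it in the GUI rather than letting the program die.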