Assignment 6: Secret Codes
Fall 2007, David Matuszek

Purposes of this assignment:

General idea of the assignment:

A simple "secret code" is one in which one letter is consistently substituted for another letter. For example,
     Susie sells sea shells by the sea shore.
could be coded as
     Vxvlh vhoov vhd vkhoov eb wkh vhd vkruh.

One part of this assignment will be to create a random secret code, and use it to encode text. Another part will be to use various tricks to attempt to decode a secret message when the code is not known.

In decoding a message, it helps to know which letters are used most commonly, and which words occur most often. Much of the program will be devoted to collecting this information.

Special note: I didn't have as much time as usual to "debug" this assignment, so please keep an eye on the course web page for updates and changes.

Details:

Write JUnit tests for each of the constructors and methods specified below for the Encoder and Lexicon classes, but not for the Decoder class. Try your hand at writing the tests before writing the actual code.

We will be testing your methods with our own JUnit tests. If you don't write your code to follow the specifications, it will fail our tests, and you will lose points. (I have to keep saying this until everyone understands it.)

As always, document your classes and methods with Javadoc comments. As always, feel free to write your own additional methods; make them private unless you have some reason to think they are useful to other classes, or unless you want to write JUnit tests for them.

public class Encoder

The purposes of this class are to create and to apply a secret code. It should have:

public Encoder()
A constructor; creates a new, random secret code.

public Encoder(String code)
A constructor. The input string must be exactly 26 characters long, and must consist of all the lowercase letters, in some order.

public String encode(String plainText)
Encodes the given plaintext, according to the secret code created by or given to the constructor. For example, if the code was created by the call new Encoder("defghijklmnopqrstuvwxyzabc"), then calling encode with "Alan Alda " would yield "Dodq Dogd". Note that:
  • Lowercase letters are translated into lowercase letters, and capital letters into capital letters; and
  • All spacing and punctuation is retained.

public String[] encode(String[] plainText)
Encodes each string in the given plaintext array, returning an array of coded strings. (This method will be useful if we have a long text that is represented as an array of lines.)

public String decode(String codedText)
Just like encode(String), but applies the code in the opposite direction.

public String[] decode(String[] codedText)
Just like encode(String[]), but applies the code in the opposite direction.

public void change(char plainChar, char codedChar)
Modifies the secret code so that the code for the plain character will be the coded character. To keep the code valid, this operation changes two codes. For example, if the code previously included a->x and m->w, then calling change('a', 'w') will result not only in a->w, but also in m->x. (If change is called with two characters x and y, when it is already the case that x->y, the method will return without doing anything.)

Please ensure that the encode and decode methods work in the correct direction, as in the above examples. When encoding, letters in the plain text should be replaced by the corresponding letters in the code.

encode and decode are inverse functions. That is, for any encoder e and string s, e.decode(e.encode(s)) must be equal to s.

public class Lexicon

The purpose of the lexicon class is to keep count of how many times each encoded "word" occurs in the coded message. For example, in the above, "vxvlh" occurs once, and "vhd" occurs twice. Your lexicon class should have:

public Lexicon()
The constructor; creates and returns an empty lexicon. The lexicon should be capable of holding up to 500 distinct "words" and their counts.

public void countWord(String word)
If the word is already in the lexicon, adds 1 to its count. If the word is not in the lexicon, and the lexicon is not full, adds the word to the lexicon with a count of 1. However, if the word is not in the lexicon, and there are already the maximum number of unique words in the lexicon, the new word is just discarded without comment. (If the word isn't one of the first 500 unique words encountered, it probably isn't a very common word, and we can ignore it in our counts.)

public String[] breakIntoWords(String line)
Given a line of text (containing words, spaces, punctuation, etc.), return an array of the words in it. Consider any apostrophes to be part of a word (such as "it's" or "doesn't ).

public void countWordsInLine(String line)
Breaks the given line into words, and adds the words to the lexicon (using countWord).

public int getCountFor(String word)
If the given "word" is in the lexicon, returns its count, otherwise returns zero.

public String[] getWordsOfLength(int length)
Returns an array of all the words of a given length, where length > 0. The array is sorted so that the most common words (the words with the highest count) are at the beginning of the array, and the least common are at the end. If two words have equal counts, it doesn't matter which word comes first.

public char[] getLettersByFrequency()
Returns an array of 26 letters, sorted by their frequency of occurrence in the lexicon, so that the most frequent letter is in array location zero, and the least frequent in array location 25. For example, if the "word" vhoov occurs in the lexicon with a count of 5, that means that v and o have occurred 10 times each, and the letter h has occurred 5 times, in addition to whatever other words they occurred in.

These methods should convert input words to lowercase, so that the user doesn't have to worry about capitalization. Consequently, getWordsOfLength(int) will return an array of lowercase words.

public class Decoder

This class has the default constructor and the following methods:

public String[] attemptToDecode(String[] lines)
Given an array of lines of encoded text, attempts to decode them, returning an array of decoded lines.

public static void main(String[] args)
This will control what your program does; give the user some control. One of the things you should be able to do from the main program is to read in a file, encode it with some random code, then attempt to decode it and print the results.
 

The above method should create a Lexicon, and use the methods in the lexicon.

There is no guaranteed algorithm for writing the above method. Here are some restrictions that you can use:

Here's how I would start.

Useful facts:

(Source: http://deafandblind.com/word_frequency.htm)

Top Twenty Most Used Words in Written English: the of to in and a for was is that on at he with by be it an as his

Two Letter Word Frequency:of to in it is be as at so we he by or on do if me my up an go no us am

Three Letter Word Frequency: the and for are but not you all any can had her was one our out day get has him his how man new now old see two way who boy did its let put say she too use

Four Letter Word Frequency: that with have this will your from they know want been good much some time very when come here just like long make many more only over such take than them well were

Word Frequency for the Most Common Words: the of and to in a is that be it by are for was as he with on his at which but from has this will one have not were or all their an i there been many more so when had may today who would time we about after dollars if my other some them being its no only over very you into most than they day even made out first great must these can days every found general her here last new now people public said since still such through under up war well where while years before between country debts good him interest large like make our take upon what

Supplied methods:

This program requires some things we haven't talked about, so I'm supplying a bit of code.

Due date:

Thursday, October 25, before midnight. Turn in your program electronically, using Blackboard. (See my Instructions for Using Zip Files and Blackboard). Only assignments submitted via Blackboard will be accepted--do not send your program by email. The usual late penalty of 5 points per day (out of 100 points) will apply.