Scala Assignment 1: Cryptograms
Fall 2011, David Matuszek

Purposes of this assignment:

General idea of the assignment

A cryptogram is a simple "secret code" in which one letter is consistently substituted for another letter. For example,
     Susie sells sea shells by the sea shore.
could be coded as
     Vxvlh vhoov vhd vkhoov eb wkh vhd vkruh.
No letter is ever substituted for itself in a cryptogram. For example, the above message could not be encoded as
    Sxslh shoos shd skhoos eb wkh shd skruh.

One part of this assignment will be to create a random secret code, and use it to encode text. (This is very easy, except for the part that says the code can't map a letter to itself; that part is much harder.)
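One possible way to handle the "can't map a letter to itself" requirement is rejection sampling: shuffle the alphabet, and shuffle again if any letter landed on itself. The sketch below is only an illustration under that assumption, not the required approach; the names CodeSketch, randomCode, and encode are made up for this example.

```scala
import scala.util.Random

object CodeSketch {
  val alphabet: Vector[Char] = ('a' to 'z').toVector

  /** Shuffles the alphabet repeatedly until no letter maps to itself. */
  def randomCode(rng: Random = new Random): Map[Char, Char] = {
    var shuffled = rng.shuffle(alphabet)
    while (alphabet.zip(shuffled).exists { case (plain, coded) => plain == coded })
      shuffled = rng.shuffle(alphabet)
    alphabet.zip(shuffled).toMap
  }

  /** Encodes text with the given code; keeps case, leaves non-letters unchanged. */
  def encode(text: String, code: Map[Char, Char]): String =
    text.map { c =>
      if (c.isLetter && code.contains(c.toLower)) {
        val sub = code(c.toLower)
        if (c.isUpper) sub.toUpper else sub
      } else c
    }
}
```

Roughly one random shuffle in e (about 2.7) has no letter in its original place, so the loop usually finishes after only a few tries.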

Another part will be to use various tricks to attempt to decode a secret message when the code is not known.

For most cryptograms, you will probably not be able to completely decode the message. Instead, it may help to have some measure of "goodness" of a decoding. For example, you might simply count how many recognizable words your translation contains (you don't need an entire dictionary; I've provided a list of the most common words below). Or you might try to come up with a better metric.
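As a rough illustration of the word-counting idea, the sketch below scores a candidate decoding by how many of its words appear in a set of common words. It uses only the twenty most-used words listed under "Useful facts"; the names GoodnessSketch, commonWords, and recognizedWords are hypothetical.

```scala
object GoodnessSketch {
  // The twenty most-used words from the "Useful facts" section.
  val commonWords: Set[String] =
    "the of to in and a for was is that on at he with by be it an as his".split(" ").toSet

  /** Counts the words in the text that appear in commonWords. */
  def recognizedWords(text: String): Int =
    text.toLowerCase
      .split("[^a-z]+")          // treat anything that is not a letter as a separator
      .filter(_.nonEmpty)
      .count(commonWords.contains)
}
```

A raw count favors long messages, so dividing by the total number of words may make scores easier to compare across different inputs.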

Supplied code

Most I/O is still based on Java, and is therefore rather ugly. Since I don't want you to waste a lot of time trying to read in a file, here's some sloppy but short code you can use.

  import scala.io.Source
  import scala.swing.FileChooser

  /**
   * Asks the user to choose an input file, and returns a list of lines
   * in the file.
   * @return The lines in the file, or List("") if the user cancels.
   */
  def readMessage: List[String] = {
    val chooser = new FileChooser
    val result = chooser.showOpenDialog(null)
    if (result == FileChooser.Result.Approve) {
      val source = Source.fromFile(chooser.selectedFile)
      // toList forces all lines to be read before the file is closed.
      try source.getLines().toList
      finally source.close()
    } else List("")
  }

Suggestions

Since this is your first serious Scala program, don't try to do too much at once. Pick some small functions you think you might need, such as one to split a line of text into “words” (leave in punctuation; you can take that out in another function), and do those first.
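For instance, two small functions of this kind might look like the sketch below. The names WordSketch, splitWords, and stripPunctuation are only suggestions, and treating every non-letter as punctuation is an assumption you may want to revisit.

```scala
object WordSketch {
  /** Splits a line on whitespace, leaving punctuation attached to each word. */
  def splitWords(line: String): List[String] =
    line.split("\\s+").filter(_.nonEmpty).toList

  /** Removes every character that is not a letter (so "don't" becomes "dont"). */
  def stripPunctuation(word: String): String =
    word.filter(_.isLetter)
}
```

Keeping the two steps separate makes each one easy to test on its own.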

Use letter frequencies to try to determine the code; use the list of most common words to evaluate your solutions.
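A simple first guess, sketched below, is to rank the letters of the coded message by how often they occur and pair them off with the English letter frequency order given under "Useful facts". This is only a starting point under that assumption; the names FrequencySketch, englishOrder, lettersByFrequency, and guessDecoding are hypothetical.

```scala
object FrequencySketch {
  // English letters from most to least frequent, from "Useful facts".
  val englishOrder: List[Char] = "etaoinsrhldcumfpgwybvkxjqz".toList

  /** Letters that occur in the text, ordered from most to least frequent. */
  def lettersByFrequency(text: String): List[Char] =
    text.toLowerCase
      .filter(c => c >= 'a' && c <= 'z')
      .groupBy(identity)
      .toList
      // Most frequent first; ties are broken alphabetically.
      .sortBy { case (letter, occurrences) => (-occurrences.length, letter) }
      .map(_._1)

  /** Pairs the k-th most frequent coded letter with the k-th most frequent English letter. */
  def guessDecoding(cipherText: String): Map[Char, Char] =
    lettersByFrequency(cipherText).zip(englishOrder).toMap
}
```

The guess will usually get some letters wrong, and it may even map a letter to itself, which a real cryptogram never does. Scoring the result against the common-word lists is one way to decide which letters to swap.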

Short messages are more difficult. Your program will have better luck with long cryptograms. A few letters (J, Q, Z) are essentially impossible to get right without a much larger set of words than the ones provided below; don't worry about that.

Here are some observations that may help you:

There are roughly 26!/e (about 1.5 x 10^26) possible encodings, because a code is a rearrangement of the 26 letters in which no letter maps to itself, so you cannot possibly try them all.
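If you are curious about the exact figure, a code in which no letter maps to itself is what mathematicians call a derangement, and the number of derangements can be computed with a standard recurrence. The sketch below is a side calculation, not part of the assignment; the names CountSketch and derangements are made up here.

```scala
object CountSketch {
  /** Number of ways to rearrange n items so that none stays in place (n >= 0). */
  def derangements(n: Int): BigInt = {
    // D(0) = 1, D(1) = 0, D(k) = (k - 1) * (D(k - 1) + D(k - 2))
    var prev = BigInt(1) // D(0)
    var curr = BigInt(0) // D(1)
    if (n == 0) prev
    else {
      for (k <- 2 to n) {
        val next = BigInt(k - 1) * (curr + prev)
        prev = curr
        curr = next
      }
      curr
    }
  }
}
```

CountSketch.derangements(26) gives a number on the order of 10^26, far too many codes to try one by one.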

Useful facts

(Source: http://deafandblind.com/word_frequency.htm)

Letter Frequency in the English Language: e t a o i n s r h l d c u m f p g w y b v k x j q z

More than half of all words end with: e t d s

Top Twenty Most Used Words in Written English: the of to in and a for was is that on at he with by be it an as his

Two Letter Word Frequency: of to in it is be as at so we he by or on do if me my up an go no us am

Three Letter Word Frequency: the and for are but not you all any can had her was one our out day get has him his how man new now old see two way who boy did its let put say she too use

Four Letter Word Frequency: that with have this will your from they know want been good much some time very when come here just like long make many more only over such take than them well were

Word Frequency for the Most Common Words: the of and to in a is that be it by are for was as he with on his at which but from has this will one have not were or all their an i there been many more so when had may today who would time we about after dollars if my other some them being its no only over very you into most than they day even made out first great must these can days every found general her here last new now people public said since still such through under up war well where while years before between country debts good him interest large like make our take upon what

Grading

Your program is not expected to produce a perfect decoding. However, for some inputs (say, above a couple hundred words), it should produce an output that is more-or-less readable by a human. The best programs may get extra credit points.

Every method should have Scaladoc comments. Be sure to use the @author tag in the comment for the main object.
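As an illustration of the expected comment style, a documented object and method might look something like this. The object name Cryptograms, the method letterCounts, and the placeholder author name are all hypothetical.

```scala
/**
 * Encodes and decodes simple substitution cryptograms.
 *
 * @author Your Name
 */
object Cryptograms {

  /**
   * Counts how many times each letter occurs in a message.
   *
   * @param message The text to examine.
   * @return A map from each lowercase letter to its number of occurrences.
   */
  def letterCounts(message: String): Map[Char, Int] =
    message.toLowerCase.filter(_.isLetter).groupBy(identity).map {
      case (letter, occurrences) => (letter, occurrences.length)
    }
}
```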

Special rules

The reasons for these rules are twofold: (1) They are intended to get you using some of the unique Scala constructs, and (2) It is easy to write a program to count these keywords.

Due date

Monday, November 14, 6am.