CIT 594 Counting Words with ForkJoin
Spring 2014, David Matuszek

Purposes of this assignment

General idea of the assignment

Read in a book-length text file. Count how many times each word occurs in the file. Print a table of the 25 most frequent words and a table of the 25 least frequent words. Both tables should be ordered most frequent to least frequent.
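As a rough illustration of the two tables, here is a minimal sketch that assumes the counts are already in a Map<String, Integer>. The class name FrequencyTables and all of its method names are hypothetical, not part of the assignment. Both tables come from one list sorted from most frequent to least frequent, so the "least frequent" table is simply the tail of that list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper; the names here are not required by the assignment.
class FrequencyTables {

    /** Returns all entries sorted from highest count to lowest (ties in any order). */
    static List<Map.Entry<String, Integer>> sortedByCount(Map<String, Integer> counts) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort(Map.Entry.<String, Integer>comparingByValue().reversed());
        return entries;
    }

    /** The n most frequent entries (fewer if the map has fewer than n words). */
    static List<Map.Entry<String, Integer>> mostFrequent(Map<String, Integer> counts, int n) {
        List<Map.Entry<String, Integer>> sorted = sortedByCount(counts);
        return sorted.subList(0, Math.min(n, sorted.size()));
    }

    /** The n least frequent entries, still ordered most frequent to least frequent. */
    static List<Map.Entry<String, Integer>> leastFrequent(Map<String, Integer> counts, int n) {
        List<Map.Entry<String, Integer>> sorted = sortedByCount(counts);
        return sorted.subList(Math.max(0, sorted.size() - n), sorted.size());
    }

    /** Prints a title line followed by one "word count" row per entry. */
    static void printTable(String title, List<Map.Entry<String, Integer>> rows) {
        System.out.println(title);
        for (Map.Entry<String, Integer> row : rows) {
            System.out.printf("%-20s %d%n", row.getKey(), row.getValue());
        }
    }
}
```

A call such as FrequencyTables.printTable("Most frequent", FrequencyTables.mostFrequent(counts, 25)) would then print the first table.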

Detailed specification

Name your project WordCounter, your package wordCounter, and your main class WordCounter.

When your program is executed, it should use a JFileChooser to select a file. The actual counting should be done by a public void countWords(String text, int numberOfSegments) method, which does no I/O and saves its result in a global variable (in Java, a field) of some type. The program should then print out the time (in milliseconds) required by that call, along with the 25 most frequent and 25 least frequent words. (If ties occur, words with equal counts may appear in any order.)
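One possible shape for countWords is sketched below. This is only a sketch under stated assumptions, not the required design: it assumes that numberOfSegments means "cut the text into that many pieces and count each piece in its own ForkJoin task", that cuts are moved forward to whitespace so no word is split, and that the result lives in a field named counts. The class name ForkJoinCountSketch, the nested SegmentTask, and the word pattern are all hypothetical. The package declaration is omitted for brevity.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; only the countWords signature comes from the assignment.
class ForkJoinCountSketch {
    // Assumed word pattern: letters, with ' or - allowed only between letters.
    private static final Pattern WORD = Pattern.compile("\\p{L}+(?:['-]\\p{L}+)*");
    private static final ForkJoinPool POOL = new ForkJoinPool();

    // Holds the result of the most recent countWords call.
    Map<String, Integer> counts = new HashMap<>();

    /** Counts the words in the half-open range text[start, end). */
    private static class SegmentTask extends RecursiveTask<Map<String, Integer>> {
        private final String text;
        private final int start;
        private final int end;

        SegmentTask(String text, int start, int end) {
            this.text = text;
            this.start = start;
            this.end = end;
        }

        @Override
        protected Map<String, Integer> compute() {
            Map<String, Integer> local = new HashMap<>();
            Matcher m = WORD.matcher(text.substring(start, end).toLowerCase(Locale.ROOT));
            while (m.find()) {
                local.merge(m.group(), 1, Integer::sum);
            }
            return local;
        }
    }

    public void countWords(String text, int numberOfSegments) {
        final List<SegmentTask> tasks = new ArrayList<>();
        int segments = Math.max(1, numberOfSegments);
        int start = 0;
        for (int i = 1; i <= segments && start < text.length(); i++) {
            int end = (i == segments)
                    ? text.length()
                    : Math.max(start, (int) ((long) text.length() * i / segments));
            // Advance the cut to the next whitespace so no word is split in two.
            while (end < text.length() && !Character.isWhitespace(text.charAt(end))) {
                end++;
            }
            tasks.add(new SegmentTask(text, start, end));
            start = end;
        }
        // Run all segment tasks inside the pool, then merge their partial maps.
        counts = POOL.invoke(new RecursiveTask<Map<String, Integer>>() {
            @Override
            protected Map<String, Integer> compute() {
                Map<String, Integer> total = new HashMap<>();
                for (SegmentTask task : invokeAll(tasks)) {
                    for (Map.Entry<String, Integer> e : task.join().entrySet()) {
                        total.merge(e.getKey(), e.getValue(), Integer::sum);
                    }
                }
                return total;
            }
        });
    }
}
```

Each task builds its own private map, so no locking is needed while counting; the partial maps are only combined after every task has finished. A recursive divide-in-half design would be an equally reasonable reading of the assignment.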

Also provide these methods:

For purposes of this assignment, a "word" consists of letters, and may contain an apostrophe (') or hyphen (-), but only if the apostrophe or hyphen has a letter on each side of it. Hence, "don't" is a word, but "'tis" is counted as the word "tis". Words do not contain digits or other punctuation. Sherlock Holmes's address, "221B" on Baker Street, will be treated as the word "B" (counted as "b" once lowercased). All words should be lowercased for purposes of counting; proper names ("France", "John", etc.) will be treated as words. (If these rules aren't enough to distinguish words from other things, ask me, or do something reasonable. Details aren't that critical.)
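The word rules above can be expressed as a single regular expression. The following is a minimal sketch, not the only reasonable approach. The class name WordTokenizer and the method words are hypothetical, "letter" is assumed to mean any Unicode letter (\p{L}), and only the plain ASCII apostrophe and hyphen are recognized, so typographic ("curly") apostrophes would split a word.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper illustrating the word rules; names are not from the assignment.
class WordTokenizer {
    // One or more letters, optionally followed by groups of (' or -) plus letters,
    // so an apostrophe or hyphen is kept only when it has a letter on each side.
    private static final Pattern WORD = Pattern.compile("\\p{L}+(?:['-]\\p{L}+)*");

    /** Returns the lowercased words of the text, in order of appearance. */
    static List<String> words(String text) {
        List<String> result = new ArrayList<>();
        Matcher m = WORD.matcher(text.toLowerCase(Locale.ROOT));
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }
}
```

With this pattern, "Don't" yields "don't", "'tis" yields "tis", and "221B" yields "b". A doubled hyphen, as in "rock--bottom", yields the two words "rock" and "bottom", because neither hyphen has a letter on both sides.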

Timing

For some suitably long text, perform the following timings:

Include a readme.txt file with the above information. Also state how many cores the computer you used for testing has.
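For the measurements themselves, a small helper along the following lines may be useful. It is a sketch only; the class name TimingSketch and its methods are hypothetical. It uses System.nanoTime for elapsed time and Runtime.availableProcessors for the core count. The latter reports the processors available to the JVM, which can differ from the number of physical cores (for example, with hyperthreading).

```java
// Hypothetical timing helper; names are not from the assignment.
class TimingSketch {

    /** Runs the action once and returns the elapsed wall-clock time in milliseconds. */
    static long timeMillis(Runnable action) {
        long start = System.nanoTime();
        action.run();
        return (System.nanoTime() - start) / 1_000_000;
    }

    /** Number of processors available to the JVM (may differ from physical cores). */
    static int cores() {
        return Runtime.getRuntime().availableProcessors();
    }
}
```

Assuming an object named counter with the countWords method, a call such as TimingSketch.timeMillis(() -> counter.countWords(text, 4)) would time one run. The first run in a fresh JVM is usually slower because of class loading and JIT compilation, so it may be worth repeating each measurement a few times.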

Testing

How do you know you have the correct answer? As usual, try your program with a small sample of text, where you have manually counted the words.

Due date

Your program is due before 6am Tuesday, April 22. Submit via Canvas, as usual.