CIT 591 Assignment 4: Readability
Fall 2009, David Matuszek
There are various measures of "readability"--how good a reader needs to be in order to understand a passage of English text. These measures are based on the average length of words (usually measured in syllables) and the average length of sentences (measured in words). The result is usually given as the number of years a child has to have attended school (her grade level) in order to understand the text. These measures are crude, but better than nothing. Your assignment is to let the user read in a passage of text (from a file), apply the formulae, and print out the results.
Also, just to give you some experience with Python dictionaries, I've added a measure of my own: How "rich" the vocabulary is, that is, how many different words are used.
Here are the formulae you should apply. These have been taken from the corresponding Wikipedia articles and from http://www.readability.info/info.shtml; where these disagree, I've used the formula from Wikipedia. In addition, I've made some simplifications of my own (such as, use the entire text, not selected sentences). I'm using a simple syllable-counting algorithm from Stack Overflow. Since we will be using our own unit tests to test your program, follow this assignment exactly as written. Provide exactly the functions listed below, with exactly the same names and parameter lists, and save it in a file named exactly readability.py.
| Name | Formula | Notes |
|---|---|---|
| Kincaid | (formula image not available) | Also known as "Flesch-Kincaid." |
| Automated Readability Index | ARI = 4.71*chars/wds + 0.5*wds/sentences - 21.43 | "Chars" is a count of letters, not all characters. Variables are total letters, words, and sentences. |
| Coleman-Liau | (formula image not available) | "Characters" is a count of letters, not all characters. Variables are total letters, words, and sentences. |
| Flesch | (formula image not available) | Not comparable to others--high scores (up to about 100) are easier. |
| Fog (Gunning) | (formula image not available) | "Complex words" are words with three or more syllables. Variables are total words, complex words, and sentences. |
| Lix | Lix = wds/sent + 100*(wds >= 6 char)/wds | Variables are total words, sentences, and long words. |
| SMOG | (formula image not available) | "Polysyllables" are words with three or more syllables; same as "complex words." SMOG is an acronym for "Simple Measure Of Gobbledygook." |
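The two formulae that appear in text form in the table (ARI and Lix) can be sketched as plain functions of the counts. This is only an illustration of the arithmetic: the function and parameter names below are hypothetical, not the required `getARImeasure()` and `getLixMeasure()` (which take no parameters), and the sketch assumes the word and sentence counts are nonzero.

```python
def ari_from_counts(letters, words, sentences):
    """ARI = 4.71*chars/wds + 0.5*wds/sentences - 21.43 (chars = letters only)."""
    # Hypothetical helper; assumes words > 0 and sentences > 0.
    return 4.71 * letters / words + 0.5 * words / sentences - 21.43


def lix_from_counts(words, sentences, long_words):
    """Lix = wds/sent + 100*(wds >= 6 char)/wds."""
    # long_words is the number of words with six or more letters.
    return words / sentences + 100 * long_words / words


if __name__ == "__main__":
    # Made-up counts, chosen only to show the calls.
    print(ari_from_counts(letters=450, words=100, sentences=5))
    print(lix_from_counts(words=100, sentences=5, long_words=20))
```

Keeping the arithmetic in small pure functions like these makes it easy to unit test the formulae separately from the counting code.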
Each vowel (a, e, i, o, u, y) in a word counts as one syllable subject to the following sub-rules:
All the methods listed are required. Be careful to get the spelling and capitalization exactly as shown. You may have additional methods, and all methods (except main() and those devoted to doing I/O) must be thoroughly unit tested.
In the following, text means a list of lines (strings); string means a single string, which might be an individual word or a complete sentence (with or without punctuation).
def initialize()
def readFile(fileName)
def extractNextSentence(text)
[…] readFile) and, as side effects, (1) the sentence is removed from the text, and (2) a global count of sentences is updated. […] (.), exclamation point (!), or question mark (?). As in a book, there may be many sentences on a line, or many lines in a sentence.
def getWordCount(string)
[…] "don't" is a single word.) Numbers don't count as words.
def getComplexWordCount(string)
def getLetterCount(string)
def getSyllableCount(string)
The following functions should be trivial (probably, each just returns the value of some global variable). Since the work should already have been done, none of them needs or uses a parameter.
def getTotalSentenceCount(string)
def getTotalWordCount(string)
def getTotalComplexWordCount(string)
def getTotalLetterCount(string)
For Kincaid and Flesch, you need to keep a count of the total number of syllables. For the Lix measure, you also need to keep a count of the number of words having six or more letters. I did not notice this when I wrote the assignment, so how you do this is up to you. I suggest using the required methods (such as keeping track of the total word count) as a model.
def getKincaidMeasure()
def getARImeasure()
def getColemanLiauMeasure()
def getFleschMeasure()
def getGunningFogMeasure()
def getLixMeasure()
def getSmogMeasure()
def getRichness()
[…] c be 1/2 the total word count. Add the word frequencies (93 times for "the", 32 for "and", ...) until you reach or exceed c. Return, as an integer result, the number of words you had to add to reach or exceed c.
def main()
A significant part of the grade will be based on how good your unit tests are, and whether it looks like you used TDD. In addition, you are expected to get the same answers as I do (so use the same formulae!).
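The richness computation described for `getRichness()` can be sketched with a dictionary of word frequencies. This is a minimal sketch under stated assumptions, not the reference solution: the function name and its `words` parameter are hypothetical (the required `getRichness()` takes no parameters), words are lowercased before counting, and frequencies are added from most frequent to least frequent, which the "93 times for "the", 32 for "and"" example suggests but the surviving text does not state outright.

```python
from collections import Counter


def richness_sketch(words):
    """Return how many distinct words cover at least half of all words."""
    counts = Counter(word.lower() for word in words)  # word -> frequency
    c = len(words) / 2  # half the total word count
    running_total = 0
    distinct_used = 0
    # most_common() yields (word, frequency) pairs, highest frequency first.
    for _word, frequency in counts.most_common():
        if running_total >= c:
            break
        running_total += frequency
        distinct_used += 1
    return distinct_used


if __name__ == "__main__":
    sample = "the cat and the dog and the bird".split()
    print(richness_sketch(sample))
```

A `Counter` is a dictionary subclass, so the same loop works with a plain dictionary whose items are sorted by descending frequency.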