CIT 590/591 Assignment 3: License Plate Lookup
Fall 2012, David Matuszek

# Purposes of this assignment

• To give you more practice with string manipulation.
• To give you some practice with lists and random numbers.

# Backstory

The city of Pseudopolis has had a rash of hit-and-run accidents. Most recently, an 8-year old girl was hit, and is currently in intensive care; the driver fled the scene in a dark-colored car of unknown make. There was one witness, who got a glimpse of the license plate of the car. The police would very much like your assistance in helping to catch the driver, in this and in similar cases.

The problem the police have is that witnesses are not very good at reading or remembering license plate numbers. Sometimes they misread letters as digits, or vice versa. Sometimes they remember too many letters or digits, and sometimes remember too few. Sometimes they just get a letter or digit wrong.

The police cannot check every car in the city, but if they have a relatively small number of possible cars, they can check those. (Checking a car involves sending an officer out to find the car and look at it for possible collision damage.) Your job is to find those license plate numbers that most closely resemble the number reported by a witness.

# General idea of the assignment

A spell checker reads your document and compares the words in it to a dictionary. If it doesn't find a word in the dictionary, it suggests a replacement word (typically, more than one).

In this assignment you will write a program that uses, not a dictionary of words, but a list of license plate numbers. Your checker will return the license plates that best match a given (possibly incorrect) license plate.

You will use a random number generator to create the list of "actual" license plate numbers. Numbers to be searched for will be entered by the user.

# Details

A license plate in Pseudopolis consists of two capital letters followed by three digits, for example, AB123. All entries in the list of known Pseudopolis license plate numbers will be of this form. Since we don't have "real" data, you will use a random number generator to create this "database" of license plate numbers.

Your program needs to find license plate numbers that are "most similar" to a given number. There is no single, obvious measure of similarity. Here are the rules that the chief of police wants you to follow in deciding how different two numbers are from each other:

• Simple corrections--removing spaces, changing letters to digits or digits to letters--are "free." (Example: A8123AB123.).
• If both numbers are the same length:
• Apply the simple corrections mentioned above, then
• Compare the license plates character-by-character, and count how many characters are different. (For example, AB123 and AC153 differ in two places.)
• If the license plate input by the user is six characters long:
• For each character in the number that was input, remove it, then apply the simple corrections and count differences. (For example, given ABC123, try BC123, AC123, AB123, ABC23, ABC13, and ABC12.)  Longer number Remove character Shortened number Compare to Differences ABC123 ABC123 ABC123 ABC123 ABC123 ABC123 A B C 1 2 3 BC123 AC123 AB123 ABC23 ABC13 ABC12 AB123 AB123 AB123 AB123 AB123 AB123 2 1 0 1 2 3
• Keep the license plate with the smallest difference. In this example, that would be zero.
• Add 1 to the smallest difference (for the one character that was removed), and call this the result.
• If the license plate input by the user is four characters long:
• Do not apply the simple corrections (there are too many cases to consider),
• For each character in the number taken from the database,
• Remove that one character,
• Count differences, keeping the license plate with the fewest differences, and
• Add 1 to the smallest difference (for the missing character), and call this the result.

You should thoroughly test all functions with unittest. Please follow the sequence recommended in class. That is, write the test functions first, along with stubs; then run the tests; then start filling in the stubs with the actual code, testing frequently. I can't watch over your shoulder to make sure you do things this way, but please give it a try.

## Functions

Write (at least) the following functions:

Creates and returns a list of size random license plate numbers (see below). Make sure that the list does not contain duplicate license plate numbers.
Creates a "database" of license plates numbers. Actually, all it really does is make a copy of the list that is given to it, and returns this copy. This is so that you can set up a "fake" database, containing known license plate numbers, for testing purposes.
Makes simple corrections to the license plate number that is given to it, and returns the corrected license plate number. Here are the corrections to be made:
• Spaces at the beginning or end of the license plate number should be removed.
• All lowercase letters should be converted to uppercase.
• If digits occur where there should be letters, they should be converted to the most similar letters, and vice versa. Call the next two functions to accomplish this
If digits occurs where there should be letters in the license plate number, they are converted to the most similar letters: 1I, 2Z, 3B, 4A, 5S, 6G, 7T, 8B, 9P, 0O.
If letters occurs where there should be digits in the license plate number, they are converted to the most similar digits: A4, B8, C0, D0, G6, I1, O0, Q0, S5, T7, Z2. Letters not specified in this list are left unchanged.
Returns True if and only if the database (a list of license plate numbers) contains exactly the license plate number given. (Note: This is a one-line function.) If the license parameter needed some simple corrections, they have already been applied, and an exact match is still possible.
def count_nonmatching_characters(string_1, string_2)
Returns the number of characters that are different in the two strings. The two strings should already have been adjusted to be the same length (use assert to make sure this is the case), so you need only compare characters in the same positions.
Returns the number of differences (as defined by the chief of police) in the two license plate numbers, starting with adjusting for length and applying simple corrections, as detailed above. This is the function that computes the complete number of differences, start to finish. The Simple corrections have already been applied to the license_from_user, and don't add to the count. The license_from_user may be as short as four characters or as long as six characters. The license_from_database is, of course, the correct length.
Searches the dictionary and returns a list of lists. The first sublist contains those license plate numbers that exactly match the given license; this sublist should be either empty or contain exactly one element, license. The second sublist contains those license plate numbers that differ by 1 from license; the third sublist, those that differ by two; and so on, up to and including those that differ by max_differences. This function should, of course, use the count_differences function.
Get a response from the user, using the question as a prompt. If the response begins with a 'y' or 'Y' (after removing spaces), return True; if it begins with an 'n' or 'N', return False; otherwise print Please answer with 'yes' or 'no.' and recursively call the function. (This is a generally useful function that you can use in other programs.)
def main()
Uses generate_license_plate_numbers to create a random dictionary of 10000 license plate numbers, then repeatedly gets a license plate number from the user and searches the database for the best match or matches. Given a possibly incorrect license plate number, as reported by a witness, your program should:
• Reject any license plate that is shorter than four characters or longer than six characters.
• Search the database and print out:
• Whether the license plate number exactly matches some number in the database.
• All license plate numbers that differ by one (as defined above) from the given number.
• All license plate numbers that differ by two (as defined above) from the given number.
After each result is printed, ask the user if he/she wants to enter another license plate number; if not, print "Done" and quit.
if __name__ == '__main__':
main()
This says:
1. If this file is loaded directly into the REPL, the function main() will be called automatically.
2. If this file is used by another file (such as the unit test file), the function main() will not be called automatically.

Write unit tests for all the above functions, except ask_yes_or_no and main. Because these two functions do input/output, they must be tested manually.

If you find it convenient to write additional functions, start by writing unit tests for them.