CIT 591 Numeric spell checker notes Fall 2004, David Matuszek

Here are the answers to some questions that Gemma, Mirko, and I have received on the "Numeric Spell Checker" assignment.

# Corrected declaration of random number generator

The assignment said
Random random = new Random();
but it has to be
static Random random = new Random();
because you need to use this variable from the static method generateRandomArray.
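Here is a minimal sketch of why the declaration must be static. The class name NumericSpellChecker is hypothetical (your class may be named differently), and the sketch assumes generateRandomArray returns an int[].

```java
import java.util.Random;

// Hypothetical class name; only `random` and `generateRandomArray`
// are names taken from the assignment.
class NumericSpellChecker {
    // Must be static: a static method has no `this`, so it can only
    // use fields that belong to the class itself.
    static Random random = new Random();

    static int[] generateRandomArray(int size) {
        int[] result = new int[size];
        for (int i = 0; i < size; i++) {
            result[i] = random.nextInt(); // full int range
        }
        return result;
    }
}
```

If you leave off the word static in the declaration of random, the compiler rejects the use of random inside generateRandomArray with a message along the lines of "non-static variable random cannot be referenced from a static context."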

# Range of numbers

To get pseudo-random numbers for this assignment, use the method call random.nextInt(). That is, call nextInt with no parameters. This will give you the full range of integers, from Integer.MIN_VALUE (-2147483648) to Integer.MAX_VALUE (2147483647). In general, about half of these numbers will be negative.

Don't be surprised if most of your random numbers are extremely large (or, when negative, extremely small). Remember that there are ten times as many integers with N+1 digits as there are with N digits, up to the ten-digit limit of an int. Thus, for example, an 8-digit number is a million times more likely to occur than a 2-digit number. Most of your numbers will have nine or ten digits.
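If you want to see this for yourself, the following sketch tallies how many digits each of 100,000 random ints has. The class and method names (DigitCounts, digits, tally) are made up for this illustration and are not part of the assignment.

```java
import java.util.Random;

// Hypothetical helper class, for illustration only.
class DigitCounts {
    // Number of decimal digits in n, ignoring the sign.
    static int digits(int n) {
        // Widen to long first: Math.abs(Integer.MIN_VALUE) overflows as an int.
        return Long.toString(Math.abs((long) n)).length();
    }

    // counts[d] = how many of `samples` random ints have d digits (d = 1..10).
    static int[] tally(Random random, int samples) {
        int[] counts = new int[11];
        for (int i = 0; i < samples; i++) {
            counts[digits(random.nextInt())]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        int[] counts = tally(new Random(), 100000);
        for (int d = 1; d <= 10; d++) {
            System.out.println(d + " digits: " + counts[d]);
        }
    }
}
```

In a typical run, the nine-digit and ten-digit rows together should account for roughly 95% of the samples, and the rows for one to four digits will often be zero.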

# Testing generateRandomArray(int size)

How do you test whether the result of calling generateRandomArray(size) really gives you random (actually, pseudo-random) numbers?

The short answer is, there really isn't any good way. Here's what I would test:

Is the returned array of the correct size?
This is an obvious and easy thing to check. (By the way, there is no point in checking whether you get back an array of the correct type; the compiler will do this.)

Are both the first and last locations nonzero?
If your method didn't put anything in the array, it will be full of zeros, so you can check that the method put something into the array. While there is about one chance in four billion (1 in 2^32) that any given location contains a zero, I don't think we need to worry about that for testing purposes.

If size > 1, are the first and last locations different from each other? Or maybe the first and second?
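It is possible to have an error that fills all the array locations with the same randomly chosen number, so let's test for that. Incidentally, a correct method can still produce a duplicate: java.util.Random keeps 48 bits of internal state, and although that state does not repeat until the generator has gone through its full cycle, the 32-bit values returned by nextInt() can repeat. The chance that two particular locations hold the same value is only about one in four billion, though, so if this test fails, there is almost certainly a bug somewhere. (You could test if there were any duplicates, anywhere in the array, but that's too much work for a highly unlikely situation.)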
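The three checks above can be sketched in plain Java as follows. This is one possible arrangement, not the required one: the class name RandomArrayChecks and the check method are hypothetical, and generateRandomArray is included only as a stand-in so that the example is complete.

```java
import java.util.Random;

// Hypothetical stand-in for the assignment's class, so the checks can run.
class RandomArrayChecks {
    static Random random = new Random();

    static int[] generateRandomArray(int size) {
        int[] result = new int[size];
        for (int i = 0; i < size; i++) {
            result[i] = random.nextInt();
        }
        return result;
    }

    // Returns a description of the first failed check, or null if all pass.
    static String check(int[] array, int expectedSize) {
        if (array.length != expectedSize) return "wrong size";
        if (expectedSize == 0) return null; // nothing else to check
        int first = array[0];
        int last = array[array.length - 1];
        if (first == 0 || last == 0) return "array looks unfilled";
        if (expectedSize > 1 && first == last) return "first and last are equal";
        return null;
    }

    public static void main(String[] args) {
        String problem = check(generateRandomArray(100), 100);
        System.out.println(problem == null ? "all checks passed" : problem);
    }
}
```

In a unit testing framework, each of these conditions would normally become its own assertion rather than a returned message.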

Someone suggested using a fixed seed for the random number generator while testing (so that you can know what numbers are in the array), then changing back to one that used the time of day for the "final" program. Don't do this! It's a bad idea for two reasons: (1) It involves making an untested change in a previously tested method, and (2) it breaks the test method, which will now report failure. One of the key points in having a suite of tests is that you can run them anytime, to make sure everything still works. Always keep a set of working tests!
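In case "fixed seed" is unfamiliar: java.util.Random has a constructor that takes a long seed, and two generators built with the same seed produce the same sequence of numbers. The sketch below only illustrates that behavior; the class name SeedDemo, the method firstValues, and the seed value 42 are arbitrary choices for this example, and none of this is a suggestion to seed the assignment's generator.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical demo class, for illustration only.
class SeedDemo {
    // Returns the first `count` values from a generator built with `seed`.
    static int[] firstValues(long seed, int count) {
        Random seeded = new Random(seed);
        int[] values = new int[count];
        for (int i = 0; i < count; i++) {
            values[i] = seeded.nextInt();
        }
        return values;
    }

    public static void main(String[] args) {
        // The same seed gives the same sequence, so this prints "true".
        System.out.println(Arrays.equals(firstValues(42L, 5), firstValues(42L, 5)));
    }
}
```

A test that hard-codes the values from firstValues(42L, 5) passes only as long as the seed stays 42. Once the generator goes back to new Random(), that same test fails, which is reason (2) above.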

# Testing createDictionary(int[] numbers)

What this method should do is create another "dictionary" array of the same size as numbers, and run through a loop to copy values from numbers into this new array.

If, as is good practice, you make your instance variables private, then there is no easy access to the "dictionary" array, so testing becomes difficult. Here are some possible solutions, none of them perfect.

Make the "dictionary" array public. Then you can see if it has the right values in it.
The problem with this is that there is no reason, other than testing, to make this variable public. That just exposes it to modification by methods in other classes, which can lead to very difficult-to-find bugs.

Provide a getter method for the dictionary.
There are two problems with this. First, there is no reason, other than testing, for having this method. Second, it provides "false security," since the simple thing to do is to return a reference to the array, rather than a copy of the array. And making a copy seems like overkill.

Test the created "dictionary" array with the boolean exactMatch(int) method.
To do this, create a small dictionary of perhaps four numbers. Then test whether each of these numbers is in the array (and also test that a couple of other numbers are not in the array). This has the disadvantage that the test of one method is not completely independent of any other methods, but this is a minor disadvantage compared to the others. This is the testing approach that I recommend.
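The recommended approach might look something like the sketch below. Only createDictionary(int[]) and exactMatch(int) are names from the assignment. The class name DictionarySketch, the helper dictionaryHolds, the linear search inside exactMatch, and the four sample numbers are all assumptions made so that the example is complete.

```java
// Hypothetical minimal class; the method bodies are assumed implementations.
class DictionarySketch {
    private int[] dictionary = new int[0]; // private, so tests cannot see it

    void createDictionary(int[] numbers) {
        dictionary = new int[numbers.length];
        for (int i = 0; i < numbers.length; i++) {
            dictionary[i] = numbers[i];
        }
    }

    boolean exactMatch(int number) {
        for (int i = 0; i < dictionary.length; i++) {
            if (dictionary[i] == number) return true;
        }
        return false;
    }

    // True if every number in `present` is found and none in `absent` is.
    static boolean dictionaryHolds(DictionarySketch checker,
                                   int[] present, int[] absent) {
        for (int i = 0; i < present.length; i++) {
            if (!checker.exactMatch(present[i])) return false;
        }
        for (int i = 0; i < absent.length; i++) {
            if (checker.exactMatch(absent[i])) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        DictionarySketch checker = new DictionarySketch();
        int[] numbers = {12, -7, 2004, 591};
        checker.createDictionary(numbers);
        // Four numbers that should be present, two that should not.
        System.out.println(dictionaryHolds(checker, numbers, new int[] {0, 13}));
    }
}
```

Notice that the test never touches the private array; it learns what is in the dictionary only through exactMatch.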

How can you tell that the "dictionary" array is a copy of numbers, rather than a reference to numbers? Well, if you had easy access to the "dictionary" array, you could use == or !=. But you don't have easy access. I'll leave this as a (fairly easy) puzzle for you to solve.

Testing, like programming, is as much an art as a science. As you can see, there are tradeoffs, and you just have to pick the option that seems best.

# Testing createRandomDictionary(int size)

This method is so simple, and so difficult to test, that maybe it isn't worth the effort. Still, I said to test it, so you should do something. You can at least check that you got back an array of the correct size.

# Testing other methods

Other methods are a whole lot easier to test if you know exactly what is in the "dictionary" array. That's why I separated the action of creating a random array from the action of creating a dictionary. Where possible, use a known array for testing your methods.

Here are some conclusions you might draw from all this:

Testing can be a lot easier if you design the tests first.

You can't test everything. (But with a little thought, you can test more than you might expect.)

It's best to have alternatives from which you can choose. Hence, you shouldn't always do the first thing you think of.

Writing the tests first may make the program better in some ways, but it also tempts you to make things public that really shouldn't be public.