CIT 594 Assignment 1: Exploring the Collatz Function
Spring 2015, David Matuszek

# Purposes of this assignment

• Introduce timing and practical analysis
• Introduce memoization
• Possibly spark your interest in these sequences

# General idea of the assignment

• Compute the lengths of Collatz sequences in two ways: with and without memoization
• Estimate the Big-O running time of computing Collatz sequence lengths, with and without memoization
• Determine various features of the Collatz function

# In more detail

## The Collatz “Function”

The Collatz “function” is defined (for positive integers only) as:

`collatz(n) =`

• `1, if n = 1`
• `collatz(n/2), if n is even`
• `collatz(3*n+1), if n is odd`

The reason for the quote marks around the word “function” is this: No one knows if this is a function. To be a function, it has to be defined for all positive integers, otherwise it is only a partial function. Clearly, collatz(n)=1 for all values of n for which collatz is defined; but it is possible that, for some n, collatz(n) results in a cycle of values, or goes off to infinity. Programmers have tested collatz(n) with very large values of n, always resulting in 1, but no one has yet been able to prove that collatz(n)=1 for all positive integers. (See the Wikipedia article if you want more information.)

We define the length of a Collatz sequence as the number of numbers in the sequence, including the final `1`. For example:

• `collatz(1)` just gives the single element sequence `1`, so length is `1`
• `collatz(2)` gives the sequence `2 1`, so length is `2`
• `collatz(3)` gives the sequence` 3 10 5 16 8 4 2 1`, so length is `8`
• `collatz(4) `gives the sequence `4 2 1`, so length is `3`

# Part I: Timings

## What to program

You are expected to use Eclipse for all your programming. However, IntelliJ or NetBeans are also acceptable, provided your submitted project contains all necessary classes.

Create a project `Collatz` containing a package `collatz`, and in that package create a class `Collatz`. Then implement these methods:

`static int[] simpleComputeSequenceLengths(long n)`
Returns an integer array of size `n+1`, such that for all `1 <= i <= n`, array element `i` contains the length of the Collatz sequence starting with `i`. For example, array location `3` contains `8`, because the length of the Collatz sequence starting from `3` is `8`. Compute the length of each sequence separately, by repeatedly applying the definition, and counting the number of steps required to reach `1`.
Note 1: Array location `0` is not used.
Note 2: Use `int`s for sequence lengths, but use `long`s for the values in the sequence.
Note 3: When you first create an integer array, all locations have been preset to zero.
`static int[] memoizedComputeSequenceLengths(long n)`
This method computes the same result as `simpleComputeSequenceLengths`, but uses memoization. (Memoization is simply recording the results of previous computations and using them in future computations.) For each number in the Collatz sequence, if it is less than the starting number, you can compute the total sequence length without finishing the sequence. The notes for the previous method also apply here.
Example: Starting from `9`, we get `9 28 14 7` .... Since `7 < 9`, we have already computed the length of the sequence starting at `7` (it's `17`), so `4 + 17 - 1 = 20` (the` -1` is so we don't count `7` twice).
`static void doTimings(long n)`
Measures, for both of the above methods, how long it takes to compute Collatz sequence lengths for all the numbers from `1` to `n`, and prints the results.

Test (using JUnit 4) the two "lengths" methods, preferably using TDD. Be sure they are correct before attempting to time them.

## How to do timings

Here's the basic code to do timings:

```long startTime = System.nanoTime();
// ... the code being measured ...
long elapsedTime = System.nanoTime() - startTime;```

Once you have the above working (and tested!), do some sanity checking of your results. Sanity checking is a general term for seeing whether the results make sense. In this case, calling either of the above methods with a larger `n` should result in the method taking longer. Also, if memoization makes sense in this case, the memoized version should take less time than the non-memoized version.

You are very likely to find some problems. It might be, for example, that it takes less time to compute sequences up to 400000 than up to 300000. This could mean there is an error in your program (for example, not starting each timing run in a clean state, which might not show up in your JUnit tests), but it could also mean that random other things are messing up your timing.

Here are some tips for helping to get reasonable accuracy.

• Call `System.gc()` before you begin each timing.
• Reason: So garbage collection doesn't happen during your timing runs. This is probably the most important consideration for this assignment.
• Run the code multiple times, and take the average.
• For best accuracy, throw out the best and worst times, and take the average of the remaining times.
• Close all other applications, so you don't have the computer doing too many other things during the timings.
• Reason: `System.nanoTime()` gives "wall clock" time, not just time spent in your program.
• Be sure all timings start from the same state, and don't use data left over from a previous run.
• As much as possible, try to include in your timing only the code you want to time, not any initializations or cleanup.
• "Warm up" your program by executing all your methods at least twice before you start timing.
• Reason: The compiler may delay optimizing your code until it is needed, so the first few executions may be much slower.

# Part II: Explorations

Define the following convenience class:

```class Pair<A, B> {
private A first;
private B second;

public Pair(A first, B second) {
this.first = first;
this.second = second;
}
public A _1() { return first; }
public B _2() { return second;}
@Override public String toString() { return "(" + first + ", " + second + ")"; }
}```

Using TDD, write and unit test the following methods:

`static long collatz_1(long n)`
This nonrecursive method makes just one step of the collatz sequence. For example, given 7, the method returns 22; given 10, it returns 5; given 1, it returns 1. Use `long` values rather than `int` values.
`static List<Long> sequenceOf(long n)`
Returns the Collatz sequence starting with `n` and ending with `1` (`1` should be the last value in the list). This method should not be recursive, as Collatz sequences can become quite long and exhaust stack space.
`static int lengthOfSequence(long n)`
Returns the length of the Collatz sequence starting with `n`.
`static long largestValueInSequence(long n)`
Returns the largest value in the Collatz sequence starting with `n`.
`static List<Pair<Long, Integer>> equalLengthTwins(long lo, long hi)`
Sometimes two adjacent sequences (starting from n and from n+1) have some characteristic in common. For example, `sequence_of(28)` and `sequence_of(29)` both have length 19. Find all equal-length twin pairs in the range `lo` to `hi`, returning a list of pairs of (n, length) (in this example, the pair (28, 19), where `lo <= n < n+1 <= hi`. In this example, `sequence_of(30)` also has length 19, so (29, 19) would also be in the list of results.
`static List<Pair<Long, Long>> equalMaxValueTwins(long lo, long hi)`
Sometimes two adjacent sequences (starting from n and n+1) both reach the same maximum value, before collapsing to 1. For example, the sequences for 5 and 6 both reach a maximum value of 16. Find all equal-max twin pairs in the range `lo` to `hi`, returning a list of (n, max value) pairs, where `lo <= n < n+1 <= hi`. For example, the pair (5, 16) would be in the list of results.
`static int[] occurrences(long n, int counts)`
Returns an integer array of size `counts+1`. Array location `i` will contain the number of times the number `i` occurs in the Collatz sequences from `1` to `n`, inclusive. Numbers larger than `counts` will not be counted.
For example, if `n=100`,
• Array location `1` will contain `100`, because `1` occurs in all 100 Collatz sequences.
• Array location `0` will contain `0`, which occurs in no Collatz sequence.
• Array location `2` will contain `99`, as it occurs in every Collatz sequence except the one beginning with `1`.
One or two methods of your own devising, to test any aspects of Collatz sequences that you think might be interesting.

## The `main` method

Your `main` method should:

• Call `doTimings` for `100000`, `200000`, `300000`, ..., `1000000` (hundred thousands, to one million). (Results are printed by `doTimings`.)
• For values of `n` from 1 to 100, print out `n`, the length of the sequences starting from `n`, the maximum value in that sequence, and the sequence itself. The output should look something like this:
```  1: length   1, max    1: 1
2: length   2, max    2: 2 1
3: length   8, max   16: 3 10 5 16 8 4 2 1```
• Print the list of equal length twins for starting values in the range 1 to 100.
• Print the list of equal value twins for starting values in the range 1 to 100.
• Print the list of occurrences of each value in the range 1 to 100, for starting values in the range 1 to 1000000.
• Print the results of the methods you came up with to explore other aspects of the Collatz sequence.

## What to write up

Once you have a "clean" set of timing numbers, include them in your writeup. Try to come up with a formula for each set of results (memoized and not memoized), such that f(n) = time. Then state what the Big-O time of each appears to be.

• It helps a lot of you make graphs of your results. You can include these in your writeup, if you like.
• If you have tools to help you figure out suitable formulae, feel free to use them, and tell us what you did.
• Don't obsess about this. If you can't figure anything out, at least make a reasonable guess.

There are interesting (and sometimes surprising) things to discover about the Collatz sequences. Mention anything that seems unusual or unexpected to you.

You are supposed to write one or two methods of your own to check out things about Collatz sequences. Briefly describe what those methods are, and what the results are.

Before you begin, estimate how long this project will take you. When you are done, tell about how long it actually took.

Keep your writeup short--about one page, maybe two or three if you include graphs.

Just like last semester, we will do unit testing of your methods, so be sure you get the signatures right. Also like last semester, you are expected to JUnit test your own methods (all but `main` and `doTimings`, which do I/O). If you do a good job with your unit tests, you won't lose points by failing ours.
Zip and turn in your entire project, which should include a `readme.txt` file (or `readme.doc`, `readme.pdf`, `readme.rtf`, or `readme.html`). The readme file will count for 20% of your grade on this project.