CIT 594 Assignment 1: Exploring the Collatz Function
Spring 2015, David Matuszek

Purposes of this assignment

General idea of the assignment

In more detail

The Collatz “Function”

The Collatz “function” is defined (for positive integers only) as:

collatz(n) =

The reason for the quote marks around the word “function” is this: No one knows if this is a function. To be a function, it has to be defined for all positive integers, otherwise it is only a partial function. Clearly, collatz(n)=1 for all values of n for which collatz is defined; but it is possible that, for some n, collatz(n) results in a cycle of values, or goes off to infinity. Programmers have tested collatz(n) with very large values of n, always resulting in 1, but no one has yet been able to prove that collatz(n)=1 for all positive integers. (See the Wikipedia article if you want more information.)

We define the length of a Collatz sequence as the number of numbers in the sequence, including the final 1. For example:

Part I: Timings

What to program

You are expected to use Eclipse for all your programming. However, IntelliJ or NetBeans are also acceptable, provided your submitted project contains all necessary classes.

Create a project Collatz containing a package collatz, and in that package create a class Collatz. Then implement these methods:

static int[] simpleComputeSequenceLengths(long n)
Returns an integer array of size n+1, such that for all 1 <= i <= n, array element i contains the length of the Collatz sequence starting with i. For example, array location 3 contains 8, because the length of the Collatz sequence starting from 3 is 8. Compute the length of each sequence separately, by repeatedly applying the definition, and counting the number of steps required to reach 1.
Note 1: Array location 0 is not used.
Note 2: Use ints for sequence lengths, but use longs for the values in the sequence.
Note 3: When you first create an integer array, all locations have been preset to zero.
static int[] memoizedComputeSequenceLengths(long n)
This method computes the same result as simpleComputeSequenceLengths, but uses memoization. (Memoization is simply recording the results of previous computations and using them in future computations.) For each number in the Collatz sequence, if it is less than the starting number, you can compute the total sequence length without finishing the sequence. The notes for the previous method also apply here.
Example: Starting from 9, we get 9 28 14 7 .... Since 7 < 9, we have already computed the length of the sequence starting at 7 (it's 17), so 4 + 17 - 1 = 20 (the -1 is so we don't count 7 twice).
static void doTimings(long n)
Measures, for both of the above methods, how long it takes to compute Collatz sequence lengths for all the numbers from 1 to n, and prints the results.

Test (using JUnit 4) the two "lengths" methods, preferably using TDD. Be sure they are correct before attempting to time them.

How to do timings

Here's the basic code to do timings:

long startTime = System.nanoTime();
// ... the code being measured ... 
long elapsedTime = System.nanoTime() - startTime;

Once you have the above working (and tested!), do some sanity checking of your results. Sanity checking is a general term for seeing whether the results make sense. In this case, calling either of the above methods with a larger n should result in the method taking longer. Also, if memoization makes sense in this case, the memoized version should take less time than the non-memoized version.

You are very likely to find some problems. It might be, for example, that it takes less time to compute sequences up to 400000 than up to 300000. This could mean there is an error in your program (for example, not starting each timing run in a clean state, which might not show up in your JUnit tests), but it could also mean that random other things are messing up your timing.

Here are some tips for helping to get reasonable accuracy.

Part II: Explorations

Define the following convenience class:

class Pair<A, B> {
    private A first;
    private B second;
    
    public Pair(A first, B second) {
        this.first = first;
        this.second = second;
    }
    public A _1() { return first; }
    public B _2() { return second;}
    @Override public String toString() { return "(" + first + ", " + second + ")"; }
}

Using TDD, write and unit test the following methods:

static long collatz_1(long n)
This nonrecursive method makes just one step of the collatz sequence. For example, given 7, the method returns 22; given 10, it returns 5; given 1, it returns 1. Use long values rather than int values.
static List<Long> sequenceOf(long n)
Returns the Collatz sequence starting with n and ending with 1 (1 should be the last value in the list). This method should not be recursive, as Collatz sequences can become quite long and exhaust stack space.
static int lengthOfSequence(long n)
Returns the length of the Collatz sequence starting with n.
static long largestValueInSequence(long n)
Returns the largest value in the Collatz sequence starting with n.
static List<Pair<Long, Integer>> equalLengthTwins(long lo, long hi)
Sometimes two adjacent sequences (starting from n and from n+1) have some characteristic in common. For example, sequence_of(28) and sequence_of(29) both have length 19. Find all equal-length twin pairs in the range lo to hi, returning a list of pairs of (n, length) (in this example, the pair (28, 19), where lo <= n < n+1 <= hi. In this example, sequence_of(30) also has length 19, so (29, 19) would also be in the list of results.
static List<Pair<Long, Long>> equalMaxValueTwins(long lo, long hi)
Sometimes two adjacent sequences (starting from n and n+1) both reach the same maximum value, before collapsing to 1. For example, the sequences for 5 and 6 both reach a maximum value of 16. Find all equal-max twin pairs in the range lo to hi, returning a list of (n, max value) pairs, where lo <= n < n+1 <= hi. For example, the pair (5, 16) would be in the list of results.
static int[] occurrences(long n, int counts)
Returns an integer array of size counts+1. Array location i will contain the number of times the number i occurs in the Collatz sequences from 1 to n, inclusive. Numbers larger than counts will not be counted.
For example, if n=100,
One or two methods of your own devising, to test any aspects of Collatz sequences that you think might be interesting.

The main method

Your main method should:

What to write up

Once you have a "clean" set of timing numbers, include them in your writeup. Try to come up with a formula for each set of results (memoized and not memoized), such that f(n) = time. Then state what the Big-O time of each appears to be.

There are interesting (and sometimes surprising) things to discover about the Collatz sequences. Mention anything that seems unusual or unexpected to you.

You are supposed to write one or two methods of your own to check out things about Collatz sequences. Briefly describe what those methods are, and what the results are.

Before you begin, estimate how long this project will take you. When you are done, tell about how long it actually took.

Keep your writeup short--about one page, maybe two or three if you include graphs.

Grading

Just like last semester, we will do unit testing of your methods, so be sure you get the signatures right. Also like last semester, you are expected to JUnit test your own methods (all but main and doTimings, which do I/O). If you do a good job with your unit tests, you won't lose points by failing ours.

Zip and turn in your entire project, which should include a readme.txt file (or readme.doc, readme.pdf, readme.rtf, or readme.html). The readme file will count for 20% of your grade on this project.

Due date

Turn your assignment in to Canvas before 6 a.m. Thursday, January 22.

The general requirements are the same as for last semester, but the late policy has changed.