[an error occurred while processing this directive]

Checklist: Dynamic Programming

Frequently Asked Questions: Sequence Alignment

What are the main goals of this assignment? You will (i) solve a fundamental problem in computational biology, (ii) learn about an important application in computer graphics, and (iii) learn about a powerful programming paradigm known as dynamic programming.

How do I tell Java to use more of my computer's memory? By default, Java will only use 64MB of memory (not very much for this application). You must explicitly ask for more by executing with

java -Xmx300m EditDistance < input.txt
The 300m means 300MB, and you should adjust this number depending on the amount of memory your computer has and the size of the arrays you will need for the data set you are running. (Everyone should be able to request 600m which should get you through ecoli10000.txt)

How do I determine how much memory I have? On Mac, select About this Mac from the Apple menubar. You can also observe the CPU and memory usage using the Activity Monitor. On Windows, type Ctrl-Shift-Escape, and look at the last tab so see CPU and memory usage.

Can I assume that the input characters will always be A, C, G or T? NO! Your program should work equally well for any letter, upper case or lower case.

It seems strange to define x[i..M] to be the substring consisting of x[i] through x[M-1] instead of x[i] through x[M]. Is this a typo? It's a little strange, but no, it's not a typo. It's consistent with Java's indexing notation where the left endpoint is inclusive and the right endpoint is exclusive.

Which alignment should I print out if there are two or more optimal ones? Output any one you like.

Testing: Sequence Alignment

Testing.   To help you check the part of your program that finds the edit distance, the edit distances of gene57.txt, stx1230.txt, and ftsa1272.txt are 8, 521, and 758, respectively.

To help you check the part of your program that generates the alignment, there are several short test files in the sequence directory whose alignments are easy to generate manually.

In addition, we require that you generate one input file of your own to be used for testing special cases. Create a new input file with some interesting property. Then test your code using your file and make sure your program behaves as expected.

Possible Progress Steps: Sequence Alignment

These are purely suggestions for how you might make progress. You do not have to follow these steps.

  1. Download DataPair.java and Path.java Download and unzip sequence.zip to your system. It contains sample data files.

  2. Create a skeleton Match.java file including the simple test routing below:
    public class Match { public static Path match(DataPair dp) { return null; // placeholder so Match will compile } public static void main(String[] args) { TestMatch tm = new TestMatch(); Path path = Match.match(tm); for (Path p = path; p != null; p = p.next) System.out.println(p.row + " " + p.col); } } class TestMatch implements DataPair { public int rows() { return 10; } public int cols() { return 10; } public int cost(int row, int col, int dir) { return baseCost(row, col); } public int baseCost(int row, int col) { return Math.abs(row - col); } }

  3. Declare and initialize a dp.rows() x dp.cols() array Path opt[][] in Match.match(). Set the row and col fields of each element to correspond to the node's row and column in the array. Print out the 2D array to check your work.

    To print the matrix out in nicely formatted columns, use

    System.out.printf("%2d %2d %3d", opt[row][col].row, opt[row][col].col, opt[row][col].cost);
    with nested for loops. Remember to remove this debugging print section before submitting the final version of your program.

  4. Now, it's time to implement the crucial dynamic programming part. First read the dynamic programming portion of Section 9.6 and make sure you understand how dynamic programming works. Think carefully about the order in which you will perform the computation since this is critical. Hint: first fill in the base case of opt[row][col], i.e., when row == dp.rows() - 1 or col == dp.cols() - 1. Now, fill in the remaining values using a nested loop. Test that it works by printing out the contents of opt. Make sure you update the cost in each node to the total cost so far and also update the next pointer to keep track of which direction you came from.

  5. Test Match.match() using the TestMatch class and main method from the template above. TestMatch is a very simple implementation of the DataPair interface that yields a 10x10 matrix where the cost of matching a row i and column j is the absolute value of the difference between i and j. In particular, the cost of matching along the diagonal from upper-left to lower-right is 0, and the cost of matching anywhere else in the matrix is larger than zero. So Match.match() should return a path that follows this diagonal, and your output should be:
    % java Match
    0 0
    1 1
    2 2
    3 3
    4 4
    5 5
    6 6
    7 7
    8 8
    9 9

    This test is a sanity check that Match.match() is doing something reasonable, but is too simple to guarantee everything works. You may want to modify TestMatch or create additional test classes to create more complex test cases as well. Since these classes and Match.main() exist only for testing purposes, you should leave them and any debug output in them when you submit Match.java. However you should remove any print statements from Match.match() that you add for debugging purposes.

  6. Now it is time to write EditDistance.java. Create private String instance variables for the two strings you are matching, as well as any other data you need to keep track of. Since the base case for edit distance involves matching the final characters of one string with "gaps" in the other, you need one additional row and column beyond the length of your two strings. baseCost() and cost need to take this extra row and column into account. There is more than one way of handling this base case, and you may use any reasonable method that works.

  7. Make sure when you print out the match in EditDistance.main() that you handle the base case of gaps at the end of one of the strings correctly. Finally, make sure you do not print the last node in the path, which will always correspond to matching gaps at the end of both strings (the lower-right corner of the opt matrix).

Enrichment: Sequence Alignment

Frequently Asked Questions: Image Stitching

Do I need to include a main method in ImageSeam? You do not have to include a main method, but if you write one for testing purposes, you should not remove it. You should remove any debugging-related code from the other methods however.

How robust should ImageSeam be to invalid input? The ImageSeam constructor and getStitched methods should handle invalid pictures as described in the assignment. However the other methods may assume they will only be called if the data is valid.

Possible Progress Steps: Image Stitching

These are purely suggestions for how you might make progress. You do not have to follow these steps.

  1. Complete Part 1 of the assignment. Image stitching is conceptually easier than sequence alignment and involves a simpler cost function, but it's harder to debug because the intermediate and final results are much harder to evaluate by hand.

  2. Implement rows(), cols(), baseCost(), cost(), and the constructor.

  3. Write a main method that creates two 10x10 pictures. Set all pixels of the first picture to Color.BLACK. Set all pixels of the second picture to Color.WHITE except the principal diagonal (the diagonal running from the upper-left to lower-right corners). Set the pixels along that diagonal to Color.BLACK. Create an ImageSeam object using these two images, and print out the path computed in the constructor. The path should follow the principal diagonal where the two pictures are both black.

  4. Implement getStitched(). This is the trickiest method in ImageSeam. Remember that the first node in the path will always be at row 0 and column 0. The next node in the current path will always be one to the right, one below, or one to the right and below the current one.

  5. Update main() to call getStitched() and display the result. You should get an image with one white triangle and one black triangle.

  6. Test the image examples from the assignment using Stitcher. Make sure you have not mixed up which image getStitched() uses for the portion to the left of the seam, and which portion it uses to the right of the seam.

Enrichment: Image Stitching