[an error occurred while processing this directive]

Checklist: Dynamic Programming

Frequently Asked Questions: Match Sequence Alignment

What are the main goals of this assignment? You will (i) solve a fundamental problem in computational biology, (ii) learn about an important application in computer graphics, and (iii) learn about a powerful programming paradigm known as dynamic programming.

How do I tell Java to use more of my computer's memory? By default, Java will only use 64MB of memory (not very much for this application). You must explicitly ask for more by executing with

java -Xmx1024m EditDistance < input.txt
The 1024m means 1024MB (1 GB), and you should adjust this number depending on the amount of memory your computer has and the size of the arrays you will need for the data set you are running. (Our implementation needs close to 3GB of RAM for ecoli7000.txt and about 5GB for ecoli10000.txt.)

How do I determine how much memory I have? On Mac, select About this Mac from the Apple menubar. You can also observe the CPU and memory usage using the Activity Monitor. On Windows, type Ctrl-Shift-Escape, and look at the last tab so see CPU and memory usage.

Can I assume that the input characters will always be A, C, G or T? NO! Your program should work equally well for any letter, upper case or lower case.

It seems strange to define x[i..M] to be the substring consisting of x[i] through x[M-1] instead of x[i] through x[M]. Is this a typo? It's a little strange, but no, it's not a typo. It's consistent with Java's indexing notation where the left endpoint is inclusive and the right endpoint is exclusive.

Which alignment should I print out if there are two or more optimal ones? Output any one you like.

Testing: Sequence Alignment

Testing.   To help you check the part of your program that finds the edit distance, the edit distances of gene57.txt, stx1230.txt, and ftsa1272.txt are 8, 521, and 758, respectively.

To help you check the part of your program that generates the alignment, there are several short test files in the sequence directory whose alignments are easy to generate manually.

In addition, we require that you generate one input file of your own to be used for testing special cases. Create a new input file with some interesting property. Then test your code using your file and make sure your program behaves as expected.

Possible Progress Steps: Match Sequence Alignment

These are purely suggestions for how you might make progress. You do not have to follow these steps.

  1. Download DataPair.java and Path.java Download and unzip sequence.zip to your system. It contains sample data files.

  2. Create a skeleton Match.java file including the simple test routing below. Make sure that the DataPair class is declared after the closing brace of the Match class or you will get some very strange compile errors.
    public class Match { public static Path match(DataPair dp) { return null; // placeholder so Match will compile } public static void main(String[] args) { TestMatch tm = new TestMatch(); Path path = Match.match(tm); for (Path p = path; p != null; p = p.next) System.out.println(p.row + " " + p.col); } } class TestMatch implements DataPair { public int rows() { return 10; } public int cols() { return 10; } public int penalty(int row, int col, int dir) { return basePenalty(row, col); } public int basePenalty(int row, int col) { return Math.abs(row - col); } }

  3. Copy the body of Match.match() from your Sequence Alignment homework into the body of the new Match.match(). Change all references to a.length() + 1 and b.length() + 1 to dp.rows() and dp.cols(). Change all penalty computations to use dp.penalty() and dp.basePenalty(). Make sure to update all your base cases as well as the main loop!

  4. Test Match.match() using the TestMatch class and main method from the template above. TestMatch is a very simple implementation of the DataPair interface that yields a 10x10 matrix where the penalty of matching a row i and column j is the absolute value of the difference between i and j. In particular, the penalty of matching along the diagonal from upper-left to lower-right is 0, and the penalty of matching anywhere else in the matrix is larger than zero. So Match.match() should return a path that follows this diagonal, and your output should be:
    % java Match
    0 0
    1 1
    2 2
    3 3
    4 4
    5 5
    6 6
    7 7
    8 8
    9 9
    

    This test is a sanity check that Match.match() is doing something reasonable, but is too simple to guarantee everything works. You may want to modify TestMatch or create additional test classes to create more complex test cases as well. (Try making test cases with unequal numbers of rows and columns, and with different values in the bottom row and last column.) Since these classes and Match.main() exist only for testing purposes, you should leave them and any debug output in them when you submit Match.java. However you should remove any print statements from Match.match() that you add for debugging purposes.

  5. Now it is time to update EditDistance.java. Create private String instance variables for the two strings you are matching, as well as any other data you need to keep track of. Implement the basePenalty() method that deals with the lower-right corner of the match matrix, and the penalty() method for the rest of the matrix. Since the lower-right corner of the match matrix corresponds to matching a gap to the gap, EditDistance.basePenalty() should return 0 in that case. EditDistance.penalty() should return 0, 1, or 2. Also, remember that the last row and column of the match matrix correspond to gaps, so it is possible that Match.match() will call penalty(i, j, dir) with values of i and j that are equal to a.length() or b.length(). EditDistance.penalty() will need to handle this case properly. There is more than one way of doing so, and you may use any reasonable approach.

Frequently Asked Questions: Image Stitching

Do I need to include a main method in ImageSeam? You do not have to include a main method, but if you write one for testing purposes, you should not remove it. You should remove any debugging-related code from the other methods however.

How robust should ImageSeam be to invalid input? The ImageSeam constructor and getStitched methods should handle invalid pictures as described in the assignment. However the other methods may assume they will only be called if the data is valid.

Possible Progress Steps: Image Stitching

These are purely suggestions for how you might make progress. You do not have to follow these steps.

  1. Complete Part 1 of the assignment. Image stitching is conceptually easier than sequence alignment and involves a simpler cost function, but it's harder to debug because the intermediate and final results are much harder to evaluate by hand.

  2. Implement rows(), cols(), baseCost(), cost(), and the constructor.

  3. Write a main method that creates two 10x10 pictures. Set all pixels of the first picture to Color.BLACK. Set all pixels of the second picture to Color.WHITE except the principal diagonal (the diagonal running from the upper-left to lower-right corners). Set the pixels along that diagonal to Color.BLACK. Create an ImageSeam object using these two images, and print out the path computed in the constructor. The path should follow the principal diagonal where the two pictures are both black.

  4. Implement getStitched(). This is the trickiest method in ImageSeam. Remember that the first node in the path will always be at row 0 and column 0. The next node in the current path will always be one to the right, one below, or one to the right and below the current one.

  5. Update main() to call getStitched() and display the result. You should get an image with one white triangle and one black triangle.

  6. Test the image examples from the assignment using Stitcher. Make sure you have not mixed up which image getStitched() uses for the portion to the left of the seam, and which portion it uses to the right of the seam.

Enrichment: Image Stitching