CIS 542 - Spring 2012

Lab Assignment #4

February 16, 2012

Due: February 23, 4:30pm

In this assignment, you will attempt to improve and then measure the performance (in terms of execution time, code size, and memory usage) of a variety of applications.

As always, you may work with one other person on this assignment, but you each must turn in your own homework submission, which must include your own writeup, except as noted.

Before you begin
This assignment assumes you have completed lab assignment #3. If you have not yet done so, finish that one first before proceeding to this one.

Also, if you no longer have them, download the text of Tolstoy's "War & Peace" and "Anna Karenina" (retrieved from Project Gutenberg).

Part 1 (10 points)
In this part of the assignment, you will profile your three solutions to the "compare docs" problem: the simple hashtable solution (lab #2 part 4), the self-resizing hashtable solution (lab #2 part 5), and the multi-threaded solution (lab #3 part 2). For each of the three programs, do the following:

In your writeup, include the relevant output for each of the three tools (Massif, Callgrind, and gprof) for each of the three programs. Then answer the following questions:


Part 2 (20 points)
Credit card companies use built-in "checksums" as added security measures when creating the account numbers on credit cards. This means that there are only certain valid credit card numbers, and validity can instantly be detected by using an algorithm that may involve adding up parts of the numbers or performing other checks.

In this assignment, you will improve the performance of a piece of code that applies the following security checks:

For instance, the credit card number "4111111301411066" passes all these tests.

Note that this algorithm is purely made up; don't try to use it to create fake credit card numbers! :-)

The program creditcard.c contains functions that implement this algorithm for a 16-digit credit card number, and also contains a main function that tests the validity of a sample credit card number. However, this code is quite inefficient and has a lot of room for improvement.

First measure the amount of time it takes for this program to run the tests one million times (it should take about 15 seconds to a minute, depending on processor power and system load).

Then apply the code optimization techniques discussed in lecture and attempt to reduce the execution time. Note that you can change anything you want (except for the definition of the CreditCard struct), as long as your code still executes all the tests one million times, and still indicates that the credit card is valid.

In your write-up, you will need to document all the changes you made, and justify each. So keep notes as you go along, i.e. don't just make tons of changes and then hope that you'll be able to go back and remember them later. If you are working with someone else on this part of the assignment, it is okay if you both submit the same writeup.

Five extra points will be given to the group(s) that show the measurably best performance improvement!


Part 3 (20 points)
Memoization (yes, that's really how you spell it) is a technique in which a function remembers (or "memoizes"... seriously) the results of previously-processed inputs, rather than re-executing complicated code that it has already run. In this part of the assignment, you will attempt to improve the performance of a (somewhat) complex mathematical function by using memoization, and then measuring whether it is effective.

The program earth.c randomly generates two latitude/longitude coordinate points and then calculates the distance between them using the spherical law of cosines. It then repeats this (with random points) two hundred million times.

First measure the amount of time it takes for the code to run. On eniac, it should take about a minute.

Then implement memoization, by doing the following:

Now run the program again and measure the time it takes to run. Did memoization make a difference? Under what situations does memoization improve performance? Under what situations does it actually hurt performance? Analyze your results in your writeup.

NEW INFO! To make the assignment a little more tenable, change line 11 of earth.c to "p->lat = random() % 91;" and line 12 to "p->lon = random() % 181;" so that lat and lon are only positive. This will increase the likelihood of generating the same pair of points more than once.

Then change line 47 so that the number of iterations is a variable, and answer this question in your writeup: "How large does that number need to be such that it makes more sense to do memoization than it does to just compute the distance each time? Why?"

The original implementation of the program uses no heap memory; your memoized version most certainly will. Use Massif to analyze the memory usage of your program; does the increase in performance outweigh the increase in memory usage? Explain why/why not. In your writeup, explain how you might decrease the memory usage but still gain some sort of performance increase.

Academic Honesty
As in the previous lab assignments, you may work with one other student on this assignment (even when you are outside the lab). However, you may not discuss or share solutions with any other students, nor should you be receiving any help from outside sources, including students not taking this course or online resources. If you run into problems, please ask a member of the teaching staff for help. Failure to abide by the academic honesty guidelines will result in a grade of 0 for this assignment.

Note that each student must do his/her own writeup; copying your partner's writeup verbatim and submitting it as your own will be considered academic dishonesty.

This assignment is due at 4:30pm on Thursday, February 23. After that, late submissions will be subject to a 10% per day deduction.

All students, regardless of whether they are working alone or with someone else, must submit their solutions individually. For this lab assignment, you should submit your source code for the improved credit card checker in Part 2, and the version of earth.c that uses memoization from Part 3; you should also have a Makefile to compile both. You will also have a writeup, which should consist of a single plain-text or PDF file in which you answer the questions posed in each part; indicate the name of the student with whom you worked, if applicable (note that you may work with a maximum of one other student). Again, please be sure that you do your own writeup; you can discuss your findings with your partner, but each of you should write your own document analyzing your results.

Please be sure to follow the general instructions on the Course Overview page, paying special attention to naming conventions and file organization. Failure to properly follow the submission instructions could result in a delay of grading your assignment and/or a lateness penalty, so please be sure to do it correctly!

Updated: Mon Feb 13, 6:36pm