In this assignment, you will attempt to improve and then measure the performance (in terms of execution time, code size, and memory usage) of a variety of applications.

As always, you may work with **one** other person on this assignment, but you each must turn in your own homework submission, which must include your own writeup, except as noted.

__Before you begin__

This assignment assumes you have completed lab assignment #3. If you have not yet done so, finish that one first before proceeding to this one.

Also, if you no longer have them, download the text of Tolstoy's "War & Peace" and "Anna Karenina" (retrieved from Project Gutenberg).

__Part 1 (10 points)__

In this part of the assignment, you will profile your three solutions to the "compare docs" problem: the simple hashtable solution (lab #2 part 4), the self-resizing hashtable solution (lab #2 part 5), and the multi-threaded solution (lab #3 part 2). For each of the three programs, do the following:

- measure the memory (heap) usage using Massif
- estimate the number of instructions executed using Callgrind
- use gprof to evaluate the number of function calls and the time spent in each function

- does any of the three solutions seem to be clearly "best", in terms of having the shortest execution time and the smallest amount of memory usage?
- if not, explain whether you think a decrease in execution time justifies an increase in memory usage
- based on the report from gprof, which function in your code do you think needs to be "optimized" for performance?

__Part 2 (20 points)__

Credit card companies use built-in "checksums" as added security measures when creating the account numbers on credit cards. This means that there are only certain valid credit card numbers, and validity can instantly be detected by using an algorithm that may involve adding up parts of the numbers or performing other checks.

In this assignment, you will improve the performance of a piece of code that applies the following security checks:

- The card's month and year must not be blank.
- The month must be a two-digit string: "01", "02"... "12"
- The year must be a four-digit string representing a year after 1900.
- The card number must not be blank.
- The card number must consist of 16 digits.
- The sum of the 16 digits in the card number must be a power of 2 (i.e., 2, 4, 8, 16, 32, etc.)
- The first four digits of the card number must represent a prime number
- If you treat digits 7 and 8 (using 0-based index) as a two-digit number, and multiply it by 14, the value should be greater than the number produced if you treat digits 9 and 10 as a two digit number, multiply it by 30, and subtract 1.
- If you treat the last two digits as a hexadecimal (base-16) number, it should represent a value that is either two, four, or eight less than the (base-10) number represented by digits 11-13 (using a 0-based index)

Note that this algorithm is purely made up; don't try to use it to create fake credit card numbers! :-)

The program creditcard.c contains functions that implement this algorithm for a 16-digit credit card number, and also contains a main function that tests the validity of a sample credit card number. However, this code is quite inefficient and has a lot of room for improvement.

First measure the amount of time it takes for this program to run the tests one million times (it should take about 15 seconds to a minute, depending on processor power and system load).

Then apply the code optimization techniques discussed in lecture and attempt to reduce the execution time. Note that you can change anything you want (except for the definition of the CreditCard struct), as long as your code still executes all the tests one million times, and still indicates that the credit card is valid.

In your write-up, you will need to document all the changes you made, and justify each. So keep notes as you go along, i.e. don't just make tons of changes and then hope that you'll be able to go back and remember them later. If you are working with someone else on this part of the assignment, it is okay if you both submit the same writeup.

Five extra points will be given to the group(s) that show the measurably best performance improvement!

__Part 3 (20 points)__

Memoization (yes, that's really how you spell it) is a technique in which a function remembers (or "memoizes"... seriously) the results of previously-processed inputs, rather than re-executing complicated code that it has already run. In this part of the assignment, you will attempt to improve the performance of a (somewhat) complex mathematical function by using memoization, and then measuring whether it is effective.

The program earth.c randomly generates two latitude/longitude coordinate points and then calculates the distance between them using the spherical law of cosines. It then repeats this (with random points) two hundred million times.

First measure the amount of time it takes for the code to run. On eniac, it should take about a minute.

Then implement memoization, by doing the following:

- modify one of your hashtable implementations from the previous labs (choose whichever you think is best, though note that this program is single-threaded) so that a "node" contains a "key" (which refers to the function input) and a "value" (which refers to its memoized output)
- modify the "calculate_distance" function so that it uses your new hashtable implementation to determine whether the inputs have already been seen before; if so, it should return the memoized output and not perform the calculation
- if the inputs have
*not*been seen before, perform the calculation, and then put the inputs and the corresponding output in the hashtable

**
NEW INFO! To make the assignment a little more tenable, change line 11 of earth.c to "p->lat = random() % 91;" and line 12 to "p->lon = random() % 181;" so that lat and lon are only positive. This will increase the likelihood of generating the same pair of points more than once.
**

**
Then change line 47 so that the number of iterations is a variable, and answer this question in your writeup: "How large does that number need to be such that it makes more sense to do memoization than it does to just compute the distance each time? Why?"
**

The original implementation of the program uses no heap memory; your memoized version most certainly will. Use Massif to analyze the memory usage of your program; does the increase in performance outweigh the increase in memory usage? Explain why/why not. In your writeup, explain how you might decrease the memory usage but still gain some sort of performance increase.

**Academic Honesty**

As in the previous lab assignments, you may work with one other student on this assignment (even when you are outside the lab). However, you may not discuss or share solutions with any other students, nor should you be receiving any help from outside sources, including students not taking this course or online resources. If you run into problems, please ask a member of the teaching staff for help. Failure to abide by the academic honesty guidelines will result in a grade of 0 for this assignment.

Note that **each student must do his/her own writeup**; copying your partner's writeup verbatim and submitting it as your own will be considered academic dishonesty.

**Submission**

This assignment is due at 4:30pm on Thursday, February 23. After that, late submissions will be subject to a 10% per day deduction.

All students, regardless of whether they are working alone or with someone else, must submit their solutions individually. For this lab assignment, you should submit your source code for the improved credit card checker in Part 2, and the version of earth.c that uses memoization from Part 3; you should also have a Makefile to compile both. You will also have a writeup, which should consist of a single plain-text or PDF file in which you answer the questions posed in each part; indicate the name of the student with whom you worked, if applicable (note that you may work with a maximum of one other student). Again, please be sure that you do **your own writeup**; you can discuss your findings with your partner, but each of you should write your own document analyzing your results.

Please be sure to follow the general instructions on the Course Overview page, paying special attention to naming conventions and file organization. Failure to properly follow the submission instructions could result in a delay of grading your assignment and/or a lateness penalty, so please be sure to do it correctly!

*Updated: Mon Feb 13, 6:36pm*