Goals

Rigorous Refresher on the basics of C programming (Pointers, output parameters, Memory Model, etc.)
Refresher on dynamic memory allocation
More practice debugging. The tests are more complex and you will need to put in some effort to figure out where you are encountering issues.

We will be grading you on style for this assignment. Though there isn’t that much too it and it mostly comes in the second part of the assignment.

Collaboration

For assignments in CIS 3990, you will complete each of them on your own. However, you may discuss high-level ideas with other students, but any viewing, sharing, copying, or dictating of code is forbidden. If you are worried about whether something violates academic integrity, please post on Ed or contact the instructor(s).

Setup

You can downlowd the starter files into your environment by running the following command. You should run it from inside your git repository since that is how you will submit your code.

curl -o llht_starter.zip https://www.seas.upenn.edu/~tqmcgaha/cis3990/current/projects/code/llht_starter.zip

You can also download the files manually here if you would like: llht_starter.zip

From here, you need to extract the files by running

unzip llht_starter.zip

You should now have a folder called llht that has the starter files. We expect this to be a folder inside the base of your github repository, similar to cowsay and check-setup. If you accidentally put the folder somewhere else, you can move it with the mv command in the terminal.

Starter Files

In the zip you will find the following files:

Part 1 Files:

LinkedList.hpp: Contains the declarations and extensive comments detailing the functions you will have to implement for part 1 of this assignment.
LinkedList_priv.hpp: Contains information about the internal structures used for the LinkedList and its iterator.
LinkedList.cpp: Where you will be writing all of your code for the assignment. You will have to implement all of the functions specified in LinkedList.hpp.
test_linkedlist.cpp: The tests that we will be using to evaluate your LinkedList implementation.

Part 2 Files:

HashTable.hpp: Contains the declarations and extensive comments detailing the functions you will have to implement for part 2 of this assignment.
HashTable_priv.hpp: Contains information about the internal structures used for the HashTable and its iterator.
HashTable.cpp: Where you will be writing all of your code for the assignment. You will have to implement all of the functions specified in HashTable.hpp.
test_hashtable.cpp: The tests that we will be using to evaluate your HashTable implementation.

We recommend reading all of the files for a part before your start that part, especially the .hpp files listed. All of the code you will submit for this assignment should be contained in LinkedList.cpp and HashTable.cpp. This means that your code should work with the other files as they are when initially supplied. Note that you can modify the other files if you like, but we will be testing your LinkedList.cpp and HashTable.cpp against un-modified versions of the files.

Overview

In this assignment we want you to program a data structure in a mostly C style way. This course focuses a lot on C++ but it is important to get the fundamentals of C down since it is the foundation C++ is built on top of, and some of you may have to work in evironments that only use C or write C++ in a very “C” style.

If you’ve programmed in higher level programming languages like Java, Python, or others, then you probably made use of a large standard library, filled with data structure implementations ready for you to use. In C, we have to make data structures ourselves or use an external library. In this assignment, we will be implementing two fairly common data structures: A Linked-List, a HashTable and iterators for each data structure.

However, C is missing many common features of high-level programming languages, which will make this difficult. In particular, we want to make a generic data structure, with a private implementation, that doesn’t have any memory leaks.

When we say we want to make a “generic” data structure, we mean we want to make a data structure that can accept any kind of data to stored into it. In other languages, this could be accomplished with templates, such as how you can have a List\<Integer\> in Java. In C, we are going to use a void * as our element type, and so users can store pointers to information in our structure. This requires the user to remember how to use the types, and possibly allocate space for the data themselves.
We want to keep the structure private, but we don’t have access modifiers like public and private in C. Instead, we decide what is part of the public interface vs what is private by what is included in the .h file. Now we can specify the data type, and some functions that can be used on the type in the public .h file so that the user doesn’t manipulate the structure directly.
Lastly, unlike high-level languages, C doesn’t have garbage collection. We need to manage memory ourselves using the functions malloc() and free() (though since this is C++, we will use new and delete).

This assignment is broken into two parts respectively, first you will implement the LinkedList and its corresponding iterator before moving on to the HashTable and its iterator.

Note that despite this assignment being two parts, most students find that the second part with the HashTable and its iterator takes significantly longer to implement.

Part 1: LinkedList

The linked list will be “doubly linked”, which means each node has both a next pointer and a prev pointer to access both the node before it and after it in the list.

At a high level, doubly-linked lists usually look like:

ll_abstract

Note in here that each node contains the a pointer to the next node, a pointer to the previous node, and the payload (data) that is stored in that node. We also use NULL as a value in the prev and next nodes to indicate the absence of a previous or next node.

However, due our goal of wanting to make a full data structure, we don’t want to just give the users of our code LinkedListNodes and tell them to manage the nodes themselves.

As a result, we are going to have a struct called LinkedList that is dedicated to managing the nodes into the form of a List. LinkedList contains a pointer to the front or “head” of the list, a pointer to the end or “tail” of the list, and the current number of elements in the list. We can update our drawing to look like this:

LinkedList

We also want to provide an iterator so that clients can read all the elements in the list, so we will need a third struct called LLIterator. LLIterator contains a pointer to the current node the iterator is referring to in the list, and a pointer to the LinkedList struct that manages the nodes we are iterating over. The reason for having a reference to the list itself is because we may have to update things in the LinkedList if we end up removing things with the LLIterator, so the num_elements of the LinkedList and possibly other data may have to change when we remove a node with LLIterator_Remove.

If we add our iterator to the drawing, we get:

LLIterator

In summary, there are three struct types that we will be using in this part:

LinkedList: The structure containing data to maintain the LinkedList, such as the current number of elements, a pointer to the first node, and a pointer to the last node.
LinkedListNode: The node structure that we will use for our doubly linked list internals. This has a payload field, and pointers to the next and prev elements in the list.
LLIterator: The structure that represents our iterator. This contains some meta-data necessary for the iterator such as the node it is currently on and a pointer to the overall LinkedList. Note that it is necessary to have a pointer to the overall LinkedList since the iterator can remove elements which may affect the information stored in the LinkedList structure.

We will also be using the type LLPayload_t which is a typedef (e.g. alias) for a void *. As mentioned before, this will allow users some flexibility in what the store in the linked list.

Here we have given an overview of the LinkedList structure, your job will be to implement the functions that suppor the implementaiton of our data structure.

Task 1. Read LinkedList.hpp, LinkedList_priv.hpp and implement the incomplete functions in LinkedList.cpp.

Part 2: HashTable

For part 2, we are implementing a chained hash table. A HashTable allows the user to insert a key/value pair, look up already existing key/value pairs, and remove them. Note that the HashTable is a more complex data structure, and probably something you are not as familiar with. This is OK, this is not a data structures course, please let us know if you have questions on the structure of the HashTable and iterator.

The HashTable structure follows a “chaining” design. This means that our table maintains several LinkedLists which we call “chains” or “buckets”. When a user inserts a key/value pair, the pair will also come with some hash value of type HTHash_t. The hash table will use a deterministic function to associate that HTHash_t with one of the specified buckets. We can then insert that integer into our HashTable by inserting it into one of our “buckets”. If we later want to lookup that key/value pair or remove that key/value pair, we can simply look through the bucket associated with that key and avoid looking through the rest of the buckets in our hash table.

One important case to note though, is that if we insert a key/value pair into our HashTable and there is already an entry with the same key/value pair, we will return a copy of the already existing key-value pair to the caller and replace the old key-value pair in the HashTable with the new key/value pair.

To keep the HashTable quick for inserting/lookups/removals we try to keep the buckets short. So as we add more elements to the HashTable, the HashTable will resize to have more buckets and possibly redistribute which keys are associated with wich bucket to keep the buckets short. This resizing is already implemented for you, we just describe it here to give you an idea of what the code is doing.

To get a better understanding of how the HashTable works, we have a diagram below:

HashTable

Important to note about this diagram is that we are users of the LinkedList data structure. This means that we must use the functions provded in LinkedList.hpp to interact with the LinkedList and don’t have direct access to the “private” representation of the list. Note that you are NOT allowed to #include "LinkedList_priv.hpp" in HashTable.cpp.

We will also be implementing a HashTable Iterator HTIterator (not pictured) that will iterate through all of the buckets of the HashTable, and each element in each bucket.

More details on the HashTable and HTIterator can be found in HashTable.hpp and HashTable_priv.hpp respectively. We HIGHLY recommend you read these header files before beginning part 2.

Short summary of the structs used in part 2:

HTKeyValue_t: Represents a Key-Value pair (and the hash of they key) that we are going to store in our hash table
HashTable: Contains the data necessary to implement the HashTable: The number of elements in the HashTable, an array of LinkedList “buckets” where the elements are stored, the length of the “buckets” array, and a key_cmp_fn which is a function pointer to a function that takes in two HTKey_t and returns a bool that should be true iff the keys are equal. This is used because it is possible for two HTKeyValue_t to have the same hash and diferent keys.
HTIterator: The structure that represents our hash table iterator. Contains a pointer to the HashTable that is currently being iterated over, which bucket in the HashTable we are iterating over, and the iterator for that bucket.

Task 2. Read HashTable.hpp, HashTable_priv.hpp and implement the incomplete functions in HashTable.cpp.

Grading & Testing

Compilation

We have supplied you with a Makefile that can be used for compiling your code into an executable. To do this, open the terminal and then type in make.

You may need to resolve any compiler warnings and compiler errors that show up. Once all compiler errors have been resolved, if you ls in the terminal, you should be able to see an executable called test_suite. You can then run this by typing in ./test_suite to see the evaluation of your code.

Note that your submission will be partially evaluated on the number of compiler warnings. You should eliminate ALL compiler warnings in your code.

Valgrind

We will also test your submission on whether there are any memory errors or memory leaks. We will be using valgrind to do this. To do this, you should try running: valgrind --leak-check=full ./test_suite

If everything is correct, you should see the following towards the bottom of the output:

 ==1620== All heap blocks were freed -- no leaks are possible
 ==1620==
 ==1620== For counts of detected and suppressed errors, rerun with: -v
 ==1620== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

If you do not see something similar to the above in your output, valgrind will have printed out details about where the errors and memory leaks occurred.

catch2

As with mentioned above, you can compile the your implementation by using the make command. This will result in several output files, including an executable called test_suite.

After compiling your solution with make, You can run all of the tests for the homwork by invoking:

$ ./test_suite

You can also run only specific tests by passing command line arguments into test_suite

For example, to only run the LinkedList tests, you can type in:

$ ./test_suite [Test_LinkedList]

If you only want to test Push and Pop from LinkedList, you can type in:

$ ./test_suite [Test_LinkedList] PushPop

You can specify which tests are run for any of the tests in the assingment. You just need to know the names of the tests, and you can do this by running:

$ ./test_suite --list-tests

These settings can be helpful for debugging specific parts of the assignment, especially since test_suite can be run with these settings through valgrind and gdb!

Code Quality

For this assignment, we will be checking the quality of your code in two ways:

Through automated checks with clang-format & clang-tidy
Through manually reviewing your code.

Clang Format and Clang Tidy

The makefile we provided with this assignment is configured to help make sure your code is properly formatted and follows good (modern) C++ coding conventions.

To do this, we make use of two tools: clang-format and clang-tidy

To make sure your code identation is nice, we have clang-format. All you need to do to use this utility, is to run make format, which will run the tool and indent your code propely. Code that is turned in is expected to follow the specified style, code that is ill-formated must be fixed and re-submitted.

clang-tidy is a more complicated tool. Part of it is checking for style and readability issues but that is not all it does. It also checks for and can detect possible bugs in your program.

Because of all this, we are enforcing that your code does not produce any clang-tidy errors. You can run clang-tidy on your code by running: make tidy-check. Note that you will have to fix any compiler errors before clang-tidy will run (and be useful).

Code that has clang-format, clang-tidy or any compiler errors will face a deduction.

If you have any questions about understanding an error, please ask on Ed discussion and we will happily assist you.

Manual Checks

For this assignment we will also be manually checking your code for code quality issues.

We have a code quality document that outlines what we are looking for, and the automated checks described above will be able to find most of these. However some of these will be manually checked.

Since this is the first time we are grading on code quality manually and since this is C-styled code, we are not requiring you to follow all parts of the style document. We will only be looking for issues that fall under the following categories:

Naming
Commenting
Modules
Design
- with the exception of prefer_range_for
Functions
- Most notable for this HW: functions::avoid_repetitive_code
performance::avoid_unecessary_allocation
performance::avoid_repeated_computation

Reminder that you can re-open assignments, so if you do not get the score you want on code quality, you can fix it and re-submit the assignment at a later point. More details on this are described in the syllabus

Another reminder that your style grade will be “bucketed” into one of four scor categories. Again, see the syllabus for more details.

Submission

Please submit your completed LinkedList.cpp and HashTable.cpp to Gradescope via your github repo. We will be checking for compilation warnings, valgrind errors, making sure the test_suite runs correctly, the quality of your code, and that you did not violate any rules established in the specification.