Hamming Codes in TOY

[an error occurred while processing this directive]

CIS 110

Hamming Codes in TOY

Programming Assignment

Goals

Write a TOY program to encode data using Hamming Codes
Write a TOY program to decode data that has been corrupted using Hamming Codes

Submission

Submit encode.toy, decode.toy, and a completed readme_hamming.txt file using the submission link in the sidebar.

Error-correcting codes enable data to be sent through a noisy communication channel without corruption. To accomplish this, the sender appends redundant information to the message, so that even if some of the original data is corrupted during transmission, the receiver can still recover the original message intact. Transmission errors are common and can arise from scratches on a CD, static on a cell phone, or atmospheric interference. In a noisy environment, error-correcting codes can increase the throughput of a communication link since there is no need to retransmit the message if it becomes partially corrupted during transmission. For this reason, error-correcting codes are used in many common systems including: storage devices (CD, DVD, DRAM), mobile communication (cell phones, wireless, microwave links), digital television, and high-speed modems (ADSL, xDSL).

Hamming Codes - A Hamming code is a specific type of error-correcting code that allows the detection and correction of single-bit transmission errors. Hamming codes are used in many applications where such errors are common, including DRAM memory chips and satellite communication hardware. Hamming codes work by repeatedly reading four message bits, which we denote by m1, m2, m3, m4, and then inserting three parity bits, which we denote by p1, p2, and p3. If any one of these seven bits is corrupted during transmission, the receiver can detect the error and recover the original four message bits intact. This is called single-bit error correction because at most one bit can be corrected per unit of data sent. The overhead for using this method is a 75% increase in bandwidth because it requires three extra parity bits for every four message bits. This compares favorably with the naive approach of sending three copies of each bit (and using the one that appears most frequently), which results in a 200% increase in bandwidth.

Hamming Codes - Visualization - Before we describe the algebra of Hamming codes, we first visualize the calculation of the parity bits using Venn diagrams. As an example, suppose we wish to send the message 1101. We associate each of the four message bits with a specific intersection region of three pairwise overlapping circles, as illustrated below:

The Hamming code adds three parity bits so that each of the three circles has even parity.

That is, the sum of the four bits contained in each of the three circles is even:

In this case, to send the message 1101, we send 1101100 since the three parity bits are 1 (top), 0 (left), and 0 (right).

Now imagine this picture is transmitted over a noisy communication channel, and that one bit is corrupted so that the following picture arrives at the receiving station (corresponding to 1001100):

The receiver discovers that an error occurred by checking the parity of the three circles. Moreover, the receiver can identify where the error occurred (the second bit) and recover the four original message bits!

Since the parity check for the top circle and right circle failed, but the left circle was ok, there is only one bit that could be responsible, and that's m2. If the center bit, m4, is corrupted then all three parity checks will fail. If a parity bit itself is corrupted, then only one parity check will fail. If the communication channel is so noisy that two or more bits are simultaneously corrupted, then the scheme will not work. Can you see why? More sophisticated types of error-correcting codes can and do handle such situations.

Of course, in practice, only the 7 bits are transmitted, rather than the circle diagrams.

Part I: Encoding

Write a TOY program encode.toy to encode a binary message using the scheme described above.

Your program should repeatedly read four bits m1, m2, m3, and m4 from TOY standard input until reading in FFFF.
Write the seven bits m1, m2, m3, m4, p1, p2, p3 to TOY standard output, where
p1 = m1 ^ m2 ^ m4
p2 = m1 ^ m3 ^ m4
p3 = m2 ^ m3 ^ m4
Recall that ^ is the exclusive or operator in Java and TOY. This captures the parity notion described above.
Include a register map in the comments at the top of your program.

Input format. For convenience, we'll transmit each bit individually (as 0000 or 0001), instead of packing 16 bits per TOY word as would be done in a real application. The number of transmitted bits (not including the terminating FFFF) will be a multiple of four when encoding. You do not need to do any error checking on the data files. The data.zip contains a few sample input files for encode.toy.:

% more encode3.txt 
0001 0001 0000 0001 
0001 0001 0001 0000 
0001 0001 0001 0001
FFFF

Output format. The output format should be the same as the input format, i.e. a series of bits encoded as 0000 or 0001 followed by FFFF.

TOY simulator. To execute your TOY program, use either the command-line simulator TOY.java or the graphical Visual X-TOY simulator.

% java TOY encode.toy < encode3.txt
...          
Terminal    
--------------------------------------- 
0001                   
0001                   
0000                   
0001               
0001             
0000                
0000                        
0001                        
0001                        
0001                        
0000                        
0000                        
0000                        
0000                       
0001
0001
0001
0001
0001
0001
0001
FFFF
...

Part II: Decoding

Warning: This part is significantly more challenging than Part I.

Write a TOY program decode.toy to correct a Hamming encoded message.

Your program should repeatedly read seven bits m1, m2, m3, m4, p1, p2, p3 from TOY standard input until reading in FFFF
Determine which one, if any, of the message bits is corrupted by performing the following three parity checks:
1. p1 = m1 ^ m2 ^ m4
2. p2 = m1 ^ m3 ^ m4
3. p3 = m2 ^ m3 ^ m4
Compare the parity bits you computed with the parity bits you received. If they don't match, then some bit got flipped. Here's a summary of what to do with the results:
If exactly zero or one of the parity checks fail, then all four message bits are correct.
If checks 1 and 2 fail (but not check 3), then bit m1 is incorrect.
If checks 1 and 3 fail (but not check 2), then bit m2 is incorrect.
If checks 2 and 3 fail (but not check 1), then bit m3 is incorrect.
If all three checks fail, then bit m4 is incorrect.
Flip the corrupted message bit (if necessary) and write m1, m2, m3, and m4 to standard output.
Include a register map in the comments at the top of your program.

Input format. The input format is the same as before. The number of transmitted bits (not including the terminating FFFF) will be a multiple of seven when decoding. You do not need to do any error checking on the data files. The data.zip contains a few sample input files for decode.toy.:

% more decode3.txt
0001 0000 0000 0001 0001 0000 0000
0000 0001 0001 0000 0000 0000 0000
0001 0001 0001 0001 0001 0001 0000
FFFF

Output format. The output format should be the same as the input format, i.e. a series of bits encoded as 0000 or 0001 followed by FFFF.

TOY simulator. To execute your TOY program, use either the command-line simulator TOY.java or the graphical Visual X-TOY simulator.

% java TOY decode.toy < decode3.txt
...
Terminal
---------------------------------------
0001
0001
0000
0001
0001
0001
0001
0000
0001
0001
0001
0001
FFFF
...

Further Testing

The most complete way to test your TOY programs is to encode and decode all possible inputs. All the files referenced below are also in the data.zip archive.

Use the input file encode16.txt< to debug encode.toy: it contains possible 4-bit inputs. The file encode16.txt.ans contains the answer: for brevity, it displays each bit as 0 or 1 (instead of 0000 or 0001) and groups the bits 7 per line (instead of 1 per line). encode16output.txt contains the output as it should be printed by encode.toy.
Use the input file decode128.txt to debug decode.toy: it contains all possible 7-bit inputs, The file decode128.txt.ans contains the answer: again, for brevity, it displays each bit as a 0 or 1 and groups the bits 4 per line. decode128output.txt has the appropriate output from decode.toy.

Reference solutions. As a special bonus, we provide Java source code HammingEncoder.java and HammingDecoder.java for the two programs. You are welcome to examine this code and use it to develop your TOY programs.

Extra Credit

Rewrite decode.toy using the fewest number of (nonzero) words of TOY main memory. We will award extra credit for any solution that uses 40 (in decimal) or fewer words of memory. Be sure to save your working version before attempting this!