CIT 594, Notes on Huffman encoding/decoding
When you decode your file, instead of (or as well as) your output being separated into lines, you may see an additional character at the end of each line. That extra character does not belong there. Where did it come from?
Line separators are implemented differently on different platforms. There are
two ASCII characters,
CR (carriage return) and
(line feed), which may be used.
The newline character,
'\n', represents the UNIX end of line,
that is, the
LF character. If you say
you get what you asked for, that is, an
LF character; if
System.out.println(), the actual character or characters
that are written depends on your operating system. This is why you should prefer
In the Huffman program, the simplest and best solution is: Don't read a
line at a time. Instead, read a character at a time. That way, the
LF characters are treated just the same as every other character,
and you don't have to do anything special about them. As long as you are using
a buffered reader, a large block of data is read in (to a "buffer), and
"reading" a character just means getting the next character from the
buffer, or refilling the buffer as necessary. Reading one character at a time
will not significantly affect the execution time (that's the whole point
If you really want to read whole lines, using
LineReader or something
similar, it might help to know that
should return the line separator for the system you are running on.
When Java writes a Object to an ObjectStream, it also writes out information
that identifies the object ("This is a BitVector with fields numberOfBits...").
I'm not sure just how much information it writes out, but in any case it varies
with the complexity of the class definition. That means you cannot (easily)
compute exactly how many bytes are going to be on the file. For the BitVector,
you can approximate the size (in bytes) as
When you compute the size of your code table, you should be able to figure
out approximately how much data is going to be written out (An
is 4 bytes, a
long is 8 bytes, etc.) You don't need to get an exact
figure, and you certainly don't need to figure out how much additional identifying
information that Java will write out.
It is possible to ask for the size of a file, but I don't know if it is possible
when the file is open and being written. (If it is possible, be sure
flush() before getting the file size.)
If someone knows a great way to compute file sizes beforehand, I would be glad to hear about it (but it will be too late to put in the current assignment).
Measuring execution time
Do not include in your measure the amount of time the user takes in selecting files!
The best way of doing this problem is:
When you do it this way, step 3 is the part whose execution time should be measured. You don't want to measure the time the user spends mousing around and choosing files.
If you do things in some more mixed-up order, you may have to measure the time for two or more parts of the program and add them together to get the final "translation" time.
If you have forgotten how to measure execution time, refer back to the Fibonacci assignment.