Homework 8: LC-3 Disassembler
CSE240 - Introduction to Computer Architecture
Autumn 2005

Due: Wednesday, Nov 23 at 11:59PM

Now that you've mastered the art of assembly language programming (and no doubt improved your Breakout-playing skills!), let's simplify our lives and use a "high-level" programming language, C. Although the C compiler will manage details like register use, C still gives the programmer considerable control over the manipulation of data. For example, a C program can easily manipulate the binary representation of a program, which is exactly what we will do in this assignment!

You will write a disassembler (let's call it lc3dis). While an assembler (such as the as command in the simulator) converts ASCII assembly programs (i.e., .asm files) to binary machine language programs (i.e., .obj files), a disassembler does the reverse.

Important note: Your disassembler only needs to deal with LC-3 instructions defined in the textbook and the MUL instruction. You do NOT need to deal with the other instructions that we have added (e.g., RTT and JMPT).

Functions

Just as with your Breakout code, we have broken the task at hand into several manageable pieces (in this case C functions).

Function: main()

This is the entry point into the disassembler. The code for this function is pretty simple, so we provide it.

Function: get_zext_field(int bits, int hi_bit, int lo_bit)

This function gets the value of the bit field in integer bits beginning with bit hi_bit and ending with bit lo_bit. The resulting value is zero-extended. For example, to get the opcode of an instruction in ir, we would call this function as follows.

    opcode = get_zext_field(ir, 15, 12);

Note that hi_bit and lo_bit are zero-based bit positions (i.e., they must be between 0 and 15). This code is really quite tricky, so we've provided it for you. Please look at the code and try to understand the logic.
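To make the shift-and-mask logic concrete, here is a minimal sketch of how such a function could be written. It is not necessarily the code we provide, and it assumes that bits holds a value between 0x0000 and 0xffff and that int is at least 32 bits wide.

```c
/* Hypothetical sketch; the provided get_zext_field() may be written differently.
   Assumes 0 <= lo_bit <= hi_bit <= 15 and 0 <= bits <= 0xffff. */
int get_zext_field(int bits, int hi_bit, int lo_bit)
{
    int width = hi_bit - lo_bit + 1;   /* number of bits in the field */
    int mask = (1 << width) - 1;       /* 'width' low-order 1 bits */

    /* Shift the field down to bit 0, then discard everything above it. */
    return (bits >> lo_bit) & mask;
}
```

With this sketch, the word 0x1263 yields 1 for bits 15 to 12, which is the ADD opcode.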

Function: get_sext_field(int bits, int hi_bit, int lo_bit)

This function is very similar to get_zext_field(), except that it sign-extends the resulting field. You will want to use get_zext_field() to select unsigned values like opcodes or register fields (e.g., DR, SR1, etc.), but you will want to use get_sext_field() to select signed immediate fields (e.g., imm5) or signed PC offset fields (e.g., PCoffset9). This code is also tricky. We provide this code, but take a look at it in order to understand it.
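One way sign extension could be done is sketched below. This is an illustration rather than the provided code, and it assumes the same ranges for bits, hi_bit, and lo_bit as get_zext_field(). It extracts the field, then uses the field's top bit to decide whether the result is negative.

```c
/* Hypothetical sketch; the provided get_sext_field() may be written differently.
   Assumes 0 <= lo_bit <= hi_bit <= 15 and 0 <= bits <= 0xffff. */
int get_sext_field(int bits, int hi_bit, int lo_bit)
{
    int width = hi_bit - lo_bit + 1;                    /* bits in the field */
    int field = (bits >> lo_bit) & ((1 << width) - 1);  /* zero-extended field */
    int sign = 1 << (width - 1);                        /* the field's sign bit */

    /* Flipping the sign bit and subtracting its weight maps a field with
       the sign bit set onto the matching negative two's-complement value. */
    return (field ^ sign) - sign;
}
```

For a 5-bit field, this sketch maps 01111 to 15, 10000 to -16, and 11111 to -1.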

Function: get_bit(int bits, int bit_number)

This function is similar to get_zext_field() except that it selects and returns a single zero-extended bit. In fact, it's implemented by calling get_zext_field() with hi_bit and lo_bit set to the same value (bit_number). We provide this code.
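The relationship between the two functions can be sketched as follows. The get_zext_field() body here is a stand-in written for illustration, not necessarily the provided code.

```c
/* Stand-in for the provided get_zext_field(); written for illustration only.
   Assumes 0 <= lo_bit <= hi_bit <= 15 and 0 <= bits <= 0xffff. */
int get_zext_field(int bits, int hi_bit, int lo_bit)
{
    int width = hi_bit - lo_bit + 1;
    return (bits >> lo_bit) & ((1 << width) - 1);
}

/* A one-bit field: hi_bit and lo_bit are both bit_number, so the
   result is always 0 or 1. */
int get_bit(int bits, int bit_number)
{
    return get_zext_field(bits, bit_number, bit_number);
}
```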

Function: get_word_from_file(FILE* f)

This function extracts the next 16-bit word from the input file. We provide this code.
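For intuition, a reader of this kind could look like the sketch below. It rests on two assumptions that you should check against the provided code: that each word in the .obj file is stored high byte first, and that the function signals end of file by returning -1.

```c
#include <stdio.h>

/* Hypothetical sketch; the provided get_word_from_file() may differ.
   Assumption: each 16-bit word is stored high byte first.
   Assumption: -1 is returned when no complete word remains. */
int get_word_from_file(FILE *f)
{
    int hi, lo;

    hi = fgetc(f);            /* high-order byte of the word */
    if (hi == EOF)
        return -1;

    lo = fgetc(f);            /* low-order byte of the word */
    if (lo == EOF)
        return -1;

    return (hi << 8) | lo;    /* combine into a value from 0x0000 to 0xffff */
}
```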

Function: print_instruction(int pc, int ir)

This is the core of the disassembler. This function is passed two things: (i) an integer (pc) that may have a value from 0x0000 to 0xffff, representing an address in the LC-3 machine and (ii) an integer (ir) that may have a value from 0x0000 to 0xffff, representing an LC-3 instruction. The instruction ir is located in memory at address pc (the pc value is useful for computing pc-relative addresses). This function calls get_zext_field() to extract the opcode from the instruction. It then switches on that opcode. Within the switch there is a case for each opcode (e.g., ADD, AND, BR, JMP, etc.). Each case examines additional instruction bits (determined by the opcode) and prints an appropriate string representing the instruction.
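As a worked example of why pc matters: in the LC-3, a PC-relative offset is added to the incremented PC, that is, to the address of the instruction that follows the current one. The helper below is hypothetical (the name pc_relative_target is ours, not part of the provided code) and sketches that arithmetic, wrapping the result to 16 bits.

```c
/* Hypothetical helper, not part of the provided code.
   pc:     address of the instruction being disassembled (0x0000 to 0xffff)
   offset: sign-extended offset field, e.g. from get_sext_field(ir, 8, 0) */
int pc_relative_target(int pc, int offset)
{
    /* The offset is relative to pc + 1; mask so the result wraps
       around within the 16-bit address space. */
    return (pc + 1 + offset) & 0xFFFF;
}
```

For instance, an instruction at 0x3000 with an offset of 5 refers to address 0x3006, and an offset of -1 refers back to 0x3000 itself.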

For example, in the case for the ADD instruction, we must call get_zext_field(ir,11,9) to get the destination register and get_zext_field(ir,8,6) to get the first source operand register. Next we must examine bit 5 (via get_bit(ir,5)) in order to determine whether the final operand is an immediate or a register. If bit 5 is 0 (i.e., register operand), we call get_zext_field(ir,4,3) and we check that the result is 0 (i.e., bits 4 and 3 are 0). If bits 4 and 3 are not 0, this is not a legal ADD instruction, so we call print_fill(ir) to generate a .FILL assembler directive for this word. Otherwise, we use get_zext_field(ir,2,0) to get the second source operand register. Finally, the ADD assembly instruction is printed via printf(). If bit 5 is 1, we use get_sext_field(ir,4,0) to get the imm5 field, and we print the ADD instruction. Some of this code is provided to get you started.
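The decoding steps for ADD can be sketched roughly as follows. To keep the example easy to check, it writes the text into a buffer through a hypothetical helper, format_add(), instead of calling printf() and print_fill() directly as the real case would. The field helpers are stand-ins for the provided ones, and the exact spacing and operand syntax of the output are assumptions, so match them to the test files.

```c
#include <stdio.h>

/* Stand-ins for the provided helpers; written for illustration only. */
static int get_zext_field(int bits, int hi_bit, int lo_bit)
{
    int width = hi_bit - lo_bit + 1;
    return (bits >> lo_bit) & ((1 << width) - 1);
}

static int get_sext_field(int bits, int hi_bit, int lo_bit)
{
    int sign = 1 << (hi_bit - lo_bit);
    return (get_zext_field(bits, hi_bit, lo_bit) ^ sign) - sign;
}

static int get_bit(int bits, int bit_number)
{
    return get_zext_field(bits, bit_number, bit_number);
}

/* Hypothetical helper: writes the ADD text into out and returns 1, or
   returns 0 when the word is not a legal ADD (the caller would then
   use print_fill(ir)). Assumes the opcode was already checked. */
int format_add(int ir, char *out, size_t n)
{
    int dr = get_zext_field(ir, 11, 9);    /* destination register */
    int sr1 = get_zext_field(ir, 8, 6);    /* first source register */

    if (get_bit(ir, 5) == 0) {
        /* Register form: bits 4 and 3 must both be 0. */
        if (get_zext_field(ir, 4, 3) != 0)
            return 0;
        snprintf(out, n, "ADD R%d, R%d, R%d",
                 dr, sr1, get_zext_field(ir, 2, 0));
    } else {
        /* Immediate form: imm5 is a signed 5-bit value. */
        snprintf(out, n, "ADD R%d, R%d, #%d",
                 dr, sr1, get_sext_field(ir, 4, 0));
    }
    return 1;
}
```

Under these assumptions, the word 0x1263 produces "ADD R1, R1, #3", while 0x1048 is rejected because bit 3 is set in the register form.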

Function: print_fill(int ir)

This function prints a .FILL assembler directive. We provide this code.
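A function like this might be as small as the sketch below. The output format (a lowercase x followed by four uppercase hex digits) is an assumption; the provided code defines the actual format. The formatting is split into a hypothetical format_fill() helper only so that the resulting text can be inspected.

```c
#include <stdio.h>

/* Hypothetical helper: writes the directive text into out.
   Assumption: the word is shown as x followed by four hex digits. */
void format_fill(int ir, char *out, size_t n)
{
    snprintf(out, n, ".FILL x%04X", (unsigned)(ir & 0xFFFF));
}

/* Sketch of print_fill(); the provided version may differ. */
void print_fill(int ir)
{
    char buf[16];

    format_fill(ir, buf, sizeof buf);
    printf("%s\n", buf);
}
```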

Helpful Details

Testing

We will provide a number of .asm files you can use to test your disassembler (but you should also generate your own test cases). First, assemble each of these files with the as command in the simulator. Then disassemble them with your lc3dis. Save the output of lc3dis in a file via redirection.
    as t1.asm (in simulator)
    ./lc3dis t1.obj >newt1.asm
Now, to confirm that your code is correct, use the Unix diff utility to compare t1.asm and newt1.asm.
    diff -w -i t1.asm newt1.asm
If diff produces no output, the two files are the same. Note that -w instructs diff to ignore whitespace and -i instructs it to ignore case. If the files are different, diff will indicate how they are different (type "man diff" for more information on diff).

Note that if your original .asm file contained labels, these naturally won't appear in the corresponding disassembled code. You'll have to confirm that the addresses your disassembler generates are correct.

Also note that the output of the disassembler cannot be directly assembled because the assembler doesn't know what to do with absolute addresses (it wants labels).

Important: Please test your disassembler thoroughly with your own tests. The tests we provide are not at all complete, so you will have to create your own tests.

Submission

Please submit your code in a file called lc3dis.c in the usual way on the HW 8 Submission Page. It may take a couple of days from the time this homework is assigned for this page to go live.

Due Date

Note that this assignment is due the Wednesday before Thanksgiving break. Given that this assignment only requires an additional 70 lines (or so) of code, we could have made it due on Monday. But we decided to give you a little flexibility. We suspect many of you will want to turn it in on Monday or Tuesday, so you are not working on it right before break. As always, early submissions are fine!