Homework 8: LC-3 Disassembler
CSE240 - Introduction to Computer Architecture
Autumn 2004

Due: Wednesday, Nov 24 at 11:59PM

Now that you've mastered the art of assembly language programming (and no doubt improved your Tetris-playing skills!), let's simplify our lives and use a "high-level" programming language, C. Although the C compiler will manage details like register use, C still gives the programmer great control over the manipulation of data. For example, a C program can easily manipulate the binary representation of a program, which is exactly what we will do in this assignment!

You will write a disassembler (let's call it lc3dis). While an assembler (such as lc3as) converts ASCII assembly programs (i.e., .asm files) to binary machine language programs (i.e., .obj files), a disassembler does the reverse. In fact, to test your disassembler to ensure that it works properly, you may want to use your lc3dis to disassemble .obj files into .asm files, which you can then assemble (with lc3as) to create new .obj files. If the original and new .obj files are identical, your lc3dis worked as you would expect! Note that we describe a specific testing strategy later in this document.

Important note: Your disassembler only needs to deal with LC-3 instructions defined in the textbook and the MUL instruction. You do NOT need to deal with the other instructions that we have added (e.g., RTT and JMPT).

Functions

Just as with your Pentris code, we have broken the task at hand into several manageable pieces (in this case C functions).

Function: main()

This is the entry point into the disassembler. The code for this function is pretty simple, so we provide it.

Function: get_zext_field(int bits, int hi_bit, int lo_bit)

This function gets the value of the bit field in the integer bits, beginning with bit hi_bit and ending with bit lo_bit (both inclusive). The resulting value is zero-extended. For example, to get the opcode of an instruction in ir, we would call this function as follows.

    opcode = get_zext_field(ir, 15, 12);

Note that hi_bit and lo_bit are zero-based (i.e., they must be between 0 and 15). This code is really quite tricky, so we've provided it for you. Please look at the code and try to understand the logic.
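
As a study aid, here is a minimal sketch of one way a function with this behavior could be written. It is not the provided code, which may be structured differently; the shift-and-mask approach and the assert() precondition are assumptions made for illustration.

```c
#include <assert.h>

/* Sketch only: return bits hi_bit..lo_bit of a 16-bit word, zero-extended.
   The provided get_zext_field() may be implemented differently. */
int get_zext_field(int bits, int hi_bit, int lo_bit)
{
    /* Assumed precondition: a valid, zero-based bit range within 16 bits. */
    assert(0 <= lo_bit && lo_bit <= hi_bit && hi_bit <= 15);

    int width = hi_bit - lo_bit + 1;   /* number of bits in the field */
    int mask = (1 << width) - 1;       /* 'width' low-order one bits */

    /* Keep only the low 16 bits, move the field down to bit 0, then mask. */
    return ((bits & 0xFFFF) >> lo_bit) & mask;
}
```

For instance, with the word 0x1283 (the encoding of ADD R1, R2, R3), the call get_zext_field(0x1283, 15, 12) yields 1, the ADD opcode.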

Function: get_sext_field(int bits, int hi_bit, int lo_bit)

This function is very similar to get_zext_field(), except that it sign extends the resulting field. You will want to use get_zext_field() to select unsigned values like opcodes or register fields (e.g., DR, SR1, etc.), but you will want to use get_sext_field() to select signed immediate fields (e.g., imm5) or signed PC offset fields (e.g., PCoffset9). This code is also tricky. We provide this code, but take a look at it in order to understand it.
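
To make the difference from zero extension concrete, the following is a rough sketch of how sign extension of a field can be done. It is not the provided code; the XOR-and-subtract trick used here is just one common technique, chosen as an assumption for this illustration.

```c
#include <assert.h>

/* Sketch only: return bits hi_bit..lo_bit of a 16-bit word, sign-extended.
   The provided get_sext_field() may be implemented differently. */
int get_sext_field(int bits, int hi_bit, int lo_bit)
{
    /* Assumed precondition: a valid, zero-based bit range within 16 bits. */
    assert(0 <= lo_bit && lo_bit <= hi_bit && hi_bit <= 15);

    int width = hi_bit - lo_bit + 1;
    int field = ((bits & 0xFFFF) >> lo_bit) & ((1 << width) - 1);
    int sign = 1 << (width - 1);       /* the field's own top (sign) bit */

    /* If the sign bit is set, this subtracts 2^width; otherwise it leaves
       the field unchanged. */
    return (field ^ sign) - sign;
}
```

With this sketch, the imm5 field of 0x127F (all five bits set) comes back as -1, whereas get_zext_field() on the same bits would give 31.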

Function: get_bit(int bits, int bit_number)

This function is similar to get_zext_field() except that it selects and returns a single zero-extended bit. In fact, it's implemented by calling get_zext_field() with hi_bit and lo_bit set to the same value (bit_number). We provide this code.
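
The delegation described above might look like the sketch below. The get_zext_field() shown here is a stand-in written for this example so that the block compiles on its own; the provided versions of both functions are the ones to use in your solution.

```c
#include <assert.h>

/* Stand-in for the provided get_zext_field(); included only so that this
   sketch is self-contained. */
int get_zext_field(int bits, int hi_bit, int lo_bit)
{
    assert(0 <= lo_bit && lo_bit <= hi_bit && hi_bit <= 15);
    int width = hi_bit - lo_bit + 1;
    return ((bits & 0xFFFF) >> lo_bit) & ((1 << width) - 1);
}

/* Sketch only: select a single bit by asking for a one-bit-wide field. */
int get_bit(int bits, int bit_number)
{
    return get_zext_field(bits, bit_number, bit_number);
}
```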

Function: get_word_from_file(FILE* f)

This function extracts the next 16-bit word from the input file. We provide this code.
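
For intuition, here is a sketch of how a 16-bit word could be read from a file. It assumes that each word is stored high byte first (big-endian) and that -1 is returned at end of file; both are assumptions of this example, and the provided code is authoritative on the actual file layout and end-of-file behavior.

```c
#include <assert.h>
#include <stdio.h>

/* Sketch only: read the next 16-bit word, assuming the high byte comes
   first. Returns -1 if a full word cannot be read (an assumption; the
   provided get_word_from_file() may signal end of file differently). */
int get_word_from_file(FILE *f)
{
    assert(f != NULL);

    int hi = fgetc(f);                 /* first byte: bits 15..8 */
    int lo = fgetc(f);                 /* second byte: bits 7..0 */

    if (hi == EOF || lo == EOF)
        return -1;

    return (hi << 8) | lo;
}
```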

Function: print_instruction(int ir)

This is the core of the disassembler. This function is passed an integer (ir) that may have a value from 0x0000 to 0xffff, representing an LC-3 instruction. This function calls get_zext_field() to extract the opcode from the instruction. It then switches on that opcode. Within the switch there is a case for each opcode (e.g., ADD, AND, BR, JMP, etc.). Each case examines additional instruction bits (determined by the opcode) and prints an appropriate string representing the instruction.

For example, in the case for the ADD instruction, we call get_zext_field(ir,11,9) to get the destination register and get_zext_field(ir,8,6) to get the first source operand register. Next, we examine bit 5 (via get_bit(ir,5)) to determine whether the final operand is an immediate or a register. If bit 5 is 0 (i.e., register operand), we call get_zext_field(ir,4,3) and check that the result is 0 (i.e., bits 4 and 3 are both 0). If bits 4 and 3 are not 0, this is not a legal ADD instruction, so we call print_fill(ir) to generate a .FILL assembler directive for this word. Otherwise, we use get_zext_field(ir,2,0) to get the second source operand register and print the ADD assembly instruction via printf(). If bit 5 is 1, we use get_sext_field(ir,4,0) to get the imm5 field, and we print the ADD instruction. Some of this code is provided to get you started.
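
The ADD decoding steps can be sketched as follows. This is an illustration under stated assumptions, not the provided starter code: format_add() is a hypothetical helper that writes into a buffer instead of calling printf() (which makes it easy to check), the three field helpers are stand-ins for the provided ones, and the exact operand and .FILL text formats are guesses that you should match against the test .asm files.

```c
#include <assert.h>
#include <stdio.h>

/* Stand-ins for the provided helpers, so the sketch is self-contained. */
static int get_zext_field(int bits, int hi_bit, int lo_bit)
{
    int width = hi_bit - lo_bit + 1;
    return ((bits & 0xFFFF) >> lo_bit) & ((1 << width) - 1);
}

static int get_sext_field(int bits, int hi_bit, int lo_bit)
{
    int sign = 1 << (hi_bit - lo_bit);
    return (get_zext_field(bits, hi_bit, lo_bit) ^ sign) - sign;
}

static int get_bit(int bits, int bit_number)
{
    return get_zext_field(bits, bit_number, bit_number);
}

/* Hypothetical helper: decode an ADD word into text. In print_instruction()
   the same logic would sit in the ADD case and use printf() and
   print_fill() instead of a buffer. */
void format_add(int ir, char *buf, size_t size)
{
    assert(buf != NULL && size > 0);

    int dr = get_zext_field(ir, 11, 9);     /* destination register */
    int sr1 = get_zext_field(ir, 8, 6);     /* first source register */

    if (get_bit(ir, 5) == 0) {              /* register operand form */
        if (get_zext_field(ir, 4, 3) != 0) {
            /* Bits 4..3 must be 0; otherwise emit the word as data.
               The "x%04X" spelling is an assumed format. */
            snprintf(buf, size, ".FILL x%04X", (unsigned)(ir & 0xFFFF));
            return;
        }
        int sr2 = get_zext_field(ir, 2, 0); /* second source register */
        snprintf(buf, size, "ADD R%d, R%d, R%d", dr, sr1, sr2);
    } else {                                /* immediate operand form */
        int imm5 = get_sext_field(ir, 4, 0);
        snprintf(buf, size, "ADD R%d, R%d, #%d", dr, sr1, imm5);
    }
}
```

Under these assumptions, the word 0x1283 decodes as an ADD of R2 and R3 into R1, 0x127F as an ADD of R1 and the immediate -1, and 0x1298 falls through to the .FILL path because bits 4 and 3 are not 0.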

Function: print_fill(int ir)

This function prints a .FILL assembler directive. We provide this code.

Helpful Details

Testing

We will provide a number of .asm files you can use to test your disassembler (but you should also generate your own test cases). First, assemble each of these files with lc3as. Then disassemble the resulting .obj files with your lc3dis. Save the output of lc3dis in a file via redirection.
    lc3as t1.asm
    ./lc3dis t1.obj >newt1.asm
Now, to confirm that your code is correct, use the Unix diff utility to compare t1.asm and newt1.asm.
    diff -w -i t1.asm newt1.asm
If diff produces no output, the two files are the same. Note that -w instructs diff to ignore whitespace and -i instructs it to ignore case. If the files are different, diff will indicate how they are different (type "man diff" for more information on diff). We will be using this testing method for our automatic testing scripts, so make sure diff produces no output.

Submission

Please submit your code in a file called lc3dis.c in the usual way. As in previous assignments, turnin will only work on eniac-s.seas.upenn.edu.

    ssh eniac-s.seas.upenn.edu
If prompted, enter your eniac password. Then...
    cd cse240hw8
    turnin -c cse240 -p HW8 lc3dis.c
    exit

Due Date

Note that this assignment is due the Wednesday before Thanksgiving break. Given that this assignment only requires an additional 70 lines of code, we could have made it due on Monday. But we decided to give you a little flexibility. We suspect many of you will want to turn it in on Monday or Tuesday, so you are not working on it right before break. As always, early submissions are fine!