Homework 8: LC-3 Disassembler
CSE240 - Introduction to Computer Architecture
Autumn 2005
Due: Wednesday, Nov 23 at 11:59PM
Now that you've mastered the art of assembly language programming (and
no doubt improved your Breakout-playing skills!), let's simplify our lives
and use a "high-level" programming language, C. Although the C compiler
will manage details like register usage, C still gives the programmer
considerable control over the manipulation of data. For example, a C program
can easily manipulate the binary representation of a program, which is
exactly what we will do in this assignment!
You will write a disassembler (let's call it lc3dis). While an
assembler (such as the as command in the simulator) converts
ASCII assembly programs (i.e., .asm files) to binary
machine language programs (i.e., .obj files), a
disassembler does the reverse.
Important note: Your disassembler only needs to deal with LC-3
instructions defined in the textbook and the MUL instruction.
You do NOT need to deal with the other instructions that we have added
(e.g., RTT and JMPT).
Functions
Just as with your Breakout code, we have broken the task at hand into
several manageable pieces (in this case C functions).
Function: main()
This is the entry point into the disassembler. It does the following.
- Opens (for reading) a file specified on the command line.
- Reads the first 2 bytes from the file to determine the .ORIG
address (more on this later).
- Outputs a .ORIG assembler directive with the address computed
above.
- Until the end-of-file is reached, reads each 2-byte instruction from
the file and calls print_instruction() on it.
- Outputs a .END assembler directive.
- Closes the file.
The code for this function is pretty simple, so we provide it.
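The steps above can be sketched as follows. This is a minimal sketch, not the provided code. The name disassemble() is hypothetical. It takes already-opened streams, so opening and closing the file stay in main(). It assumes each 16-bit word is stored high byte first. Where the real program would call print_instruction(), it uses a hypothetical stand-in that prints every word as a .FILL.

```c
#include <stdio.h>
#include <string.h>

/* Reads one 16-bit word, assuming high byte first; returns -1 at end of file. */
static int read_word(FILE *in) {
    int hi = fgetc(in);
    int lo = fgetc(in);
    if (hi == EOF || lo == EOF)
        return -1;
    return (hi << 8) | lo;
}

/* Hypothetical stand-in for print_instruction(): emits each word as data. */
static void emit_word(FILE *out, int pc, int ir) {
    (void)pc; /* the real print_instruction() uses pc for PC-relative targets */
    fprintf(out, ".FILL x%04x\n", ir);
}

/* Hypothetical: follows the steps listed for main() on open streams.
   Returns the number of words after the origin, or -1 if there is no origin. */
int disassemble(FILE *in, FILE *out) {
    int origin = read_word(in);
    if (origin < 0)
        return -1;
    fprintf(out, ".ORIG x%04x\n", origin);

    int pc = origin;
    int count = 0;
    int ir;
    while ((ir = read_word(in)) >= 0) {
        emit_word(out, pc, ir);
        pc = (pc + 1) & 0xFFFF; /* each word occupies one LC-3 address */
        count++;
    }
    fprintf(out, ".END\n");
    return count;
}
```

The pc value tracks the address of each word as it is read. That is the address print_instruction() needs for PC-relative operands.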
Function: get_zext_field(int bits, int hi_bit, int lo_bit)
This function gets the value of the bit field in integer bits
beginning with bit hi_bit and ending with bit lo_bit.
The resulting value is zero-extended. For example, to get the opcode of
an instruction in ir, we would call this function as
follows.
opcode = get_zext_field(ir, 15, 12);
Note that hi_bit and lo_bit are zero-based
(i.e., they must be between 0 and 15). This code is really quite
tricky, so we've provided it for you. Please look at the code and try
to understand the logic.
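To check your understanding before reading the provided version, here is one common way to write such a function. It shifts the field down to bit 0 and then masks off everything above it. This is a sketch and may not match how the provided lc3dis.c does it. It assumes bits holds a nonnegative 16-bit value.

```c
/* Returns bits[hi_bit..lo_bit] moved down to bit 0, zero-extended.
   Assumes 0 <= lo_bit <= hi_bit <= 15 and 0 <= bits <= 0xFFFF. */
int get_zext_field(int bits, int hi_bit, int lo_bit) {
    int width = hi_bit - lo_bit + 1;   /* number of bits in the field */
    int mask = (1 << width) - 1;       /* width ones, e.g. 0xF when width is 4 */
    return (bits >> lo_bit) & mask;    /* drop bits below lo_bit, clear bits above hi_bit */
}
```

For example, 0x1283 encodes ADD R1, R2, R3. Its fields 15 to 12, 11 to 9, 8 to 6, and 2 to 0 come out as 1, 1, 2, and 3.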
Function: get_sext_field(int bits, int hi_bit, int lo_bit)
This function is very similar to get_zext_field(), except that
it sign extends the resulting field. You will want to use
get_zext_field() to select unsigned values like opcodes or
register fields (e.g., DR, SR1, etc.),
but you will want to use get_sext_field() to select signed
immediate fields (e.g., imm5) or signed PC offset fields
(e.g., PCoffset9). This code is also tricky; we provide it, but take
a look at it to understand how it works.
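Sign extension can be built on the zero-extended field. If the field's top bit is set, the field represents a negative two's complement number. In that case, subtract 2^width. The sketch below assumes the same argument ranges as get_zext_field() and includes a copy of that function so it compiles on its own. The provided code may use a different trick, such as shifting left and then arithmetically right.

```c
/* Zero-extended field extraction (same idea as get_zext_field()). */
int get_zext_field(int bits, int hi_bit, int lo_bit) {
    int width = hi_bit - lo_bit + 1;
    return (bits >> lo_bit) & ((1 << width) - 1);
}

/* Returns bits[hi_bit..lo_bit] as a two's complement value, sign-extended.
   Assumes the same argument ranges as get_zext_field(). */
int get_sext_field(int bits, int hi_bit, int lo_bit) {
    int width = hi_bit - lo_bit + 1;
    int value = get_zext_field(bits, hi_bit, lo_bit);
    if (value & (1 << (width - 1)))   /* top bit of the field is the sign bit */
        value -= 1 << width;          /* e.g. imm5 11111 (31) becomes 31 - 32 = -1 */
    return value;
}
```

This shows why the choice matters. An imm5 field of 11111 reads as 31 with get_zext_field() but as -1 with get_sext_field().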
Function: get_bit(int bits, int bit_number)
This function is similar to get_zext_field() except that it
selects and returns a single zero-extended bit. In fact, it's
implemented by calling get_zext_field() with hi_bit
and lo_bit set to the same value (bit_number). We
provide this code.
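As described, get_bit() is a one-line wrapper. Here is a sketch of it. It includes an assumed get_zext_field() so the block compiles on its own.

```c
/* Assumed field extractor, as sketched for get_zext_field(). */
int get_zext_field(int bits, int hi_bit, int lo_bit) {
    int width = hi_bit - lo_bit + 1;
    return (bits >> lo_bit) & ((1 << width) - 1);
}

/* A single bit is just a field whose high and low ends coincide. */
int get_bit(int bits, int bit_number) {
    return get_zext_field(bits, bit_number, bit_number);
}
```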
Function: get_word_from_file(FILE* f)
This function extracts the next 16-bit word from the input file. We
provide this code.
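One plausible way to read a word is to fetch two bytes with fgetc() and combine them. This sketch makes two assumptions. First, the .obj file stores each 16-bit word high byte first, which is the usual LC-3 object format. Second, the function returns -1 at end of file. The provided version may signal end of file differently, so check lc3dis.c.

```c
#include <stdio.h>

/* Reads the next 16-bit word, assuming big-endian byte order (high byte first).
   Returns a value in 0x0000..0xFFFF, or -1 if fewer than two bytes remain. */
int get_word_from_file(FILE *f) {
    int hi = fgetc(f);   /* fgetc() returns an unsigned char value or EOF */
    int lo = fgetc(f);
    if (hi == EOF || lo == EOF)
        return -1;
    return (hi << 8) | lo;
}
```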
Function: print_instruction(int pc, int ir)
This is the core of the disassembler. This function is passed two
things: (i) an integer (pc) that may have a value from 0x0000 to
0xffff, representing an address in the LC-3 machine and (ii) an integer
(ir) that may have a value from 0x0000 to 0xffff, representing
an LC-3 instruction. The instruction ir is located in memory
at address pc (the pc value is useful for computing
pc-relative addresses). This function calls
get_zext_field() to extract the opcode from the instruction.
It then switches on that opcode. Within the switch there is a case for
each opcode (e.g., ADD, AND, BR,
JMP, etc.). Each case examines additional instruction
bits (determined by the opcode) and prints an appropriate string
representing the instruction.
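The overall shape of print_instruction() can be sketched with two of the simpler textbook instructions, RTI and TRAP. Each case checks its fixed bits before printing. This is a hypothetical variant, not the provided code. The names format_instruction() and format_fill() are illustrative, and the sketch writes into a buffer instead of calling printf() so the result is easy to inspect. Opcodes not shown fall through to a .FILL here. A full version needs a case for each remaining opcode.

```c
#include <stdio.h>
#include <string.h>

static int get_zext_field(int bits, int hi_bit, int lo_bit) {
    int width = hi_bit - lo_bit + 1;
    return (bits >> lo_bit) & ((1 << width) - 1);
}

/* Writes a .FILL directive for a word that is not a legal instruction. */
static void format_fill(char *buf, size_t n, int ir) {
    snprintf(buf, n, ".FILL x%04x", ir);
}

/* Hypothetical variant of print_instruction() that writes into buf instead of
   calling printf(). Only two cases are shown; the others follow the same pattern. */
void format_instruction(int pc, int ir, char *buf, size_t n) {
    (void)pc; /* needed only by the PC-relative cases, which are omitted here */
    switch (get_zext_field(ir, 15, 12)) {
    case 0x8: /* RTI: bits 11-0 must all be 0 */
        if (get_zext_field(ir, 11, 0) != 0)
            format_fill(buf, n, ir);
        else
            snprintf(buf, n, "RTI");
        break;
    case 0xF: /* TRAP: bits 11-8 must be 0; bits 7-0 hold trapvect8 */
        if (get_zext_field(ir, 11, 8) != 0)
            format_fill(buf, n, ir);
        else
            snprintf(buf, n, "TRAP x%x", get_zext_field(ir, 7, 0));
        break;
    /* ... cases for ADD, AND, BR, JMP, and the remaining opcodes ... */
    default:
        format_fill(buf, n, ir);
        break;
    }
}
```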
For example, in the case for the ADD instruction, we must call
get_zext_field(ir,11,9) to get the destination register and
get_zext_field(ir,8,6) to get the first source operand
register. Next we must examine bit 5 (via get_bit(ir,5)) in
order to determine whether the final operand is an immediate or
register. If bit 5 is 0 (i.e., register operand), we call
get_zext_field(ir,4,3) and we check that the result is 0
(i.e., bits 4 and 3 are 0). If bits 4 and 3 are not 0, this is
not a legal ADD instruction, so we call print_fill(ir)
to generate a .FILL assembler directive for this word.
Otherwise, we use get_zext_field(ir,2,0) to get the second
source operand register. Finally, the ADD assembly instruction
is printed via printf(). If bit 5 is 1, we use
get_sext_field(ir,4,0) to get the imm5 field, and we
print the ADD instruction. Some of this code is provided to
get you started.
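Putting these steps together gives a sketch of the ADD case. The AND and MUL cases work the same way, because the multiply encoding described under Helpful Details matches ADD and AND except for the opcode xD. The names format_operate() and format_fill() are hypothetical. The function writes into a buffer instead of calling printf(). It prints imm5 in hex, as the assignment requires. Printing a negative imm5 as x-1 is an assumption, because the assignment does not say how negative values should look. Check this against the provided test files.

```c
#include <stdio.h>
#include <string.h>

static int get_zext_field(int bits, int hi_bit, int lo_bit) {
    int width = hi_bit - lo_bit + 1;
    return (bits >> lo_bit) & ((1 << width) - 1);
}

static int get_sext_field(int bits, int hi_bit, int lo_bit) {
    int width = hi_bit - lo_bit + 1;
    int value = get_zext_field(bits, hi_bit, lo_bit);
    return (value & (1 << (width - 1))) ? value - (1 << width) : value;
}

static int get_bit(int bits, int bit_number) {
    return get_zext_field(bits, bit_number, bit_number);
}

static void format_fill(char *buf, size_t n, int ir) {
    snprintf(buf, n, ".FILL x%04x", ir);
}

/* Hypothetical: formats ADD (0x1), AND (0x5), or MUL (0xD), which share one
   layout. Negative immediates are written as x-<magnitude>, an assumption. */
void format_operate(int ir, char *buf, size_t n) {
    const char *name;
    switch (get_zext_field(ir, 15, 12)) {
    case 0x1: name = "ADD"; break;
    case 0x5: name = "AND"; break;
    case 0xD: name = "MUL"; break;
    default:  format_fill(buf, n, ir); return;
    }

    int dr = get_zext_field(ir, 11, 9);
    int sr1 = get_zext_field(ir, 8, 6);

    if (get_bit(ir, 5) == 0) {                  /* register operand */
        if (get_zext_field(ir, 4, 3) != 0) {    /* bits 4 and 3 must be 0 */
            format_fill(buf, n, ir);
            return;
        }
        snprintf(buf, n, "%s R%d, R%d, R%d", name, dr, sr1, get_zext_field(ir, 2, 0));
    } else {                                    /* imm5 operand, printed in hex */
        int imm5 = get_sext_field(ir, 4, 0);
        snprintf(buf, n, "%s R%d, R%d, x%s%x", name, dr, sr1,
                 imm5 < 0 ? "-" : "", (unsigned)(imm5 < 0 ? -imm5 : imm5));
    }
}
```

Because one function covers all three opcodes, the fixed-field check on bits 4 and 3 lives in one place.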
Function: print_fill(int ir)
This function prints a .FILL assembler directive. We provide
this code.
Helpful Details
- Getting started. We assume most of you will want to work on
eniac-l.seas.upenn.edu. If you have a C compiler on your personal
machine you would like to use, that's fine. But you should confirm your
code compiles and runs on eniac-l because that's where we'll be testing
it.
Begin by creating a directory to work in and copying the files we
provide. These files are available on eniac-l (see below) or they can
be found in hw8.zip.
cd ~
mkdir cse240hw8
cd cse240hw8
cp ~cse240/project/hw/hw8/* .
This will give you a bunch of .asm files to use in testing (below).
Also, it will give you a file called lc3dis.c to use as a starting
point.
- Output. Note that your disassembler does not write its output to a
file; it simply prints the disassembled instructions to the display.
If you want to save the output in a file, use redirection (>).
The following command places the output of your disassembler in a file called
newfoo.asm.
./lc3dis foo.obj > newfoo.asm
- Multiply. The encoding for multiply does not appear in the book.
It is exactly the same as that of ADD or AND except that the opcode is xD (13 in
decimal). The book describes this as a reserved opcode.
- Resources. Appendix A and the table on the inside back cover of your
textbook will be extremely useful! You will find all the answers there!
- Immediate fields. Please output all of your immediate fields in
hexadecimal (rather than decimal). This is necessary so our
automatic testing scripts will not get confused. For example,
the following is fine.
LDR R1, R2, xB
While the following is equivalent to the above, it will not be accepted
by our testing scripts.
LDR R1, R2, #11
- Make sure you check the fixed fields in instructions. For example,
in an ADD instruction with a register operand (bit 5 is 0), bits 4 and 3
must be 0. If they are not, it is not an ADD instruction at all. It is
not any legal instruction, so it must be data. Similarly, in a JMP
instruction, bits 11 to 9 and bits 5 to 0 must be 0. And in a NOT
instruction, bits 5 to 0 must be 1. If you discover that you are looking at data
(not an instruction), call print_fill() to generate a
.FILL assembler directive.
- One or more of the n, z, or p fields in a
BR instruction must be set. If none of them are, it is not a
BR instruction (i.e., it must be data and
print_fill() should be called).
- PC-relative offsets. Do not try to generate assembly code that
contains labels! This would make things much harder. Instead, simply
compute each target address from the address of the current instruction (in the
pc variable) and the offset encoded in the instruction. Remember that
the LC-3 adds the offset to the incremented PC, so the target address is
pc + 1 + offset. Please format addresses so that they are always printed
with 4 digits (i.e., use the following printf() format string: "x%04x"; see
print_fill (above) for an example). For example, for BR
instructions you should print the address of the destination of the
branch, resulting in instructions of the following form.
BRn x0210
You should compute the address specified by every instruction that uses PC-relative addressing in the same way.
- Extraneous text. Please ensure the text your disassembler generates
does not include any extraneous text (e.g., comments). This will break
our testing scripts.
- Capitalization. Don't worry about capitalization in any part of this assignment. For example, the following two instructions are identical.
.fill xABCD
.FILL Xabcd
- Compiling. Use gcc on the Moore 100 machines or
eniac-l.seas.upenn.edu to build your code. You may want to use
the -o flag to specify the name of the generated program.
Here's an example.
gcc -o lc3dis lc3dis.c
./lc3dis foo.obj > newfoo.asm
- Object file format. For the curious, we'll describe the
.obj file format. The first 2 bytes contain the .ORIG
address of the program. Subsequent byte pairs (16 bits) encode each
instruction in the program. Simple, eh?
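The fixed-field checks, the BR condition-bit rule, and PC-relative addressing come together in a sketch of the BR, JMP, and NOT cases. As with the other sketches, format_control(), format_fill(), and pc_relative_target() are hypothetical names, and the function writes into a buffer rather than printing. It assumes the standard LC-3 rule that PC-relative offsets are added to the incremented PC (pc + 1), wrapping to 16 bits. Two output choices are also assumptions worth checking against your test files. First, a branch with all three condition bits set prints as BRnzp rather than BR. Second, JMP R7 prints as RET.

```c
#include <stdio.h>
#include <string.h>

static int get_zext_field(int bits, int hi_bit, int lo_bit) {
    return (bits >> lo_bit) & ((1 << (hi_bit - lo_bit + 1)) - 1);
}

static int get_sext_field(int bits, int hi_bit, int lo_bit) {
    int width = hi_bit - lo_bit + 1;
    int value = get_zext_field(bits, hi_bit, lo_bit);
    return (value & (1 << (width - 1))) ? value - (1 << width) : value;
}

static int get_bit(int bits, int bit_number) {
    return get_zext_field(bits, bit_number, bit_number);
}

static void format_fill(char *buf, size_t n, int ir) {
    snprintf(buf, n, ".FILL x%04x", ir);
}

/* Target of a PC-relative operand: the offset is added to pc + 1,
   and the result wraps to 16 bits. */
static int pc_relative_target(int pc, int offset) {
    return (pc + 1 + offset) & 0xFFFF;
}

/* Hypothetical helper covering the BR, JMP, and NOT cases of print_instruction(). */
void format_control(int pc, int ir, char *buf, size_t n) {
    switch (get_zext_field(ir, 15, 12)) {
    case 0x0: { /* BR: at least one of n, z, p (bits 11, 10, 9) must be set */
        int nb = get_bit(ir, 11), zb = get_bit(ir, 10), pb = get_bit(ir, 9);
        if (!nb && !zb && !pb) {
            format_fill(buf, n, ir);
            break;
        }
        int target = pc_relative_target(pc, get_sext_field(ir, 8, 0));
        snprintf(buf, n, "BR%s%s%s x%04x",
                 nb ? "n" : "", zb ? "z" : "", pb ? "p" : "", target);
        break;
    }
    case 0xC: /* JMP: bits 11-9 and 5-0 must be 0; base register in bits 8-6 */
        if (get_zext_field(ir, 11, 9) != 0 || get_zext_field(ir, 5, 0) != 0)
            format_fill(buf, n, ir);
        else if (get_zext_field(ir, 8, 6) == 7)
            snprintf(buf, n, "RET");
        else
            snprintf(buf, n, "JMP R%d", get_zext_field(ir, 8, 6));
        break;
    case 0x9: /* NOT: bits 5-0 must all be 1 */
        if (get_zext_field(ir, 5, 0) != 0x3F)
            format_fill(buf, n, ir);
        else
            snprintf(buf, n, "NOT R%d, R%d",
                     get_zext_field(ir, 11, 9), get_zext_field(ir, 8, 6));
        break;
    default:
        format_fill(buf, n, ir);
        break;
    }
}
```

For example, with pc = x0200, the word x080F (n set, PCoffset9 = 15) disassembles to BRn x0210. The same target computation applies to LD, ST, LDI, STI, LEA, and JSR, each with its own offset width.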
Testing
We will provide a number of .asm files you can use to test your
disassembler (but you should also generate your own test cases).
First, assemble each of these files with the as command in the
simulator. Then disassemble them with your lc3dis. Save the
output of lc3dis in a file via redirection.
as t1.asm (in simulator)
./lc3dis t1.obj >newt1.asm
Now, to confirm that your code is correct, use the Unix diff
utility to compare t1.asm and newt1.asm.
diff -w -i t1.asm newt1.asm
If diff produces no output, the two files are the same. Note
that -w instructs diff to ignore whitespace and
-i instructs it to ignore case. If the files are different,
diff will indicate how they are different (type "man
diff" for more information on diff).
Note that if your original .asm file contained labels, these
naturally won't appear in the corresponding disassembled code. You'll
have to confirm that the addresses your disassembler generates are
correct.
Also note that the output of the disassembler cannot be directly
assembled because the assembler doesn't know what to do with absolute
addresses (it wants labels).
Important: Please test your disassembler thoroughly with your own
tests. The tests we provide are not at all complete, so you will have
to create your own tests.
Submission
Please submit your code in a file called lc3dis.c in the usual
way on the HW 8 Submission Page. It may take a couple of days after this
homework is assigned for this page to go live.
Due Date
Note that this assignment is due the Wednesday before Thanksgiving
break. Given that this assignment only requires an additional 70 lines
(or so) of code, we could have made it due on Monday. But we decided to
give you a little flexibility. We suspect many of you will want to turn
it in on Monday or Tuesday, so you are not working on it right before
break. As always, early submissions are fine!