Homework 8: LC-3 Disassembler
CSE240 - Introduction to Computer Architecture
Autumn 2004
Due: Wednesday, Nov 24 at 11:59PM
Now that you've mastered the art of assembly language programming (and
no doubt improved your Tetris-playing skills!), let's simplify our lives
and use a "high-level" programming language, C. Although the C compiler
will manage details like register use, C still gives the programmer
great control over the manipulation of data. For example, a C program
can easily manipulate the binary representation of a program, which is
exactly what we will do in this assignment!
You will write a disassembler (let's call it lc3dis). While an
assembler (such as lc3as) converts ASCII assembly programs
(i.e., .asm files) to binary machine language programs
(i.e., .obj files), a disassembler does the reverse. In
fact, to test your disassembler to ensure that it works properly, you
may want to use your lc3dis to disassemble .obj files
into .asm files, which you can then assemble (with
lc3as) to create new .obj files. If the original and
new .obj files are identical, your lc3dis worked as
you would expect! Note that we describe a specific testing strategy
later in this document.
Important note: Your disassembler only needs to deal with LC-3
instructions defined in the textbook and the MUL instruction.
You do NOT need to deal with the other instructions that we have added
(e.g., RTT and JMPT).
Functions
Just as with your Pentris code, we have broken the task at hand into
several manageable pieces (in this case C functions).
Function: main()
This is the entry point into the disassembler. It does the following.
- Opens (for reading) a file specified on the command line.
- Reads the first 2 bytes from the file to determine the .ORIG
address (more on this later).
- Outputs a .ORIG assembler directive with the address computed
above.
- Until the end-of-file is reached, reads each 2-byte instruction from
the file and calls print_instruction() on it.
- Outputs a .END assembler directive.
- Closes the file.
The code for this function is pretty simple, so we provide it.
Function: get_zext_field(int bits, int hi_bit, int lo_bit)
This function gets the value of the bit field in integer bits
beginning with bit hi_bit and ending with bit lo_bit.
The resulting value is zero-extended. For example, to get the opcode of
an instruction in ir, we would call this function as
follows.
opcode = get_zext_field(ir, 15, 12)
Note that hi_bit and lo_bit are zero-based
(i.e., they must be between 0 and 15). This code is really quite
tricky, so we've provided it for you. Please look at the code and try
to understand the logic.
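To make the logic concrete, here is a minimal sketch of one way such a field extractor could be written: shift the field down to bit 0, then mask off everything above it. This is only an illustration under our own assumptions; the version in lc3dis.c may be written differently.

```c
/* Sketch only: not necessarily the code provided in lc3dis.c.
 * Assumes 0 <= lo_bit <= hi_bit <= 15, as the handout requires. */
int get_zext_field(int bits, int hi_bit, int lo_bit)
{
    int width = hi_bit - lo_bit + 1;          /* number of bits in the field */
    unsigned int mask = (1u << width) - 1u;   /* 'width' low-order one bits  */

    /* Move the field down to bit 0, then discard the bits above it. */
    return (int)(((unsigned int)bits >> lo_bit) & mask);
}
```

For instance, with ir holding 0x1042, get_zext_field(ir, 15, 12) yields 1 (the ADD opcode) and get_zext_field(ir, 2, 0) yields 2.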
Function: get_sext_field(int bits, int hi_bit, int lo_bit)
This function is very similar to get_zext_field(), except that
it sign extends the resulting field. You will want to use
get_zext_field() to select unsigned values like opcodes or
register fields (e.g., DR, SR1, etc.),
but you will want to use get_sext_field() to select signed
immediate fields (e.g., imm5) or signed PC offset fields
(e.g., PCoffset9). This code is also tricky. We
provide this code, but take a look at it in order to understand it.
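As a rough illustration of the difference, sign extension can be sketched as "extract the field, then subtract 2^width if its top bit is set". The code below is our own sketch under that assumption and is not necessarily how the provided get_sext_field() is written.

```c
/* Sketch only: not necessarily the code provided in lc3dis.c.
 * Assumes 0 <= lo_bit <= hi_bit <= 15. */
int get_sext_field(int bits, int hi_bit, int lo_bit)
{
    int width = hi_bit - lo_bit + 1;          /* number of bits in the field */
    unsigned int mask = (1u << width) - 1u;   /* 'width' low-order one bits  */
    int value = (int)(((unsigned int)bits >> lo_bit) & mask);
    int sign_bit = 1 << (width - 1);          /* top bit of the field        */

    /* If the field's top bit is set, the two's-complement value is negative. */
    if (value & sign_bit)
        value -= (1 << width);
    return value;
}
```

For instance, the low five bits of 0x127F are 11111, so get_sext_field(0x127F, 4, 0) yields -1, whereas get_zext_field() would yield 31 for the same field.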
Function: get_bit(int bits, int bit_number)
This function is similar to get_zext_field() except that it
selects and returns a single zero-extended bit. In fact, it's
implemented by calling get_zext_field() with hi_bit
and lo_bit set to the same value (bit_number). We
provide this code.
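The delegation described above might look roughly like the following. The compact get_zext_field() here is a stand-in of our own so that the sketch compiles by itself; the provided version may differ.

```c
/* Stand-in for the provided helper (our own sketch, 16-bit fields only). */
static int get_zext_field(int bits, int hi_bit, int lo_bit)
{
    unsigned int mask = (1u << (hi_bit - lo_bit + 1)) - 1u;
    return (int)(((unsigned int)bits >> lo_bit) & mask);
}

/* A one-bit field: hi_bit and lo_bit are both bit_number. */
int get_bit(int bits, int bit_number)
{
    return get_zext_field(bits, bit_number, bit_number);
}
```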
Function: get_word_from_file(FILE* f)
This function extracts the next 16-bit word from the input file. We
provide this code.
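A sketch of how such a reader could work is shown below. Two things here are assumptions on our part rather than facts from this handout: that each word is stored high byte first, and that -1 is returned at end-of-file. Check the provided code for the actual behavior.

```c
#include <stdio.h>

/* Sketch only. ASSUMPTION: words are stored high byte first.
 * ASSUMPTION: -1 signals end-of-file (it cannot collide with a valid
 * word, since valid words lie in 0x0000..0xFFFF). */
int get_word_from_file(FILE *f)
{
    int hi = fgetc(f);   /* first byte of the pair  */
    int lo = fgetc(f);   /* second byte of the pair */

    if (hi == EOF || lo == EOF)
        return -1;       /* no complete word left */
    return (hi << 8) | lo;
}
```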
Function: print_instruction(int ir)
This is the core of the disassembler. This function is passed an
integer (ir) that may have a value from 0x0000 to 0xffff,
representing an LC-3 instruction. This function calls
get_zext_field() to extract the opcode from the instruction.
It then switches on that opcode. Within the switch there is a case for
each opcode (e.g., ADD, AND, BR,
JMP, etc.). Each case examines additional instruction bits
(determined by the opcode) and prints an appropriate string representing
the instruction.
For example, in the case for the ADD instruction, we must call
get_zext_field(ir,11,9) to get the destination register and
get_zext_field(ir,8,6) to get the first source operand
register. Next we must examine bit 5 (via get_bit(ir,5)) in
order to determine whether the final operand is an immediate or
register. If bit 5 is 0 (i.e., register operand), we call
get_zext_field(ir,4,3) and we check that the result is 0
(i.e., bits 4 and 3 are 0). If bits 4 and 3 are not 0, this is
not a legal ADD instruction, so we call print_fill(ir)
to generate a .FILL assembler directive for this word.
Otherwise, we use get_zext_field(ir,2,0) to get the second
source operand register. Finally, the ADD assembly instruction
is printed via printf(). If bit 5 is 1, we use
get_sext_field(ir,4,0) to get the imm5 field, and we
print the ADD instruction. Some of this code is provided to
get you started.
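The ADD walk-through above can be sketched as a standalone function. Several things are hypothetical here: the name print_add(), the FILE* parameter (used so the output can be captured; in lc3dis.c this logic would sit inside the switch and use printf()), the compact helper bodies, and the exact .FILL format. Treat it as one possible shape, not the reference solution.

```c
#include <stdio.h>

/* Compact stand-ins for the provided helpers (our own sketch). */
static int get_zext_field(int bits, int hi_bit, int lo_bit)
{
    unsigned int mask = (1u << (hi_bit - lo_bit + 1)) - 1u;
    return (int)(((unsigned int)bits >> lo_bit) & mask);
}

static int get_sext_field(int bits, int hi_bit, int lo_bit)
{
    int width = hi_bit - lo_bit + 1;
    int value = get_zext_field(bits, hi_bit, lo_bit);
    return (value & (1 << (width - 1))) ? value - (1 << width) : value;
}

/* Hypothetical helper: decodes one word whose opcode is already known
 * to be ADD and writes the result to 'out'. */
void print_add(FILE *out, int ir)
{
    int dr  = get_zext_field(ir, 11, 9);   /* destination register    */
    int sr1 = get_zext_field(ir, 8, 6);    /* first source register   */

    if (get_zext_field(ir, 5, 5) == 0) {   /* bit 5 = 0: register form */
        if (get_zext_field(ir, 4, 3) != 0) {
            /* Fixed bits violated: data, not an ADD.
             * ASSUMPTION: .FILL takes a 4-digit hex literal. */
            fprintf(out, ".FILL x%04X\n", ir & 0xFFFF);
            return;
        }
        fprintf(out, "ADD R%d, R%d, R%d\n", dr, sr1,
                get_zext_field(ir, 2, 0));
    } else {                               /* bit 5 = 1: immediate form */
        fprintf(out, "ADD R%d, R%d, #%d\n", dr, sr1,
                get_sext_field(ir, 4, 0));
    }
}
```

Calling print_add(stdout, 0x1042) would print ADD R0, R1, R2, while print_add(stdout, 0x1048) falls through to the .FILL path because bit 3 is set.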
Function: print_fill(int ir)
This function prints a .FILL assembler directive. We provide
this code.
Helpful Details
- Getting started. Begin by creating a directory to work in and copying
the files we provide.
cd ~
mkdir cse240hw8
cd cse240hw8
cp ~cse240/project/hw/hw8/* .
This will give you a bunch of .asm files to use in testing (below).
Also, it will give you a file called lc3dis.c to use as a starting
point. You'll want to update your path just as you did in homework 6
(and 7). This will allow you to access lc3as for testing.
- Output. Note that the output of your disassembler is not another file.
It simply prints the disassembled instructions to the display. If you
want to redirect the output to a file, use > as follows.
./lc3dis foo.obj > newfoo.asm
- Resources. Appendix A and the table on the inside back cover of your
textbook will be extremely useful! You will find all the answers there!
- Immediate fields. Please output all of your immediate fields in
decimal (rather than hexadecimal). This is necessary so our
automatic testing scripts will not get confused. For example,
the following is fine.
LDR R1, R2, #10
While the following is equivalent to the above, it will not be accepted
by our testing scripts.
LDR R1, R2, xA
- Make sure you check the fixed fields in instructions. For example,
in a register-operand ADD instruction (bit 5 is 0), bits 4 and 3
must be 0. If they are not, it is not an ADD instruction at all. It's not
any instruction, so it must be data. Similarly, in a JMP
instruction, bits 5 to 0 must be 0. And in a NOT instruction,
bits 5 to 0 must be 1. If you discover that you are looking at data
(not an instruction), call print_fill() to generate a
.FILL assembler directive.
- One or more of the n, z, or p fields in a
BR instruction must be set. If none of them are, it is not a
BR instruction (i.e., it must be data and
print_fill() should be called).
- PC-relative offsets. Do not try to generate assembly code that
contains labels! This would make things much harder. Instead, simply
specify your PC-relative offsets directly (in base 10, so you can
specify negatives). For example, if the PC-offset of some LD
instruction is -17 (and the destination register is R1), you would
generate the following assembly instruction.
LD R1, #-17
- Compiling. Use gcc on the Moore 100 machines or
eniac-l.seas.upenn.edu to build your code. You may want to use
the -o flag to specify the name of the generated program.
Here's an example.
gcc -o lc3dis lc3dis.c
./lc3dis foo.obj > newfoo.asm
- Object file format. For the curious, we'll describe the
.obj file format. The first 2 bytes contain the .ORIG
address of the program. Subsequent byte pairs (16 bits) encode each
instruction in the program.
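To illustrate the fixed-field and PC-relative-offset notes above, here is a small sketch. The function names (is_valid_jmp(), is_valid_not(), is_valid_br(), print_ld()) and the FILE* parameter are hypothetical, and the helpers are compact stand-ins of our own. The checks cover only the fields named in this handout; consult Appendix A for any other fixed fields an instruction may have.

```c
#include <stdio.h>

/* Compact stand-ins for the provided helpers (our own sketch). */
static int get_zext_field(int bits, int hi_bit, int lo_bit)
{
    unsigned int mask = (1u << (hi_bit - lo_bit + 1)) - 1u;
    return (int)(((unsigned int)bits >> lo_bit) & mask);
}

static int get_sext_field(int bits, int hi_bit, int lo_bit)
{
    int width = hi_bit - lo_bit + 1;
    int value = get_zext_field(bits, hi_bit, lo_bit);
    return (value & (1 << (width - 1))) ? value - (1 << width) : value;
}

/* JMP: bits 5 to 0 must all be 0. */
int is_valid_jmp(int ir) { return get_zext_field(ir, 5, 0) == 0; }

/* NOT: bits 5 to 0 must all be 1 (0x3F). */
int is_valid_not(int ir) { return get_zext_field(ir, 5, 0) == 0x3F; }

/* BR: at least one of n, z, p (bits 11 to 9) must be set. */
int is_valid_br(int ir)  { return get_zext_field(ir, 11, 9) != 0; }

/* LD: print the PC offset directly in signed decimal, with no label. */
void print_ld(FILE *out, int ir)
{
    int dr     = get_zext_field(ir, 11, 9);  /* destination register */
    int offset = get_sext_field(ir, 8, 0);   /* signed PCoffset9     */
    fprintf(out, "LD R%d, #%d\n", dr, offset);
}
```

A caller would test the relevant is_valid_*() function first and fall back to print_fill(ir) when it returns 0. For the word 0x23EF, print_ld(stdout, 0x23EF) prints LD R1, #-17.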
Testing
We will provide a number of .asm files you can use to test your
disassembler (but you should also generate your own test cases).
First, assemble each of these files with lc3as. Then
disassemble them with your lc3dis. Save the output of
lc3dis in a file via redirection.
lc3as t1.asm
./lc3dis t1.obj >newt1.asm
Now, to confirm that your code is correct, use the Unix diff
utility to compare t1.asm and newt1.asm.
diff -w -i t1.asm newt1.asm
If diff produces no output, the two files are the same. Note
that -w instructs diff to ignore whitespace and
-i instructs it to ignore case. If the files are different,
diff will indicate how they are different (type "man
diff" for more information on diff).
We will be using this testing method for our automatic testing scripts,
so make sure diff produces no output.
Submission
Please submit your code in a file called lc3dis.c in the usual
way. As in previous assignments, turnin will only work on
eniac-s.seas.upenn.edu.
ssh eniac-s.seas.upenn.edu
If prompted, enter your eniac password. Then...
cd cse240hw8
turnin -c cse240 -p HW8 lc3dis.c
exit
Due Date
Note that this assignment is due the Wednesday before Thanksgiving
break. Given that this assignment only requires an additional 70 lines of
code, we could have made it due on Monday. But we decided to give you a
little flexibility. We suspect many of you will want to turn it in on
Monday or Tuesday, so you are not working on it right before break. As
always, early submissions are fine!