CIT 593: Computer Systems I
Homework #8: LC-3 Assembler, Part II
Due December 12, 2011
In Homework #7, you started to implement an LC-3 assembler by building some of the "utility
functions", such as parsing the LC-3 instructions and linked list functionality to represent the symbol table.
In this assignment, you will finish the assembler by implementing the code to read the assembly language program file and convert each line into machine language.
You may use the parse.o and linkedlist.o files that were written
as solutions to Homework #7, or you may use your own solutions if you prefer (as long as your solution
passes the parse test and linked list test programs that
we supplied). An implementation of symboltable.o is available, too. Note that these will only work on the eniac machines.
Part I (60 points)
Start with this skeleton code, which contains the "main" function for your program, as well as the other functions that you will implement. The argument to main is the name of the file containing the LC-3 program. That filename is passed to "create_symbol_table", which builds the symbol table using your linked list implementation from Homework #7.
Now modify "main" so that it reads each line of the text file and uses the "parse" function to break it up into its substrings. Then determine what the operation is (AND, LD, BR, etc.) and call the appropriate function provided in the skeleton code, passing the corresponding arguments. If the operation is not valid, report an error and end the program. Be careful not to mistake a label for an operation: a line may begin with a label, in which case the operation is the substring that follows it.
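One way to tell labels from operations can be sketched as follows. This is only a sketch: the names MNEMONICS, is_operation and operation_index are hypothetical and are not part of the skeleton code, the table lists only the mnemonics named in this handout, and accepting the BR condition letters in either case is an assumption.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical table: only the mnemonics named in this handout.
 * Add any others that your skeleton code supports. */
static const char *const MNEMONICS[] = {
    "ADD", "AND", "NOT", "JMP", "JSR", "JSRR", "LD", "LDI", "LDR",
    "LEA", "ST", "STI", "STR", "TRAP", ".FILL", ".BLKW", ".STRINGZ"
};

/* BR may be followed by up to three condition letters (n, z, p).
 * Accepting both upper and lower case is an assumption. */
static int is_branch(const char *token)
{
    size_t suffix;
    if (strncmp(token, "BR", 2) != 0)
        return 0;
    suffix = strlen(token + 2);
    return suffix <= 3 && strspn(token + 2, "nzpNZP") == suffix;
}

/* Returns 1 if token is a known operation or pseudo-op, 0 otherwise. */
int is_operation(const char *token)
{
    size_t i;
    size_t count = sizeof MNEMONICS / sizeof MNEMONICS[0];
    for (i = 0; i < count; i++)
        if (strcmp(token, MNEMONICS[i]) == 0)
            return 1;
    return is_branch(token);
}

/* Returns the index of the operation within a parsed line:
 * 0 if the line starts with the operation, 1 if a label comes first,
 * or -1 if neither of the first two tokens is a valid operation. */
int operation_index(char *const tokens[], int ntokens)
{
    if (ntokens > 0 && is_operation(tokens[0]))
        return 0;
    if (ntokens > 1 && is_operation(tokens[1]))
        return 1;
    return -1;
}
```

With this approach, a line parsed as "LOOP", "ADD", "R0", "R0", "x1" yields index 1, so the operands start at index 2, and a result of -1 is the point at which main would report the invalid operation and stop.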
In addition to main, there are 16 other functions that correspond to the various LC-3 operations (one has already been completed for you!). The purpose of each function is to convert its arguments to the corresponding LC-3 machine language code, using the encoding shown on the back cover of the textbook. The machine language code should then be displayed in binary (i.e., as a sequence of 0s and 1s) on the screen using printf.
For instance, if the main function reads the instruction "ADD R0 R1 R2", your program would then:
- use the parse function to break the instruction into its constituent substrings ("ADD", "R0", "R1", "R2")
- determine that the operation is "ADD"
- call add("R0", "R1", "R2")
- print the string "0001000001000010" to the screen
Of course, if one of the operands is a label, you must look it up in the symbol table and calculate the appropriate offset (which means that you need to keep track of which line you're reading!).
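The ADD example above can be sketched in C as follows. The function names here (word_to_bits, register_number, encode_add_registers) are hypothetical helpers, not the headers from the skeleton code, and the sketch covers only the register mode of ADD.

```c
#include <stdio.h>

/* Write the low 16 bits of word into out as '0'/'1' characters,
 * most significant bit first. out must hold at least 17 bytes. */
void word_to_bits(unsigned word, char out[17])
{
    int i;
    for (i = 0; i < 16; i++)
        out[i] = ((word >> (15 - i)) & 1u) ? '1' : '0';
    out[16] = '\0';
}

/* Returns 0..7 for "R0".."R7", or -1 for anything else. */
int register_number(const char *reg)
{
    if (reg[0] == 'R' && reg[1] >= '0' && reg[1] <= '7' && reg[2] == '\0')
        return reg[1] - '0';
    return -1;
}

/* Register-mode ADD layout: 0001 | DR | SR1 | 0 00 | SR2.
 * Returns 1 and fills out on success; prints an error and returns 0
 * if any operand is not a valid register. */
int encode_add_registers(const char *dr, const char *sr1, const char *sr2,
                         char out[17])
{
    int d = register_number(dr);
    int s1 = register_number(sr1);
    int s2 = register_number(sr2);
    if (d < 0 || s1 < 0 || s2 < 0) {
        printf("invalid operand\n");
        return 0;
    }
    word_to_bits((0x1u << 12) | ((unsigned)d << 9) | ((unsigned)s1 << 6)
                 | (unsigned)s2, out);
    return 1;
}
```

Calling encode_add_registers("R0", "R1", "R2", buffer) leaves "0001000001000010" in buffer, which matches the string in the example. Building the word with shifts and then printing it once keeps the bit layout in one place, so AND differs only in the opcode.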
There are also three functions for the pseudo-ops (.FILL, .BLKW, and .STRINGZ). Recall that, for .FILL, you only need to produce the 16-bit binary representation of the value. For .BLKW, the operand is the size of the block, and each word in the block should be initialized to zero. For .STRINGZ, each character of the string occupies one 16-bit word holding its ASCII value, followed by one more word holding the ASCII null (zero) at the end.
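The three pseudo-ops can be sketched as shown below. The function names are hypothetical and do not match the skeleton headers. The sketch also assumes that each 16-bit word is printed on its own line, that the .FILL value and .BLKW size have already been converted to numbers, and that the .STRINGZ text has already had any surrounding quotes removed.

```c
#include <stdio.h>
#include <string.h>

/* Write the low 16 bits of word as '0'/'1' characters, MSB first. */
void format_word(unsigned word, char out[17])
{
    int i;
    for (i = 0; i < 16; i++)
        out[i] = ((word >> (15 - i)) & 1u) ? '1' : '0';
    out[16] = '\0';
}

/* Print one word followed by a newline (one word per line is an assumption). */
static void print_word(unsigned word, FILE *stream)
{
    char bits[17];
    format_word(word, bits);
    fprintf(stream, "%s\n", bits);
}

/* .FILL: a single word holding the value. Returns the words emitted. */
int emit_fill(unsigned value, FILE *stream)
{
    print_word(value & 0xFFFFu, stream);
    return 1;
}

/* .BLKW: count words, each initialized to zero. */
int emit_blkw(int count, FILE *stream)
{
    int i;
    for (i = 0; i < count; i++)
        print_word(0u, stream);
    return count;
}

/* .STRINGZ: one word per character, then a terminating zero word. */
int emit_stringz(const char *text, FILE *stream)
{
    size_t i;
    size_t length = strlen(text);
    for (i = 0; i < length; i++)
        print_word((unsigned char)text[i], stream);
    print_word(0u, stream);
    return (int)length + 1;
}
```

Each function returns the number of words it emitted, which is the same amount by which the current address advances. That count is useful when you keep track of addresses for label offsets.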
All of your functions should handle the following error situations (where applicable):
- invalid label: if the label provided as an argument is not found in the symbol table
- invalid operand: if an operand is not legal, for instance an invalid register, an invalid immediate value for ADD/AND/LDR/STR, or an invalid vector number for TRAP
- offset/value out of range: if the calculated offset or immediate value is too large to represent given the number of bits allowed
In any of these cases, the function should just report the error (by printing it to the screen) and return 0, and your program should terminate (it's okay if you've already printed other instructions to the screen before realizing there's an error).
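The out-of-range check for label offsets can be sketched as follows. On the LC-3, a PC-relative offset is measured from the address after the current instruction, and the field is a two's complement number (9 bits for LD, ST, LDI, STI, LEA and BR; 11 bits for JSR). The function names pc_offset and offset_field are hypothetical, and the addresses are assumed to come from your symbol table and your own line counter.

```c
/* Compute the PC-relative offset from the instruction at
 * instruction_address to label_address. The PC has already been
 * incremented when the offset is applied, hence the "+ 1".
 * Returns 1 and stores the offset if it fits in a two's complement
 * field of the given width; returns 0 if it is out of range. */
int pc_offset(int label_address, int instruction_address, int bits,
              int *offset)
{
    int value = label_address - (instruction_address + 1);
    int min = -(1 << (bits - 1));
    int max = (1 << (bits - 1)) - 1;
    if (value < min || value > max)
        return 0;
    *offset = value;
    return 1;
}

/* Keep only the low `bits` bits of a (possibly negative) offset,
 * which is its two's complement encoding for that field width. */
unsigned offset_field(int offset, int bits)
{
    return (unsigned)offset & ((1u << bits) - 1u);
}
```

For example, an LD at address x3002 that refers to a label at x3000 has offset -3, and offset_field(-3, 9) gives x1FD. A label 511 words ahead does not fit in 9 bits, so pc_offset returns 0 and the calling function would report the error and return 0.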
For simplicity, you may assume that:
- there are no comments in the source code
- all labels and registers are in capital letters, e.g. "DATA" or "R2"
- instructions like BR, JSR, LD, LDI, LEA, ST and STI only take labels, not immediate offsets (e.g. you would never have "LD R0 x45")
- all immediate values are represented in hexadecimal (including .FILL values, .BLKW sizes, immediate operands to ADD/AND/LDR/STR, and TRAP vectors) starting with a lower-case 'x', e.g. "x24"
- there are no commas between operands in an instruction, e.g. you could have "ADD R1 R0 x3" but not "ADD R1, R0, x3"
- all string constants that are specified using the .STRINGZ pseudo-op contain only lower-case letters, upper-case letters, spaces, and digits (no punctuation)
- all other assumptions from Homework #7 Part III still hold
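Under these assumptions, reading an immediate such as "x24" can be sketched with the standard strtol function. The name parse_hex is hypothetical. Accepting a minus sign after the 'x' (as in "x-3") is an assumption about how negative values are written, so check it against the conventions from Homework #7. The four-digit limit reflects the 16-bit word size.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Parse an immediate written as a lower-case 'x' followed by hex
 * digits, e.g. "x24". An optional '-' after the 'x' is accepted
 * (an assumption, see the text above). Returns 1 and stores the
 * value on success, or 0 if the text is not a valid immediate. */
int parse_hex(const char *text, long *value)
{
    const char *digits;
    const char *p;

    if (text[0] != 'x')
        return 0;
    digits = text + 1;
    if (*digits == '-')
        digits++;
    /* Require 1 to 4 hex digits: LC-3 words are 16 bits wide. */
    if (*digits == '\0' || strlen(digits) > 4)
        return 0;
    for (p = digits; *p != '\0'; p++)
        if (!isxdigit((unsigned char)*p))
            return 0;
    /* strtol handles the optional sign itself when given base 16. */
    *value = strtol(text + 1, NULL, 16);
    return 1;
}
```

The digits are validated by hand before strtol is called because strtol on its own would also accept leading spaces, a '+' sign, or a "0x" prefix, none of which are valid here. Whether the resulting value fits the instruction's field (for example 5 bits for ADD) is a separate range check.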
As in the previous assignment, you should not change
the function headers of any of the functions that have been provided to you in the skeleton code; you
can, however, add additional "helper functions" if necessary.
- Work on the easiest operations first. Those are the ones that only rely on registers: JMP, JSRR, and NOT (as well as the register mode versions of ADD and AND).
- .BLKW is also pretty easy: you just have to figure out how many addresses to fill, but you've already done that in homework #7 (presumably).
- Then work on the ones that use immediate values: ADD, AND, LDR, STR, TRAP, and .FILL.
- Next, work on the ones that use the symbol table: JSR, LD, LDI, LEA, ST, and STI.
- Last, do BR (it's a bit tricky because you have to figure out the condition codes) and then .STRINGZ (because of the characters).
- If you find yourself writing the same code over and over again (e.g. converting register numbers to their binary representation) then don't copy-and-paste a whole bunch of times; create a new function and invoke it wherever you need it.
- Also keep in mind that a lot of the operations work exactly the same way, except that the opcode (the first four bits) is different. For instance, LD, ST, LDI and STI are all structured the same way; so are LDR and STR; ADD and AND; and even JMP and JSRR.
- The hardest part of this assignment is figuring out how to convert immediate values and offsets to binary, particularly given the restrictions on the number of bits you can use, and the fact that values may be negative. Keep in mind that it is easier to convert a hexadecimal number to binary than a decimal number, so be cautious about how you go about this.
- The other hard part is converting the characters in a .STRINGZ line into binary. Don't you dare write a 63-line if/else statement that prints the binary encoding of each of the printable characters! Create a function that will convert the value (remember, a char is really just an int in disguise) into binary.
- It is very tempting to look online for solutions, particularly to the last two things mentioned here. Resist the urge! You can do this on your own, I'm sure!
- But if you give in to temptation, please be sure to cite whatever you find.
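Two of the hints above, converting a value to a fixed number of bits and working out the BR condition codes, can be sketched as follows. The names to_bits and branch_condition are hypothetical helpers rather than functions from the skeleton code, and accepting the condition letters in either case is an assumption.

```c
#include <string.h>

/* Write the low `width` bits of value as '0'/'1' characters, most
 * significant bit first. Converting a negative int to unsigned wraps
 * modulo a power of two, so the low bits are already the two's
 * complement encoding. out must hold at least width + 1 bytes. */
void to_bits(int value, int width, char *out)
{
    unsigned field = (unsigned)value;
    int i;
    for (i = 0; i < width; i++)
        out[i] = ((field >> (width - 1 - i)) & 1u) ? '1' : '0';
    out[width] = '\0';
}

/* Returns the 3-bit condition field (n = 4, z = 2, p = 1) for a BR
 * mnemonic such as "BRnz", or -1 if the mnemonic is malformed.
 * A plain "BR" branches unconditionally, i.e. all three bits set. */
int branch_condition(const char *mnemonic)
{
    const char *p;
    int bits = 0;

    if (strncmp(mnemonic, "BR", 2) != 0)
        return -1;
    if (mnemonic[2] == '\0')
        return 7;
    for (p = mnemonic + 2; *p != '\0'; p++) {
        switch (*p) {
        case 'n': case 'N': bits |= 4; break;
        case 'z': case 'Z': bits |= 2; break;
        case 'p': case 'P': bits |= 1; break;
        default: return -1;
        }
    }
    return bits;
}
```

Because a char is just a small integer, to_bits('A', 16, buffer) produces "0000000001000001" with no per-character cases, and to_bits(-1, 5, buffer) produces "11111" for a 5-bit immediate. The same helper also covers 3-bit register fields, for instance to_bits(3, 3, buffer) gives "011".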
You should create your own test LC-3 programs to check the encoding of the different instructions. Here is a program that you can use for your testing, along with its expected output.
One last thing!! If the LC-3 program assembles correctly, the only thing that should be printed to the screen is the binary encoding, with no spaces between bits and no other debugging output displayed. This will help the TAs grade your assignment. The TAs will be instructed not to grade any programs that fail to adhere to this requirement!
NEW!! Full suite of tests!
These are the tests we will be using to grade your assignment. Please let us know if you have any questions about them:
Submitting the homework
You will submit this homework through Blackboard, as usual.
You should submit lc3assembler.c, along with the versions of parse.o, linkedlist.o, and symboltable.o that you used (any changes you make to those for this assignment will not count towards homework #7). Do not submit any of your own test programs.
Also, you must submit a Makefile that the TAs will use to compile your program. Your Makefile should compile lc3assembler.c and link it with the .o files mentioned above into a single executable.
Last, please submit a plain-text readme file that describes any known issues in the programs, such
as tests that the code does not pass. Please zip (or tar) all files together and then submit.