Homework 9: LC-3 Assembler
CSE240 - Introduction to Computer Architecture
Autumn 2006

Due: Wednesday, Dec 6 at 11:59PM

Now that you have built a disassembler (homework 8), let's build an assembler, just like the as command in the simulator. Let's call our assembler mylc3as. This assignment is composed of two distinct components: (1) writing the assembler code and (2) writing the symbol table used by the assembler. I suggest you complete each part before proceeding to the next. This document describes each of these components separately and then describes the logistics for compiling, testing, debugging, and turning in your assignment.

Part #1: The Assembler

Recall from Chapter 7 that assemblers make multiple passes over the program. Our assembler will make several passes over the program (but one could combine some of these passes): Pass 1 and 2 require a symbol table, which you will build in the second half of this assignment. To allow you to complete part 1 without first completing part 2, we've included a binary implementation of the symbol table (see below for more information).

Pass #0: Parsing Instructions into a List

Parsing is quite complex, so we provide this code (in parser.c and parser.h). Feel free to examine this code if you like. Although we provide the parser, you still need to understand the structure into which instructions are placed. The program is parsed into a linked list where each node of the list is an instance of the structure below.

enum opcodes_t { 
  // LC3 opcodes
  BR, ADD, LD, ST, JSR_JSRR, AND, LDR, STR, 
  RTI, NOT, LDI, STI, JMP_RET, MUL, LEA, TRAP, SUB,
  // assembler directives
  ORIG, FILL, BLKW, STRINGZ,
  // no operation
  UNDEF
};

typedef struct insn_struct insn_t;

struct insn_struct {
  enum opcodes_t opcode;  // the opcode of instruction (see above)
  char* label;            // label on this line (can be NULL)
  int first_reg;          // 1st reg number (-1 implies there isn't one)
  int second_reg;         // 2nd reg number (-1 implies there isn't one)
  int third_reg;          // 3rd reg number (-1 implies there isn't one)
  int offset_immediate;   // immediate or offset value
  char* label_ref;        // label referenced by BR/LD/ST/JSR/LDI/STI/LEA/FILL
  char* string;           // string for STRINGZ directive
  int n;                  // neg bit (only for BR)
  int z;                  // zero bit (only for BR)
  int p;                  // pos bit (only for BR)
  insn_t* next;           // pointer to next instruction in list
};

The fields are:

Passes #1 and #2: Inserting and Looking Up Labels into the Symbol Table

Passes 1 and 2 use a symbol table to store the address that corresponds to all of the labels in the program. Pass 1 iterates over all the instructions in the linked list, looking at the label field and adding label/address pairs to the symbol table. Pass 2 iterates over all the instructions again, but this time it looks up all label_ref fields in the symbol table and uses the address of the current instruction and the address of the label to compute the correct value of offset_immediate. After pass 2, the label_ref field is unused.

To complete the code for passes 1 and 2, you'll need to build the following three functions:

Pass #3: Instruction Generation

After pass 1 and 2, the list of insn_t structures contain all the information necessary to generate encoded LC-3 instructions. Instruction encoding consists of taking the information in the insn_t structure and encoding it in a single 16-bit value. It's just the opposite of what we did in the previous assignment. Like last time, we'll provide some auxiliary routines to make this task more manageable. You'll need to complete the pass_three() function described below. Note that your assembler should print nothing to the display unless an error is encountered.

We needed to add approximately 80 lines of code to mylc3as.c to implement the assembler (excluding the symbol table, below).

Part #2: The Symbol Table

The symbol table allows you to map strings representing labels (for example, "START") to addresses (for example, 0x3000).

Symbol Table Interface

We will use the following interface for our symbol table implementation: We provide a reference implementation of the symbol table (sym_table.o). This allows you to implement the rest of the homework without your own symbol table. It also helps in confirming that your symbol table works properly.

Symbol Table Implementation

Symbol tables can be implemented in many ways (hash tables, linked lists, arrays, etc.), but we will use a binary tree representation. Each node in the tree is represented via the structure below.
typedef struct sym_node_struct sym_node_t;

struct sym_node_struct {
  char* symbol;
  int address;
  sym_node_t* left;
  sym_node_t* right;
};
The symbol field points to an array of characters (string) representing the symbol. The address field is the address of the symbol. The left and right fields are pointers to the left and right sym_node_t child structures in the tree (NULL if child does not exist). The "root" of the tree is pointed to by the global variable g_symboltable_ptr, initially NULL:
  sym_node_t* g_symboltable_ptr = NULL;  // global symbol table
To provide fast lookup of symbols, we want to build the tree such that given any node in the tree, all labels reachable from the left child are alphabetically less than the label of the node ("car" is alphabetically before "dog", but it is after "ant"). Similarly, all labels reachable from the right child are alphabetically greater than the label of the node. The strcmp(char* s1, char* s2) function from the C standard library does just such an alphabetical comparison. The function strcmp(char* s1, char* s2) returns a 0 if the strings are the same, a number greater than 0 if s1 is alphabetically after s2, and a number less than 0 if s1 is alphabetically before s2.

To implement the symbol table, implement the following three functions:

Logistics

Files

This assignment consists of a number of files:

Getting Started

Begin by creating a directory to work in and copying the files we provide. These files are available at ~cse240/html/handouts/hw9/ on eniac or in a zip file. If you are working on eniac, you can get these files as follows.
% cd ~
% mkdir cse240hw9
% cd cse240hw9
% cp ~cse240/html/handouts/hw9/* .

Compiling Your Code

Use gcc on eniac.seas.upenn.edu to compile and run your assembler. Use the following command line:
% gcc -g -Wall mylc3as.c sym_table.c parser.o -o mylc3as
The -g file specifies to add debugging symbols. The -Wall flag turns on all compiler warnings. The -o flag says that the generated program should be called mylc3as. If the -o flag is omitted, the default name is a.out. To avoid retyping this command each time, you can repeat previous shell commands by using the up arrow key to cycle through past commands.

To work on your assembler before implementing your symbol table, you can use our implementation by replacing sym_table.c with sym_table.o:

% gcc -g -Wall mylc3as.c sym_table.o parser.o -o mylc3as

To run your assembler, specify the input assembly file and the output object file. For example, to assemble test.asm into test.obj:

% mylc3as test.asm test.obj

Debugging

You will certainly need to use appropriate tools to debug your code (staring at the source code will only damage your eyesight!). The GDB debugger will be very useful. You can use it to determine exactly what statement caused a segmentation fault. You can also set break points and step through the program's execution.

You will also want to use the Valgrind memory check. Valgrind can help identify all kinds of memory errors such as the use of uninitialized memory, dangling pointers, wild pointers, corrupting the stack, memory leaks, and more. To run your program with Valgrind:

% valgrind mylc3as test.asm test.obj
When errors are encountered, Valgrind prints them to the display. They are mostly self explanatory, but you may want to refer to the following sources of more information on Valgrind:

Testing

In previous assignments, we've given you lots of test cases that covered most of the cases. In this assignment, we're giving you just two basic test cases. Many, many cases are not covered by these two short .asm files. As such, you'll need to create your own .asm test cases, assemble them with PennSim, and verify your code generates the same .obj file. Don't simply run the provided tests and look surprised when your code doesn't work.

You can test your assembler by comparing its output to that of the as command in the simulator. For example, suppose you use as (in PennSim) to assemble t1.asm to t1.obj. Now use mylc3as to assemble t1.asm and compare the two object files:

% mylc3as t1.asm my-t1.obj  (this will produce a file called my-t1.obj)
% cmp t1.obj my-t1.obj
If cmp does not produce any output, the two files are identical, so your assembler produced the same output as the reference assembler. If cmp reports a difference, examine t1.asm to see what kind of instruction(s) it contains. Use this a starting point for debugging. If t1.asm contains 1000s of instructions, you'll have no idea where the problem is. Generate your own test cases that have 1 or 2 instructions. If cmp reports that the two object files are different, you will know that the problem is with those 1 or 2 instructions.

You may also find the hex command useful to look at what hex values are in a binary file.

% hex t1.obj
You may want to use your disassembler from the last assignment to help debug your code (by disassembling the object file created by your assembler). You can also disassemble an object file by loading it into the PennSim simulator and examining the values in memory. For part#2, we've provided a little code to test your symbol table. Naturally, you will want to write more extensive tests. The files test_st1.c and test_st2.c both contain main() functions that will test your symbol table functions. To build, do the following.

% gcc -g -Wall test_st1.c sym_table.c -o test_st1 
% gcc -g -Wall test_st2.c sym_table.c -o test_st2 
We recommend that you run these tests with Valgrind as follows:
% valgrind ./test_st1
% valgrind ./test_st2
The output of each program will indicate whether the test was passed or not. If you fail the tests, examine the .c files to determine the problem. You'll want to create your own tests, too.

Turnin

Please submit your code in files called mylc3as.c and sym_table.c in the usual way:
% cd cse240hw9
% turnin -c cse240 -p hw9 mylc3as.c sym_table.c
% exit