CSE 371/372 Spring 2004
Homework 4

Due: Monday, March 29 in the folder outside Cheryl Hickey's office (Levine 502) by 5PM.

Floating Point

1. (5 points) P+H exercise 4.55

2. (10 points) The MIPS instruction cvt.s.w converts a 32-bit signed integer to a 32-bit signed integer to IEEE 754 single-precision floating-point value. Use a paragraph and a diagram to describe the arithmetic circuit that performs this conversion.

Memory Hierarchy

3. (5 points) P+H exercise 7.3

4. (5 points) Consider a 32KB, two-way set associative cache, with 16B blocks. What is the tag overhead of this cache on a 32-bit machine?

5. (15 points) Consider a cache with a hit time of 1 cycle and a lower-level memory hierarchy with an average access time of 10 cycles. The cache miss rate is 10%. Assume an access stream composed of 75% loads and 25% stores and a processor that must wait for stores to fully complete.

What is the average access time of the cache if we assume that it is write-back/write-allocate, that 50% of all blocks are dirty and that there is no write buffer? What is the average access time if we add a write buffer?

What is the average access time if we assume that the cache is write-thru/write-non-allocate with no write buffer? What is the average access time if we add a write buffer?

6. (10 points) Consider a direct mapped 4B cache with 4 1B blocks and a machine with 4 bit addresses. Construct a repeating sequence of 4 accesses that would generate fewer misses if we were to double the block size. Construct a repeating sequence of 4 accesses that would generate more misses if we were to double the block size.

7. (15 points) Consider a cache with 32-byte lines, an 8-byte wide 100MHz bus, and 8-byte wide 10ns access time, 20ns cycle time DRAM chips. What would be the cache miss penalty (in ns) if you used a single DRAM chip? (Ignore the latency of the address bus) What is the smallest number of DRAM chips you could use that would minimize the cache miss penalty?

R372 Simulator Programming

For this assignment, you will again modify the R372 simulator. This time you will add a module that simulates caches. Make a copy of the directory ~amir/cse371/r372/src4/ to a private directory.

This simulator has some new arguments. The optional argument [-i<nsets>:<nways>:<bsize>] simulates an instruction cache with <nsets> sets, an associativity of <nways> and a block size of <bsize> 16-bit words. The optional argument [-d<nsets>:<nways>:<bsize>] simulates a data cache with similar parameters. The optional argument [-tm<tmiss>] sets the simulated miss latency to <tmiss>.

The cache interface is defined in the file cache.h and consists of three functions. The file cache.c provides a partial implementation of this interface, which is your job to complete. cache_create creates a cache structure given the three parameters above. cache_lookup takes a cache and an address. It returns 1 if the access results in a miss and 0 if it results in a hit. It also updates cache contents to reflect the most recent access. cache_stats prints out some statistics. Your job is to modify cache.c to simulate a cache with an LRU replacement policy. Do not worry about distinguishing reads from writes.

HINT: Notice that the function cache_lookup does not contain a parameter for actually getting any data out of the cache. That's because you don't really need to model the data in the cache. There is a useful trick for simulating caches and it is to only simulate the tags, which are needed to compute hits and misses. Data resides in "main memory" only. The interface given assumes you are using this trick, which actually makes modeling a cache quite a bit simpler.

Turn in the cache.c file electronically. Also, use your implementation to answer the following questions using the matrix multiply benchmark in ~amir/cse371/r372/mbench/matrix/ and input #1. To run this program with this input, make a private copy of that directory and use the command <my_simulator> matrix -1 [-other_arguments].

8. (10 points) How many misses does a direct-mapped, 32-word instruction cache with 2-word blocks generate? What is the new number of misses if you halve the number of sets? What is the new number of misses if you double them? What configuration of a cache with a total capacity of 32 words generates the fewest misses?

9. (10 points) How many misses does a direct-mapped, 16-word data cache with 2-word blocks generate? What is the new number of misses if you halve the number of sets? What is the new number of misses if you double them? What configuration of a cache with a total capacity of 16-word generates the fewest misses?

10. (15 points) Recall the 3C miss classification definition. As it turns out, you can measure individual kinds of misses for a given cache configuration. The way you do this is by measuring the misses of two other carefully-chosen configurations and subtracting. Design experiments to measure the compulsory, capacity, and conflict misses in a direct-mapped, 16-word data cache with 2-word blocks.