CIS 501 Fall 2002
Homework 2

Due: Thursday, October 9 at the beginning of class

Caches

1. Consider a 64-byte cache with 8-byte blocks, associativity of 2 and LRU replacement policy. For the sequence of references given, identify the cache
misses and label them as compulsory, capacity or conflict. The addresses are in octal and the cache is originally invalid. Give your results in a table
format, like the one shown below. The first three lines have been completed for you.

Reference addresses: 240, 020, 214, 534, 102, 360, 454, 072, 116, 134, 244, 216, 160, 534, 450, 070, 162, 110, 210, 026.

cache state prior to access       reference address    miss? Type?
______________________________________________________________________
(-,-)   (-,-)    (-,-)    (-,-)   240                  compulsory miss
(24,-)  (-,-)    (-,-)    (-,-)   020                  compulsory miss
(24,-)  (-,-)    (02,-)   (-,-)   214                  compulsory miss
 

2. Problem 5.4 from H+P,  parts a) thru e)

3. Problem 5.7 from H+P

4. Virtually-addressed caches often have more tag overhead than their physically-addressed counterparts. Not only are virtual addresses longer than physical addresses, but the caches need process identifiers with each tag to distinguish virtual addresses of multiple processes from one another.  Assume a 64-bit machine with 64GB of physical memory (i.e., physical addresses are just big enough to address 64GB). OS process IDs are 1 byte each.  Now, assume a 64KB, 4-way set associative cache with 64B blocks.   Ignoring storage for valid/dirty bits, what is the total number of tag bits required for this cache assuming it is a) virtually-tagged?  and b) physically-tagged?

Virtual Memory

5. Consider a 64-bit machine with 16KB pages and 1GB of physical memory. Assume that each PTE contains a PFN only and that PTEs are integer number of bytes (i.e., even if a PTE is 9 bits, it is stored as 2 bytes).

a. How much storage is needed for a single-level page table?
b. How much storage is needed for the first-level table only of a two-level virtual page table? (Assume each 2nd level table is 16KB in size)
c. How much storage is needed for the first-level table only of a two-level physical page table? (Again, assume each 2nd level table is 16KB in size)

Simulation/Programming

6. Use the SimpleScalar simulator sim-cache to simulate the data side of the memory hierarchy for gcc (SPECint2000) and applu (SPECfp2000). Assume the following parameters:

Use sampling (see the web page for how) to reduce the running time of the simulations. Use 2% sampling at 10M instruction sample granularity with 4% warmup, i.e., skip 470M instructions, warmup the caches (to remove artificial compulsory misses due to sampling boundaries) for 20M instructions, and sample for 10M instructions, cyclically.  To handle programs shorter than 500M instructions, make your first skip region size 20M instructions (use -insn:sample and -insn:sample:first to do this).

a. What is the data cache miss rate?
b. What is the data TLB miss rate?
c. What is the L2 cache local miss rate?
d. What is the L2 cache global miss rate?
e. Assume that the hit time of the data cache and TLB is 1 cycle, the hit time of the second level cache is 10 cycles, and the hit time of memory is 100 cycles.  Assume a TLB miss will always hit in the L2 cache. What is the average latency of a memory access in this system?

g. Design and run experiments to separate the miss rates of the (first level) data cache into three numbers, one for each of the different kinds of misses. What are the compulsory, capacity, and conflict miss rates (e.g., compulsory miss rate = compulsory misses / total accesses)? Can you explain any trends?
h. Using parameter changes (or hacking if you wish) implement a technique that attacks each kind of miss (e.g., higher associativity and victim buffers attack conflict misses).  Repeat your separation of misses experiments for each technique.  What are the reductions in each kind of miss?  Are the techniques doing what they are supposed to be doing?

7. Modify SimpleScalar (files cache.h and cache.c) to implement sub-blocking (i.e., the ability of a cache to deal with blocks that are "partially there").  In your implementation you may assume that the maximum number of sub-blocks in a block is 4.

a. Rerun your  overall miss-rate (not separated into compulsory, conflict, and capacity) experiments with a data cache that has 2-way and 4-way sub-blocking.  What are the new miss rates?  What are the new average access times?
b.  Sub-blocking is technique that reduces miss penalty. Assume that the 10 cycle L2 miss latency is actually comprised of 6 cycles to access the L2 and 1 cycle to transfer each 8B chunk on the bus. What are the new average access times given this assumption?