Due: Thursday, October 9 at the beginning of class
Caches
1. Consider a 64-byte cache with 8-byte blocks, associativity of 2 and
LRU replacement policy. For the sequence of references given, identify
the cache
misses and label them as compulsory, capacity or conflict. The addresses
are in octal and the cache is originally invalid. Give your results in
a table
format, like the one shown below. The first three lines have been completed
for you.
Reference addresses: 240, 020, 214, 534, 102, 360, 454, 072, 116, 134, 244, 216, 160, 534, 450, 070, 162, 110, 210, 026.
cache state prior to access
reference address miss? Type?
______________________________________________________________________
(-,-) (-,-)
(-,-) (-,-) 240
compulsory miss
(24,-) (-,-)
(-,-) (-,-) 020
compulsory miss
(24,-) (-,-)
(02,-) (-,-) 214
compulsory miss
2. Problem 5.4 from H+P, parts a) thru e)
3. Problem 5.7 from H+P
4. Virtually-addressed caches often have more tag overhead than their physically-addressed counterparts. Not only are virtual addresses longer than physical addresses, but the caches need process identifiers with each tag to distinguish virtual addresses of multiple processes from one another. Assume a 64-bit machine with 64GB of physical memory (i.e., physical addresses are just big enough to address 64GB). OS process IDs are 1 byte each. Now, assume a 64KB, 4-way set associative cache with 64B blocks. Ignoring storage for valid/dirty bits, what is the total number of tag bits required for this cache assuming it is a) virtually-tagged? and b) physically-tagged?
Virtual Memory
5. Consider a 64-bit machine with 16KB pages and 1GB of physical memory. Assume that each PTE contains a PFN only and that PTEs are integer number of bytes (i.e., even if a PTE is 9 bits, it is stored as 2 bytes).
a. How much storage is needed for a single-level page table?
b. How much storage is needed for the first-level table only of a two-level
virtual page table? (Assume each 2nd level table is 16KB in size)
c. How much storage is needed for the first-level table only of a two-level
physical page table? (Again, assume each 2nd level table is 16KB in size)
Simulation/Programming
6. Use the SimpleScalar simulator sim-cache to simulate the data side of the memory hierarchy for gcc (SPECint2000) and applu (SPECfp2000). Assume the following parameters:
a. What is the data cache miss rate?
b. What is the data TLB miss rate?
c. What is the L2 cache local miss rate?
d. What is the L2 cache global miss rate?
e. Assume that the hit time of the data cache and TLB is 1 cycle, the
hit time of the second level cache is 10 cycles, and the hit time of memory
is 100 cycles. Assume a TLB miss will always hit in the L2 cache.
What is the average latency of a memory access in this system?
g. Design and run experiments to separate the miss rates of the (first
level) data cache into three numbers, one for each of the different kinds
of misses. What are the compulsory, capacity, and conflict miss rates (e.g.,
compulsory miss rate = compulsory misses / total accesses)? Can you explain
any trends?
h. Using parameter changes (or hacking if you wish) implement a technique
that attacks each kind of miss (e.g., higher associativity and victim buffers
attack conflict misses). Repeat your separation of misses experiments
for each technique. What are the reductions in each kind of miss?
Are the techniques doing what they are supposed to be doing?
7. Modify SimpleScalar (files cache.h and cache.c) to implement sub-blocking (i.e., the ability of a cache to deal with blocks that are "partially there"). In your implementation you may assume that the maximum number of sub-blocks in a block is 4.
a. Rerun your overall miss-rate (not separated into compulsory,
conflict, and capacity) experiments with a data cache that has 2-way
and 4-way sub-blocking. What are the new miss rates? What are
the new average access times?
b. Sub-blocking is technique that reduces miss penalty. Assume
that the 10 cycle L2 miss latency is actually comprised of 6 cycles to
access the L2 and 1 cycle to transfer each 8B chunk on the bus. What are
the new average access times given this assumption?