Homework 5

Homework Assignment from Hennessy & Patterson, Chapter 2, Appendix B

(1) You are building a system around a processor with in-order execution that runs at 1.1 GHz and has a CPI of 0.7 excluding memory accesses. The only instructions that read or write data from memory are loads (20% of all instructions executed) and stores (5% of all instructions executed).

The memory system for this computer is composed of a split L1 cache that imposes no penalty on hits. Both the I-cache and the D-cache are direct mapped and hold 32 KB each. The I-cache has a 2% miss rate and 32-byte blocks, and the D-cache is write-through with a 5% miss rate and 16-byte blocks. There is a write buffer on the D-cache that eliminates stalls for 95% of all writes.

The 512 KB write-back, unified L2 cache has 64-byte blocks and an access time of 15 ns. It is connected to the L1 cache by a 128-bit data bus that runs at 266 MHz and can transfer one 128-bit word per bus cycle. Of all memory references sent to the L2 cache in this system, 80% are satisfied without going to main memory. Also, 50% of all blocks replaced are dirty.

The 128-bit wide main memory has an access latency of 60 ns, after which any number of bus words may be transferred at the rate of one per cycle on the 128-bit wide 133 MHz main memory bus.

  1. What is the average memory access time for instruction access?
  2. What is the average memory access time for data reads?
  3. What is the average memory access time for data writes?
  4. What is the overall CPI, including memory accesses?
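
As a starting point, the usual relation AMAT = hit time + miss rate x miss penalty can be composed level by level. The sketch below is only an illustration: the function names and all numbers are hypothetical (deliberately not the ones in this problem), and it ignores the write buffer and dirty-block write-backs that the problem asks you to account for.

```python
from math import ceil

def transfer_time_ns(block_bytes, bus_bits, bus_mhz):
    """Time to move one block over a bus that carries one bus word per cycle."""
    bus_words = ceil(block_bytes * 8 / bus_bits)
    cycle_ns = 1000.0 / bus_mhz
    return bus_words * cycle_ns

def amat_ns(hit_time_ns, miss_rate, miss_penalty_ns):
    """Average memory access time for one level of the hierarchy."""
    return hit_time_ns + miss_rate * miss_penalty_ns

# Hypothetical numbers, NOT the ones given in the problem.
l2_to_l1_ns = transfer_time_ns(block_bytes=64, bus_bits=64, bus_mhz=100)
l2_miss_penalty_ns = 200.0  # assumed cost of going to main memory
# An L1 miss pays the L2 access (with its own misses) plus the block transfer.
l1_miss_penalty_ns = amat_ns(10.0, 0.25, l2_miss_penalty_ns) + l2_to_l1_ns
l1_amat_ns = amat_ns(0.0, 0.10, l1_miss_penalty_ns)
print(f"L1 AMAT = {l1_amat_ns:.2f} ns")
```

Keeping every quantity in nanoseconds until the end, and converting to processor cycles only for the CPI question, avoids mixing the three clock domains.
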
(2) Given:
* A main memory (Mp), byte-addressable, with a size of 1 Mbyte (2^20 bytes)
* A cache size of 32 Kbytes
* A block size of 8 bytes
* No virtual memory system
compute how many of the Mp address bits are used for each field under the following three caching schemes. For each scheme, the fields asked for are listed from the most significant group of bits to the least.
To get yourself started, figure out how many bits total are needed for an Mp effective address. And remember, 1 MB = 2^20 B; 1 KB = 2^10 B.

1. Direct Mapping
Figure out the number of bits needed to match a cache block frame tag, to specify which cache block (the "block index"), and to specify the displacement within the block.

2. 4-way Set-Associative Organization
Figure out the number of bits needed to match a cache block frame tag, to specify which cache set (the "set index"), and to specify the displacement within the block.

3. Fully Associative Organization
Figure out the number of bits needed to match a cache block frame tag and to specify the displacement within the block.
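
To check your arithmetic, the field widths can be computed mechanically. The sketch below assumes all sizes are powers of two; the function name `address_fields` and the example sizes (64 KB memory, 1 KB cache, 16-byte blocks) are hypothetical and are not the ones in this problem.

```python
def log2_exact(n):
    """Return log2(n) for a power of two, or raise if n is not one."""
    if n <= 0 or n & (n - 1):
        raise ValueError(f"{n} is not a power of two")
    return n.bit_length() - 1

def address_fields(mem_bytes, cache_bytes, block_bytes, ways):
    """Return (tag_bits, index_bits, offset_bits) for a byte-addressed memory.

    ways=1 is direct mapped; ways equal to the number of cache blocks
    is fully associative (the index field then has 0 bits).
    """
    addr_bits = log2_exact(mem_bytes)
    offset_bits = log2_exact(block_bytes)
    num_blocks = cache_bytes // block_bytes
    index_bits = log2_exact(num_blocks // ways)
    tag_bits = addr_bits - index_bits - offset_bits
    return tag_bits, index_bits, offset_bits

# Hypothetical sizes: 64 KB memory, 1 KB cache, 16-byte blocks (64 blocks).
MEM, CACHE, BLOCK = 64 * 1024, 1024, 16
print("direct mapped:    ", address_fields(MEM, CACHE, BLOCK, ways=1))
print("2-way set assoc.: ", address_fields(MEM, CACHE, BLOCK, ways=2))
print("fully associative:", address_fields(MEM, CACHE, BLOCK, ways=CACHE // BLOCK))
```

In every scheme the three widths add up to the full address width; raising the associativity only moves bits from the index field into the tag.
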

(3) Given an initially "empty" physical memory and the following VM page reference string:

1 2 3 4 1 2 5 1 2 3 4 5

compute the hit ratio for:
(3a) FIFO with 3 frames of primary memory

(3b) LRU with 3 frames of primary memory

(3c) FIFO with 4 frames of primary memory

(3d) LRU with 4 frames of primary memory

(3e) Point out the unexpected behavior in the above results.

Here is Optimal replacement with 3 frames as an example of what I am looking for:

                      Page # referenced
    FRAME   |  #  1  2  3  4  1  2  5  1  2  3  4  5
    --------+----------------------------------------
      1     |  #  1  1  1  1  1  1  1  1  1  3  3  3
      2     |  #  #  2  2  2  2  2  2  2  2  2  4  4
      3     |  #  #  #  3  4  4  4  5  5  5  5  5  5
    --------+----------------------------------------
     result |     M  M  M  M  H  H  M  H  H  M  M  H

    M marks a miss: the referenced page is loaded, replacing old data once the frames are full.
    H marks a memory hit.

    For 3-frame Optimal replacement the hit ratio is 5/12
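
The example above can be reproduced with a short simulation. This is a minimal sketch, assuming the example policy is the optimal one (evict the resident page whose next use lies farthest in the future); the function name `optimal_hits` is hypothetical. Ties between pages that are never used again are broken arbitrarily, so the frame contents may differ from the table even though the hit count matches.

```python
def optimal_hits(refs, num_frames):
    """Count hits for a page reference string under optimal replacement."""
    frames = []  # pages currently resident, in no particular order
    hits = 0
    for i, page in enumerate(refs):
        if page in frames:
            hits += 1
            continue
        if len(frames) < num_frames:
            frames.append(page)  # a free frame is still available
            continue
        future = refs[i + 1:]

        def next_use(p):
            # Distance to the next reference; never-used-again pages sort last.
            return future.index(p) if p in future else len(future)

        victim = max(frames, key=next_use)
        frames[frames.index(victim)] = page
    return hits

refs = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]
hits = optimal_hits(refs, 3)
print(f"hit ratio = {hits}/{len(refs)}")  # prints "hit ratio = 5/12"
```

FIFO and LRU differ only in how the victim is chosen, so the same loop structure can be reused to check parts (3a) through (3d).
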


May 1, 2003