The graph roughly measures the average execution time of a memory access for various access patterns on arrays of different sizes. Observations:

- The machine has a 128K L1 cache. For arrays from 4K to 128K, every access hits in the cache regardless of stride. At 256K, the data no longer fits in cache.

- The L1 miss penalty is slightly over 400 ns.

- The L1 cache line size is 256 bytes. Only when the stride increases to 256 bytes does every access miss in L1. For smaller strides, spatial locality reduces the average cost of an access: the first access to a cache line misses, but the immediately following accesses hit.

- The L1 cache is 4-way set associative. Note that, for very large strides, every accessed element maps to the same cache set. (The stride is a power of 2, and the mapping is determined by the low-order address bits.) At the third-largest stride, only eight elements of the array are accessed; yet, for arrays larger than 128K, these eight elements do not all fit in the L1 cache at once. This must be due to conflict misses. When only four elements are accessed, all of them fit in cache, so the associativity is at least 4.

- There is no L2 cache. Another level of the memory hierarchy is apparent only for arrays larger than 8M. This is a TLB that maps 8M of memory. (Both the 8M 'cache' size and the 'line'/page size below indicate that this cannot be an L2 cache.)

- The page size is 16K. For smaller strides, spatial locality reduces the number of TLB misses.

- The TLB miss penalty is approximately 250 ns. In general, a TLB miss requires an additional memory access.

- There are 512 entries in the TLB. The TLB maps 8M of memory, which corresponds to 512 pages of 16K each.