联系方式

  • QQ:99515681
  • 邮箱:99515681@qq.com
  • 工作时间:8:00-23:00
  • 微信:codinghelp

您当前位置:首页 >> Java编程Java编程

日期:2021-10-29 10:40

University of Central Florida

Department of Computer Science

CDA 5106: Fall 2021

Machine Problem 1: Cache Design, Memory Hierarchy Design

1. Ground Rules

1. All students must work alone.

2. Sharing of code between students is considered cheating and will receive appropriate action

in accordance with university policy. The TAs will scan source code through various tools

available to us for detecting cheating. Source code that is flagged by these tools will be dealt

with severely: 0 on the project and referral to the Office of Student Conduct for sanctions.

3. You must do all your work in the C/C++ or Java languages. Exceptions must be pre-approved

by the instructor.

2. Machine Problem Description

In this machine problem, you will implement a flexible cache and memory hierarchy simulator

and use it to compare the performance, area, and energy of different memory hierarchy

configurations, using a subset of the SPEC-2000 benchmark suite.

3. Specification of Memory Hierarchy

Design a generic cache module that can be used at any level in a memory hierarchy. For

example, this cache module can be “instantiated” as an L1 cache, an L2 cache, an L3 cache,

and so on. Since it can be used at any level of the memory hierarchy, it will be referred to

generically as CACHE throughout this specification.

3.1. Configurable parameters

CACHE should be configurable in terms of supporting any cache size, associativity, and block

size, specified at the beginning of simulation:

o SIZE: Total bytes of data storage. o ASSOC: The associativity of the cache (ASSOC

= 1 is a direct-mapped cache).

o BLOCKSIZE: The number of bytes in a block.

There are a few constraints on the above parameters: 1) BLOCKSIZE is a power of two and 2)

the number of sets is a power of two. Note that ASSOC (and, therefore, SIZE) need not be a

power of two. As you know, the number of sets is determined by the following equation:

2


# =

×

3.2. Replacement policy

All students need to implement three replacement policies: LRU (least-recently-used),

PseudoLRU policy and optimal policy. Replacement policy will be a configurable parameter for

the CACHE simulator.

3.2.1 LRU policy

Replace the block that was least recently touched (updated on hits and misses).

3.2.2 Pseudo-LRU policy (Tree-PLRU)

Replace the block that was least recently touched by using a binary tree, for example, a 4-way

set associative cache needs three bits to keep track of most recently used block. Following is an

example with diagrams.


3.2.3 Optimal policy

Replace the block that will be needed farthest in the future. Note that this is the most difficult

replacement policy and it is impossible to implement in a real system. This will need



3

preprocessing the trace to determine reuse distance for each memory reference (i.e. how many

accesses later we will need this cache block). You can then run the actual cache simulation on

the output of the preprocessing stage.

Note: If there is more than one block (in a set) that’s not going to be reused again in the trace,

replace the leftmost one that comes up from the search.

3.3. Write Policy

CACHE should use the WBWA (write-back + write-allocate) write policy.

o Write-allocate: A write that misses in CACHE will cause a block to be allocated in

CACHE. Therefore, both write misses and read misses cause blocks to be allocated in

CACHE.

o Write-back: A write updates the corresponding block in CACHE, making the block dirty. It

does not update the next level in the memory hierarchy (next level of cache or memory).

If a dirty block is evicted from CACHE, a writeback (i.e., a write of the entire block) will be

sent to the next level in the memory hierarchy.

3.4. Allocating a block: Sending requests to next level in the memory hierarchy

Your simulator must be capable of modeling one or more instances of CACHE to form an overall

memory hierarchy, as shown in Fig. 1.

CACHE receives a read or write request from whatever is above it in the memory hierarchy

(either the CPU or another cache). The only situation where CACHE must interact with the next

level below it (either another CACHE or main memory) is when the read or write request misses

in CACHE. When the read or write request misses in CACHE, CACHE must “allocate” the

requested block so that the read or write can be performed.

Thus, let us think in terms of allocating a requested block X in CACHE. The allocation of

requested block X is actually a two-step process. The two steps must be performed in the

following order.

1. Make space for the requested block X. If there is at least one invalid block in the set,

then there is already space for the requested block X and no further action is required

(go to step 2). On the other hand, if all blocks in the set are valid, then a victim block

V must be singled out for eviction, according to the replacement policy (Section 3.2).

If this victim block V is dirty, then a write of the victim block V must be issued to the

next level of the memory hierarchy.

2. Bring in the requested block X. Issue a read of the requested block X to the next level

of the memory hierarchy and put the requested block X in the appropriate place in

the set (as per step 1).

To summarize, when allocating a block, CACHE issues a write request (only if there is a victim

block and it is dirty) followed by a read request, both to the next level of the memory hierarchy.

4

Note that each of these two requests could themselves miss in the next level of the memory

hierarchy (if the next level is another CACHE), causing a cascade of requests in subsequent

levels. Fortunately, you only need to correctly implement the two steps for an allocation locally

within CACHE. If an allocation is correctly implemented locally (steps 1 and 2, above), the

memory hierarchy as a whole will automatically handle cascaded requests globally.

From

CPU

Read or

Write

Request


Read or Write Request


Read or Write Request


Read or Write Request

Main Memory

Fig. 1: Your simulator must be capable of modeling one or more instances of CACHE to form an overall

memory hierarchy.

CACHE

CACHE

5

3.5. Updating state

After servicing a read or write request, whether the corresponding block was in the cache already

(hit) or had just been allocated (miss), remember to update other state. This state includes

LRU/Pseudo-LRU/optimal counters affiliated with the set as well as the valid and dirty bits

affiliated with the requested block.

3.6. Inclusion Property

Now, implement another inclusion property (inclusive property) for CACHE. Inclusion property

will be a configurable parameter for the CACHE simulator.

3.6.1 Non-inclusive cache

Non-inclusive property is the default property used in this machine problem. It is simply what

you’ll get if you follow the directions listed above. There is no enforcement of either the cache

inclusion nor the cache exclusion property. A cache block in an inner cache may or may not be

in an outer cache.

3.6.2 Inclusive cache

According to the inclusive property, an outer cache should be a superset of all inner caches it

surrounds. i.e. any reference in L1 cache must also hit in the L2 cache. For homogeneous

caches, such as the ones we shall be testing, the only difference between inclusive and non-

inclusive cache is on L2 eviction (happens when read or write request misses at the L2 cache

and the requested block needs to be allocated). When a victim block in the L2 cache needs to

be evicted, the L2 cache must invalidate the corresponding block in L1 as well (assuming it exists

there). If the L1 block that needs to be invalidated is dirty, a write of the block will be issued to

the main memory directly.

4. Memory Hierarchies to be Explored in this Machine Problem

While Fig. 1 illustrates an arbitrary memory hierarchy, you will only need to study the memory

hierarchy configurations shown in Fig. 2a and Fig. 2b. Also, these are the only configurations

the TAs will test.

For this machine problem, all CACHEs in the memory hierarchy will have the same BLOCKSIZE.

from CPU

Read or Write Request


版权所有:留学生编程辅导网 2020 All Rights Reserved 联系方式:QQ:99515681 微信:codinghelp 电子信箱:99515681@qq.com
免责声明:本站部分内容从网络整理而来,只供参考!如有版权问题可联系本站删除。 站长地图

python代写
微信客服:codinghelp