Confidential information of WhizChip Design Technologies (www.whizchip.com). Contains WhizChip or its customers proprietary and business sensitive data.
Contents
- Introduction
- Terms Related to Cache
- Cache States
- Project Specifications
- Need for Search Algorithm
- Proposed Algorithm 1: BST
- Proposed Algorithm 2: Splay Tree
- Class Architecture
- Conclusion and Scope for Future Work
- References
Introduction
Terms related to cache:
- Master Stalls
- Flag Bits
- Cache Miss
- Cache Hierarchy
- Victim Cache
Cache States
Valid, Invalid: When valid, the cache line is present in the cache; when invalid, the cache line is not present in the cache.
Unique, Shared: When unique, the cache line exists in only one cache; when shared, the cache line exists in more than one cache.
Clean, Dirty: When clean, the cache line has not been changed, so there is no need to update the main memory when the line is replaced. When dirty, the cache line has been changed, and the main memory must be updated when the line is replaced.
          Dirty               Clean
Unique    UD (Unique Dirty)   UC (Unique Clean)
Shared    SD (Shared Dirty)   SC (Shared Clean)
Invalid   I (Invalid)
Project Specifications
Main Memory Controller
- This controller accepts the address on the shared bus.
- It also has a delay to mimic the latency seen in a typical main memory access.
- On a cache miss, the data must be fetched from the main memory or obtained from another snooped cache.
L2 Cache Controller
- When a dirty cache line from any of the L1 caches is to be replaced, it is moved to the L2 cache.
- This saves the clock cycles needed to write it back to the main memory.
Search Algorithm
- Two search algorithms have been implemented in this project.
Cache Simulation Models
- Four simulation models were developed and evaluated using test cases.
[Block diagrams: Models 1 through 4, showing the masters M1, M2, M3, and M4 and their L1 caches in different configurations.]
The cache replacement algorithm is not a standard policy. The read and write channels for the main memory are separate. No particular snooping protocol is used.
Need for Search Algorithm
The basic requirement is to search for the requested address in the cache. Many algorithms have been devised for efficient search. Along with search, we need to add and delete addresses, and to update and replace cache lines with new lines.
The algorithm should add and delete addresses in a manner that does not drastically affect search performance.
We need a suitable data structure that can store the addresses in an effective way.
The memory footprint of the data structure should also be small, so that it does not consume too many memory resources.
Background Study
Hash Coding
Hash coding is a process in which a search key is transformed, through a hash function, into an actual address for the associated data. A very simple hash function is the modulus function.
Pseudo CAMs
Since fully associative memories are difficult and expensive to build relative to normal main memory, a method of building a large random-access memory with associative access would be advantageous. The pseudo CAM uses a multiple-memory-bank architecture in which a key is hashed to an address that is valid within every bank.
Pre-Computation Technique
Here extra information, derived from the stored bits, is kept along with the tag. For an input tag we first compute the number of ones/zeros and compare it with the stored count.
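As a concrete illustration of the modulus hash selecting among memory banks, the following minimal sketch maps an address to a bank index (the eight-bank count and the function name are illustrative assumptions, not the project's actual code):

```python
# Modulus hash as bank selector: addresses that differ by a multiple of
# NUM_BANKS land in the same bank.

NUM_BANKS = 8  # assumed bank count, for illustration only

def bank_select(addr: int) -> int:
    """Map an address to a bank index with a simple modulus hash."""
    return addr % NUM_BANKS

print(bank_select(16))  # -> 0
print(bank_select(24))  # -> 0 (collides with 16)
print(bank_select(42))  # -> 2
```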
Proposed Algorithm 1: BST
The left subtree of a node should have values smaller than the current node's value.
The right subtree of a node should have values greater than the current node's value.
Both the left and right subtrees should also be binary search trees.
The number of elements the data structure can hold depends on the number of levels of the binary search tree: as the number of levels increases, the number of elements increases. If we have 'n' levels in a full binary search tree, then we have 2^n - 1 elements.
Search Operation
[Flowchart: the given address is compared with the root node; if they match, the search is successful. Otherwise the comparison moves to the left child (for smaller addresses) or the right child (for larger addresses), repeating until the address is found or a leaf is reached.]
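The search just described can be sketched as follows (a Python stand-in for the project's implementation; the `Node` field names are assumptions):

```python
# Iterative BST search: follow left for smaller addresses, right for larger.

class Node:
    def __init__(self, addr):
        self.addr = addr
        self.left = None
        self.right = None

def bst_search(root, addr):
    """Return the node holding addr, or None if it is not in the tree."""
    node = root
    while node is not None:
        if addr == node.addr:
            return node          # search successful
        node = node.left if addr < node.addr else node.right
    return None                  # search failed

# Example: a small tree holding the addresses 24, 16, and 40.
root = Node(24)
root.left, root.right = Node(16), Node(40)
hit = bst_search(root, 40)
miss = bst_search(root, 48)
```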
Add Operation
[Flowchart: if the root is empty, the new address becomes the root. Otherwise the tree is walked from the root, going left for smaller addresses and right for larger ones, until an empty position is found where the new node is attached.]
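The add operation can be sketched in the same style (self-contained illustrative Python; field names are assumptions):

```python
# Iterative BST insert: walk down until an empty child slot is found.

class Node:
    def __init__(self, addr):
        self.addr = addr
        self.left = None
        self.right = None

def bst_insert(root, addr):
    """Insert addr and return the (possibly new) root."""
    if root is None:
        return Node(addr)        # empty root: new address becomes the root
    node = root
    while True:
        if addr < node.addr:
            if node.left is None:
                node.left = Node(addr)
                break
            node = node.left
        else:
            if node.right is None:
                node.right = Node(addr)
                break
            node = node.right
    return root

root = None
for a in (24, 16, 40, 32):
    root = bst_insert(root, a)
```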
Delete Operation
When the node to be deleted has two children, it is replaced with its in-order successor, and the duplicate entry is then deleted.
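The in-order-successor deletion can be sketched as follows (self-contained illustrative Python; field names are assumptions):

```python
# BST delete: a node with two children takes the value of its in-order
# successor (leftmost node of the right subtree), and the duplicate entry
# is then deleted from the right subtree.

class Node:
    def __init__(self, addr):
        self.addr = addr
        self.left = None
        self.right = None

def bst_delete(root, addr):
    """Delete addr from the tree and return the new subtree root."""
    if root is None:
        return None
    if addr < root.addr:
        root.left = bst_delete(root.left, addr)
    elif addr > root.addr:
        root.right = bst_delete(root.right, addr)
    else:
        if root.left is None:
            return root.right
        if root.right is None:
            return root.left
        succ = root.right                 # find in-order successor
        while succ.left is not None:
            succ = succ.left
        root.addr = succ.addr             # copy successor into this node
        root.right = bst_delete(root.right, succ.addr)  # delete duplicate
    return root

root = Node(24)
root.left, root.right = Node(16), Node(40)
root.right.left, root.right.right = Node(32), Node(48)
root = bst_delete(root, 24)   # 24 has two children; successor is 32
```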
Update Operation
[Flowchart of the update operation.]
Replace Operation
The last added node is replaced with the new cache line.
Proposed Algorithm 2: Splay Tree
A splay tree is a self-adjusting binary search tree. A balanced binary search tree has uniform height in both subtrees. Along with this property, the splay tree has the additional property that whenever a new address is added, it is brought to the root node.
This process of bringing the added address to the root is called splaying.
So in a splay tree the time required to access the most recently used addresses is very low, as they will be nearer to the root.
Splaying
Zig Step. This step is done when p is the root. The tree is rotated on the edge between x and p.
Zig-Zig Step. This step is done when p is not the root and x and p are either both right children or both left children. The tree is rotated on the edge joining p with its parent g, then rotated on the edge joining x with p.
Zig-Zag Step. This step is done when p is not the root and x is a right child and p is a left child, or vice versa. The tree is rotated on the edge between x and p, then rotated on the edge between x and its new parent g.
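The three splaying steps can be sketched with a standard recursive splay (illustrative Python, not the project's implementation; node field names are assumptions):

```python
# Recursive splay: zig, zig-zig, and zig-zag cases, built from two rotations.

class Node:
    def __init__(self, addr, left=None, right=None):
        self.addr, self.left, self.right = addr, left, right

def rotate_right(y):
    x = y.left
    y.left, x.right = x.right, y
    return x

def rotate_left(x):
    y = x.right
    x.right, y.left = y.left, x
    return y

def splay(root, addr):
    """Bring addr (or the last node on its search path) to the root."""
    if root is None or root.addr == addr:
        return root
    if addr < root.addr:
        if root.left is None:
            return root
        if addr < root.left.addr:                     # zig-zig (left-left)
            root.left.left = splay(root.left.left, addr)
            root = rotate_right(root)
        elif addr > root.left.addr:                   # zig-zag (left-right)
            root.left.right = splay(root.left.right, addr)
            if root.left.right is not None:
                root.left = rotate_left(root.left)
        return root if root.left is None else rotate_right(root)   # zig
    else:
        if root.right is None:
            return root
        if addr > root.right.addr:                    # zig-zig (right-right)
            root.right.right = splay(root.right.right, addr)
            root = rotate_left(root)
        elif addr < root.right.addr:                  # zig-zag (right-left)
            root.right.left = splay(root.right.left, addr)
            if root.right.left is not None:
                root.right = rotate_right(root.right)
        return root if root.right is None else rotate_left(root)   # zig

# Splaying the deepest node of a left-leaning chain brings it to the root.
root = Node(100, left=Node(50, left=Node(40, left=Node(30, left=Node(20)))))
root = splay(root, 20)
```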
[Diagrams: the splay rotations on node x, its parent p, and grandparent g, with subtrees A, B, C, and D.]
Search Operation
This is the most important operation in the tree. The given address is compared with the root p, its left child, and its right child. The next stage of comparisons has four possibilities:
- If the address to be searched is less than p and less than the left child, the address, if it exists, is in the subtree represented by A.
- If the address to be searched is less than p and greater than the left child, the address, if it exists, is in the subtree represented by B.
- If the address to be searched is greater than p and less than the right child, the address, if it exists, is in the subtree represented by C.
- If the address to be searched is greater than p and greater than the right child, the address, if it exists, is in the subtree represented by D.
[Diagram: root p with left child LC and right child RC, and the four subtrees A through D.]
Add Operation
[Flowchart: starting with the current node at the root, the comparison at each node moves the current node to its left or right child until an empty position is found, where the new node is attached.]
Delete Operation
As in the binary search tree, the node to be deleted is replaced with its in-order successor, and the duplicate entry is then deleted.
Cache Line
Field    Description
Key      Stores the address. The address is matched in the search operation. It is the unique part of the cache line that distinguishes it from other cache lines.
Data     Stores the data associated with the particular address. This may be consistent with the main memory or may be provided by the master.
Shared   Flag bit. When set, it indicates that the cache line is shared among other masters.
Dirty    Flag bit. When set, it indicates that the data in the cache line is dirty; when the cache line is evicted, the data must be written back to the main memory.
Valid    Flag bit. When set, it indicates that the data in the cache line is valid.
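The cache-line fields above can be sketched as a record type (a minimal Python stand-in; the types and defaults are assumptions):

```python
# Cache line record matching the field table: key, data, and three flag bits.

from dataclasses import dataclass

@dataclass
class CacheLine:
    key: int               # address; matched during search, unique per line
    data: int              # data associated with the address
    shared: bool = False   # set when the line is shared among other masters
    dirty: bool = False    # set when the line must be written back on eviction
    valid: bool = False    # set when the line holds valid data

line = CacheLine(key=0x40, data=0xBEEF, valid=True)
```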
The add task is used to add the cache line to the data structure. The delete operation for the binary search tree is also implemented; temporary nodes are used to find the in-order successor or the in-order predecessor.
The update operation is common to both algorithms. The splay task is implemented only for the second algorithm. In the implementation, we splay the data structure when we add 3, 5, 7, 9, and 15 elements.
[Diagrams: example trees built from the addresses 16, 24, 32, and 40, showing the tree structure before and after splaying.]
Algorithm Class
Binary Search Tree
The algorithm class takes the data structure object as a parameter. This means that even though the data structure changes, the algorithm need not change.
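The idea of passing the data structure object into the algorithm class can be sketched as follows (illustrative Python; all class and method names are assumptions, and the stores are trivial stand-ins for the real trees):

```python
# The algorithm depends only on the store's interface (add/search), so the
# underlying data structure can be swapped without changing the algorithm.

class BstStore:
    def __init__(self):
        self.items = set()      # stand-in for the real binary search tree
    def add(self, addr):
        self.items.add(addr)
    def search(self, addr):
        return addr in self.items

class SplayStore(BstStore):
    """Same interface; a real version would also splay on each access."""

class Algorithm:
    def __init__(self, store):
        self.store = store      # any object providing add()/search()
    def lookup(self, addr):
        return self.store.search(addr)

algo = Algorithm(SplayStore())  # BstStore and SplayStore are interchangeable
algo.store.add(0x18)
print(algo.lookup(0x18))  # -> True
```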
Splay Tree
In the splay tree implementation, the whole data structure is divided into eight binary trees. The hash function selects the bank where the addresses are stored.
The pipelined scheme saves cycles compared with a non-pipelined algorithm, in which the add and delete operations would sit idle; here all three operations work in parallel, saving clock cycles.
Another important component in the model is the Main Memory Controller. The read task accepts the address and returns the data with a data-valid signal after 20 cycles. Similarly, the write task accepts the address and the data to be written to the main memory. There is another task that initializes the locations of the main memory for simulation purposes.
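The main memory model with its fixed read latency can be sketched as follows (a Python stand-in for the testbench tasks; the class and method names are assumptions):

```python
# Main memory model: reads return data with a data-valid indication 20
# cycles after the request, mirroring the latency described above.

READ_LATENCY = 20

class MainMemory:
    def __init__(self):
        self.mem = {}

    def init_locations(self, pairs):
        """Initialize memory locations for simulation purposes."""
        self.mem.update(pairs)

    def read(self, addr, cycle):
        """Return (cycle at which data is valid, data)."""
        return cycle + READ_LATENCY, self.mem.get(addr, 0)

    def write(self, addr, data):
        self.mem[addr] = data

mm = MainMemory()
mm.init_locations({0x10: 111})
valid_at, data = mm.read(0x10, cycle=5)
print(valid_at, data)  # -> 25 111
```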
[Results tables: average clock cycles per access for the simulation models under varying hit rates. Local hit rates of 0.25, 0.5, 0.75, and 1 give averages of 15.15, 10.25, 5.35, and 0.65 cycles respectively; further tables sweep combinations of local hit rate (0.25 to 1), snoop hit rate (0.25 to 1), and victim hit rate (0.25), with hit and miss counts out of 128 accesses.]
Description   Local Hit - 0.25   Local Hit - 0.5   Local Hit - 0.75   Local Hit - 1
Hits          32                 64                96                 128
Misses        96                 64                32                 0
Description              Local Hit - 0.25   Local Hit - 0.5   Local Hit - 0.75   Local Hit - 1
Miss latency (cycles)    20                 20                20                 20
Misses                   96                 64                32                 0
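As a rough back-of-envelope check (an approximation under assumed costs, not the project's measurement: a 1-cycle hit and the 20-cycle main memory latency on a miss, over 128 accesses), the hit/miss counts relate to average access time like this:

```python
# Approximate average cycles per access; the simulated figures also include
# controller overheads, so this only tracks the trend, not the exact values.

MISS_PENALTY = 20   # assumed cost of a miss (main memory latency)
HIT_COST = 1        # assumed cost of a hit
TOTAL = 128

def avg_cycles(hits):
    misses = TOTAL - hits
    return (hits * HIT_COST + misses * MISS_PENALTY) / TOTAL

for hits in (32, 64, 96, 128):
    print(hits, avg_cycles(hits))
```

For 32 hits this gives 15.25 cycles, close to (but not exactly) the simulated 15.15; the estimate converges on 1 cycle as the local hit rate approaches 1.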
[Plot: clock cycles (14 to 20) versus input address (1 to 128), with the addresses 16, 24, 40, 48, and 64 marked.]
Conclusion and Scope for Future Work
We can also use different replacement policies for the cache controller. The cache architecture itself can be of different types, such as direct-mapped or set-associative.
References
1) Hennessy, John and David Patterson, Computer Architecture: A Quantitative Approach.
2) Changkyu Kim, Jatin Chhugani, Nadathur Satish, Eric Sedlar, Anthony D. et al., "FAST: Fast Architecture Sensitive Tree Search on Modern CPUs and GPUs," SIGMOD '10.
3) John H. Shaffer, "Designing Very Large Content-Addressable Memories," University of Pennsylvania.
Thank You