
A SEMINAR REPORT

ON

DYNAMIC CACHE MANAGEMENT TECHNIQUE

BY

KANAKA SUSHWANTH KUMAR


15831A0576

PRESENTED TO
THE DEPARTMENT OF COMPUTER ENGINEERING
FACULTY OF ENGINEERING

ENUGU STATE UNIVERSITY OF SCIENCE AND


TECHNOLOGY (ESUT), ENUGU
SUBMITTED

IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR


THE AWARD OF A BACHELOR OF ENGINEERING (B.ENG)
DEGREE IN COMPUTER ENGINEERING

SEPTEMBER, 2012
CERTIFICATION

I, ELEKWA JOHN OKORIE, with the registration number
ESUT/2007/88499, in the Department of Computer
Engineering, Enugu State University of Science and
Technology, Enugu, certify that this seminar was done by me.

--------------------------------- -------------------------
Signature Date

APPROVAL PAGE

This is to certify that this seminar topic on the dynamic cache
management technique was approved and carried out by
ELEKWA JOHN OKORIE, with Reg. no. ESUT/2007/88499,
under strict supervision.

--------------------------------- ------------------------------
ENGR. ARIYO IBIYEMI MR. IKECHUKWU ONAH
(Seminar supervisor) (Head of department)

-------------------- ----------------------
DATE DATE

DEDICATION

This report is dedicated to God Almighty for His love, favors
and protection all this time; to my parents, Mr. Ugonna Alex
Okorie and Mrs. Monica Elekwa; to those who contributed in
making sure that I keep moving ahead, from my wonderful
lecturers to my great friends; and to my family for their love,
care, prayers and support.

ACKNOWLEDGEMENT

Apart from my own efforts, the success of any seminar
depends largely on the encouragement and guidance of many
others. I take this opportunity to express my gratitude to the
people who have been instrumental in the successful
completion of this seminar.

I would like to show my greatest appreciation to Dr. Mac vain
Ezedo and his wife. I cannot say thank you enough for their
tremendous support and help. God bless you all.

I also appreciate the guidance and support received from all
my friends: Uche Ugwoda, Peter Obaro, Matin Ozioko, Oge
Raphael, Stone and others. I am grateful for their constant
support and help.

Finally, I thank my lecturers, under whose tutelage I was
taught. God bless you all. I am especially grateful to my
supervisor, ENGR. ARIYO IBIYEMI, and to my amiable HOD,
ENGR. IKECHUKWU ONAH.

TABLE OF CONTENTS

Title page
Certification
Approval Page
Dedication
Acknowledgement
Abstract
Table of Contents

CHAPTER ONE
1.0 Introduction
1.1 Power Trends for Current Microprocessors

CHAPTER TWO
2.0 Working with the L0 Cache
2.1 Pipeline Micro Architecture
2.2 Branch Prediction & Confidence Estimation – A Brief Overview

CHAPTER THREE
3.0 What is the Dynamic Cache Management Technique
3.1 Basic Idea of the Dynamic Management Scheme
3.2 Dynamic Techniques for L0-Cache Management
3.3 Simple Method
3.4 Static Method
3.5 Dynamic Confidence Estimation Method
3.6 Restrictive Dynamic Confidence Estimation Method
3.7 Dynamic Distance Estimation Method

CHAPTER FOUR
4.0 Comparison of Dynamic Techniques

CHAPTER SIX
Future Scope

CHAPTER SEVEN
7.0 Conclusion

REFERENCES

LIST OF FIGURES

Fig 1: Memory Hierarchy

Fig 2: An Instruction Cache

Fig 3: Levels of Cache

Fig 4: Pipeline Micro Architecture (A)

Fig 5: Pipeline Micro Architecture (B)

Fig 6: Bimodal Branch Predictor

CHAPTER ONE

1.0 INTRODUCTION

First of all, what is cache memory?

• Cache memory is a fast memory that is used to hold the
most recently accessed data.

• Cache is pronounced like the word cash. Cache memory
is the level of the computer memory hierarchy situated
between the processor and main memory. It is a very fast
memory that the processor can access much more quickly
than main memory (RAM). Cache is relatively small and
expensive. Its function is to keep a copy of the data and
code (instructions) currently used by the CPU. By using
cache memory, wait states are significantly reduced
and the work of the processor becomes more effective.

As processor performance continues to grow and
high-performance, wide-issue processors exploit the available
instruction-level parallelism, the memory hierarchy must
continuously supply instructions and data to the datapath to
keep the execution rate as high as possible. Very often, the
memory hierarchy access latencies dominate the execution
time of the program. The very high utilization of the
instruction memory hierarchy entails high energy demands for
the on-chip I-cache subsystem.


In order to reduce the effective energy dissipation per
instruction access, we propose the addition of a small, extra
cache (the L0-cache) which serves as the primary cache of the
processor, and is used to store the most frequently executed
portions of the code, and subsequently provide them to the
pipeline. Our approach seeks to manage the L0-cache in a
manner that is sensitive to the frequency of accesses of the
instructions executed. It can exploit the temporal locality of
the code and can make decisions on the fly, i.e., while the code
executes. In recent years, power dissipation has become
one of the major design concerns for the microprocessor
industry. The shrinking device sizes and the large number of
devices packed in a chip die, coupled with large operating
frequencies, have led to unacceptably high levels of power
dissipation. The problem of wasted power caused by
unnecessary activities in various parts of the CPU during code
execution has traditionally been ignored in code optimization
and architectural design.

Higher frequencies and larger transistor counts more than
offset the lower voltages and the smaller devices, and they
result in larger power consumption in the newest members of
a processor family.

Figure 1: Memory Hierarchy

Cache is much faster than main memory because it is
implemented using SRAM (Static Random Access Memory).
The problem with DRAM, which comprises main memory, is
that its cells are based on capacitors, which have to be
constantly refreshed in order to preserve the stored
information (because of leakage current). Whenever data is
read from a cell, the cell is refreshed. The DRAM cells need to
be refreshed very frequently, typically every 4 to 16 ms, and
this slows down the entire process. SRAM, on the other hand,
consists of flip-flops, which stay in their state as long as the
power supply is on. (A flip-flop is an electrical circuit composed
of transistors and resistors.) Because of this, SRAM need not
be refreshed and is over 10 times faster than DRAM. Flip-flops,
however, are implemented using more complex circuitry, which
makes SRAM much larger and more expensive, limiting its
use.

• Level one cache memory (called L1 Cache, for Level 1
Cache) is directly integrated into the processor. It is
subdivided into two parts:

The first part is the instruction cache, which contains


instructions from the RAM that have been decoded as they
came across the pipelines.

The second part is the data cache, which contains data from
the RAM and data recently used during processor operations.

Figure 2 - An instruction cache

• Level 1 cache can be accessed very rapidly; its access
time approaches that of the internal processor
registers. Level two cache memory (called L2 Cache, for
Level 2 Cache) is located in the same package as the
processor (often on the chip itself). The level two cache is an
intermediary between the processor, with its internal
cache, and the RAM. It can be accessed more rapidly
than the RAM, but less rapidly than the level one cache.
Level three cache memory (called L3 Cache, for Level 3
Cache) is located on the motherboard.

1.1 POWER TRENDS FOR CURRENT MICROPROCESSORS

                DEC 21164   DEC 21164 (High Freq)   Pentium Pro   Pentium II
Freq (MHz)      433         600                     200           300
Power (W)       32.5        45                      28.1          41.4

Very often the memory hierarchy access latencies dominate
the execution time of the program; the very high utilization of
the instruction memory hierarchy entails high energy
demands on the on-chip I-cache subsystem. In order to reduce
the effective energy dissipation per instruction access, the
addition of an extra cache is proposed, which serves as
the primary cache of the processor and is used to store
the most frequently executed portions of the code.

CHAPTER TWO

2.0 WORKING WITH THE L0 CACHE

Figure 3: Levels of Cache

Some dynamic techniques are used to manage the L0-cache.
The problem that the dynamic techniques seek to solve is how
to select the basic blocks to be stored in the L0-cache while
the program is being executed. If a block is selected, the
CPU will access the L0-cache first; otherwise, it will go
directly to the I-cache and bypass the L0-cache. In the
case of an L0-cache miss, the CPU is directed to fetch
instructions from the I-cache and to transfer the instructions
from the I-cache to the L0-cache.
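
The access flow just described can be sketched in C. This is only a toy, word-granularity, direct-mapped model written for this report; the function names, the cache sizes and the memory stub are illustrative assumptions, and the access_l0 flag stands for the selection decision made by the dynamic techniques of Chapter Three.

#include <stdbool.h>
#include <stdint.h>

/* Minimal direct-mapped cache model, used only for illustration. */
#define L0_LINES 32                       /* small L0-cache */
#define IC_LINES 512                      /* larger I-cache  */

typedef struct { uint32_t tag; uint32_t data; bool valid; } line_t;

static line_t l0_cache[L0_LINES];
static line_t icache[IC_LINES];

/* Stand-in for a main-memory access. */
static uint32_t memory_read(uint32_t addr) { return addr ^ 0xdeadbeefu; }

static bool lookup(line_t *c, unsigned nlines, uint32_t addr, uint32_t *inst)
{
    line_t *l = &c[addr % nlines];
    if (l->valid && l->tag == addr) { *inst = l->data; return true; }
    return false;
}

static void fill(line_t *c, unsigned nlines, uint32_t addr, uint32_t inst)
{
    line_t *l = &c[addr % nlines];
    l->valid = true; l->tag = addr; l->data = inst;
}

/* One instruction fetch.  'access_l0' is the decision made by the dynamic
 * management technique for the current basic block. */
uint32_t fetch(uint32_t addr, bool access_l0)
{
    uint32_t inst;

    if (access_l0 && lookup(l0_cache, L0_LINES, addr, &inst))
        return inst;                                /* L0-cache hit */

    if (!lookup(icache, IC_LINES, addr, &inst)) {   /* I-cache miss */
        inst = memory_read(addr);
        fill(icache, IC_LINES, addr, inst);
    }
    if (access_l0)                 /* transfer the instruction into the L0-cache */
        fill(l0_cache, L0_LINES, addr, inst);
    return inst;
}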

2.1 PIPELINE MICRO ARCHITECTURE

Figure 4: Pipeline Micro Architecture (A)

Figure 5: Pipeline Micro Architecture (B)

Figure 4 shows the processor pipeline we model in this
research. The pipeline is typical of embedded processors such
as the StrongARM. There are five stages in the pipeline: fetch,
decode, execute, mem and writeback. There is no external
branch predictor. All branches are predicted “untaken”. There
is a two-cycle delay for “taken” branches.

Instructions can be delivered to the pipeline from one of three
sources: the line buffer, the I-cache and the DFC. There are
three ways to determine where to fetch instructions from:

• Serial – the sources are accessed one by one, in a fixed order;

• Parallel – all the sources are accessed in parallel;

• Predictive – the access order can be serial with a flexible
order, or parallel, based on a prediction.

Serial access results in minimal power because the most
power-efficient source is always accessed first. But it also
results in the highest performance degradation, because every
miss in the first accessed source will generate a bubble in the
pipeline.

On the other hand, parallel access has no performance
degradation, but the I-cache is always accessed and there is
no power saving in instruction fetch. Predictive access, if
accurate, can have both the power efficiency of serial
access and the low performance degradation of
parallel access. Therefore, it is adopted in our approach. As
shown in the pipeline figure, a predictor decides which source
to access first based on the current fetch address. Another
function of the predictor is pipeline gating. Suppose a DFC
hit is predicted for the next fetch at cycle N. The fetch stage is
disabled at cycle N and the decoded instruction is sent from
the DFC to latch 5. Then, at cycle N+1, the decode stage is
disabled and the decoded instruction is sent from latch 5 to
latch 2. If an instruction is fetched from the I-cache, the hit
cache line is also sent to the line buffer. The line buffer can
then provide instructions for subsequent fetches to the same
line.
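
The predictive policy can be sketched as a small prediction table indexed by the fetch address. This is a hedged illustration: the table size, the indexing and the source encoding below are assumptions made for this report, not the predictor of any particular design.

#include <stdint.h>

/* Instruction sources, listed in increasing order of fetch energy. */
enum source { SRC_LINE_BUFFER, SRC_DFC, SRC_ICACHE };

#define PRED_ENTRIES 256

/* Predicted first source for each fetch address (illustrative table). */
static enum source predictor[PRED_ENTRIES];

/* Predict which source to access first for the next fetch.  If SRC_DFC is
 * predicted, the fetch stage can be clock-gated for that cycle, since the
 * decoded instruction is supplied directly by the DFC. */
enum source predict_source(uint32_t fetch_addr)
{
    return predictor[(fetch_addr >> 2) % PRED_ENTRIES];
}

/* When the fetch resolves, remember where the instruction was actually
 * found, so that the next fetch to this address starts there. */
void update_predictor(uint32_t fetch_addr, enum source actual)
{
    predictor[(fetch_addr >> 2) % PRED_ENTRIES] = actual;
}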

2.2 BRANCH PREDICTION & CONFIDENCE ESTIMATION –
A BRIEF OVERVIEW

2.2.1 Branch Prediction

Branch prediction is an important technique for increasing
parallelism in the CPU: the outcome of a conditional branch
instruction is predicted as soon as it is decoded.
Successful branch prediction mechanisms take advantage of
the non-random nature of branch behavior. Most branches
are either mostly taken or mostly not taken in the course of
program execution.

The commonly used branch predictors are:

1. Bimodal branch predictor.

A bimodal branch predictor uses a counter for determining
the prediction. Each time the branch is taken, the counter is
incremented by one, and each time it falls through, it is
decremented by one. The prediction is made by looking at the
value of the counter: if it is less than a threshold value, the
branch is predicted as not taken; otherwise, it is predicted as
taken.
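
As a concrete illustration, a bimodal predictor is commonly built as a table of 2-bit saturating counters indexed by the branch address. The sketch below assumes that common organization; the table size, the indexing and the threshold are arbitrary choices made for this report.

#include <stdbool.h>
#include <stdint.h>

#define BIMODAL_ENTRIES 1024
#define TAKEN_THRESHOLD 2          /* counter >= 2 means "predict taken" */

/* One 2-bit saturating counter per entry (values 0..3). */
static uint8_t counters[BIMODAL_ENTRIES];

static unsigned bimodal_index(uint32_t branch_pc)
{
    return (branch_pc >> 2) % BIMODAL_ENTRIES;
}

/* Prediction: compare the counter against the threshold. */
bool bimodal_predict(uint32_t branch_pc)
{
    return counters[bimodal_index(branch_pc)] >= TAKEN_THRESHOLD;
}

/* Update: increment when the branch is taken, decrement when it falls
 * through, saturating at 0 and 3. */
void bimodal_update(uint32_t branch_pc, bool taken)
{
    uint8_t *c = &counters[bimodal_index(branch_pc)];
    if (taken && *c < 3) (*c)++;
    if (!taken && *c > 0) (*c)--;
}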

Figure 6: Bimodal Branch Predictor

2. Global branch predictor

A global branch predictor considers the past behavior of the
current branch, as well as that of other branches, when
predicting the behavior of the current branch.
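
One widely used form of global prediction is the gshare scheme, which XORs a global history register with the branch address to index a table of saturating counters. The sketch below shows that common scheme purely as an illustration; the report does not prescribe a particular global predictor.

#include <stdbool.h>
#include <stdint.h>

#define GSHARE_ENTRIES 4096          /* 12 bits of global history */

static uint8_t  gshare_counters[GSHARE_ENTRIES];   /* 2-bit counters */
static uint32_t global_history;                    /* recent branch outcomes */

static unsigned gshare_index(uint32_t branch_pc)
{
    return ((branch_pc >> 2) ^ global_history) % GSHARE_ENTRIES;
}

bool gshare_predict(uint32_t branch_pc)
{
    return gshare_counters[gshare_index(branch_pc)] >= 2;
}

void gshare_update(uint32_t branch_pc, bool taken)
{
    uint8_t *c = &gshare_counters[gshare_index(branch_pc)];
    if (taken && *c < 3) (*c)++;
    if (!taken && *c > 0) (*c)--;
    /* Shift the outcome into the global history register. */
    global_history = ((global_history << 1) | (taken ? 1u : 0u)) & (GSHARE_ENTRIES - 1);
}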

2.2.2 Confidence Estimation

The relatively new concept of confidence estimation has been
introduced to keep track of the quality of branch prediction.
Confidence estimators are hardware mechanisms
that are accessed in parallel with the branch predictor when
a branch is decoded, and they are updated when the branch
is resolved. They characterize a branch as ‘high confidence’ or
‘low confidence’ depending on how well the branch predictor
has performed for the particular branch: if the branch
predictor predicted the branch correctly most of the time, the
confidence estimator designates the prediction as ‘high
confidence’, otherwise as ‘low confidence’.
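
A common hardware confidence estimator is a table of small “resetting” counters: a counter is incremented on every correct prediction of the branch and cleared on a misprediction, and a saturated counter marks the branch as ‘high confidence’. The sketch below assumes that design; the table size, counter width and threshold are illustrative choices.

#include <stdbool.h>
#include <stdint.h>

#define CE_ENTRIES 1024
#define CE_MAX     15        /* 4-bit resetting counters */
#define HIGH_CONF  15        /* saturated counter => 'high confidence' */

static uint8_t conf_table[CE_ENTRIES];

static unsigned ce_index(uint32_t branch_pc)
{
    return (branch_pc >> 2) % CE_ENTRIES;
}

/* Queried in parallel with the branch predictor when the branch is decoded. */
bool is_high_confidence(uint32_t branch_pc)
{
    return conf_table[ce_index(branch_pc)] >= HIGH_CONF;
}

/* Updated when the branch is resolved: count correct predictions, and
 * reset the counter on a misprediction. */
void confidence_update(uint32_t branch_pc, bool predicted_correctly)
{
    uint8_t *c = &conf_table[ce_index(branch_pc)];
    if (predicted_correctly) {
        if (*c < CE_MAX) (*c)++;
    } else {
        *c = 0;
    }
}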

CHAPTER THREE

PROPOSED SYSTEM

3.0 WHAT IS THE DYNAMIC CACHE MANAGEMENT
TECHNIQUE?

• It is a technique applied to the memory hierarchy of
high-performance processors.

• Extrapolating from current trends, the on-chip caches, which
already account for a large fraction of a chip’s transistors, are
likely to remain so in the near future.

• The mechanism supplies the instruction stream to the
pipeline.

• It is used to reduce the energy cost of the very high
utilization of the instruction memory hierarchy.

• It can be seen as a strategy for dynamically resizing and
managing the cache memory that the processor accesses.

3.1 BASIC IDEA OF THE DYNAMIC MANAGEMENT


SCHEME

The dynamic scheme for the L0-cache should be able to
select the most frequently executed basic blocks for placement
in the L0-cache. It should also rely on existing mechanisms,
without much hardware investment, if it is to be attractive for
energy reduction.

Branch prediction in conjunction with confidence estimation
provides a reliable solution to this problem.

Unusual Behavior of the Branches

A branch that was predicted ‘taken’ with ‘high confidence’
will be expected to be taken during program execution. If it is
not taken, it is assumed to be behaving ‘unusually’.

The basic idea is that, if a branch behaves ‘unusually’,
the dynamic scheme disables the L0-cache access for the
subsequent basic blocks. Under this scheme, only basic
blocks that are executed frequently tend to make it into the
L0-cache, hence avoiding cache pollution problems in the
L0-cache.
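
This gating decision can be captured by a single piece of control state, sketched below. The helper names are assumptions made for illustration, and the flag plays the role of the access_l0 decision used in the fetch sketch of Chapter Two; the exact rules for disabling and re-enabling the L0-cache are the ones defined by the methods in the following sections.

#include <stdbool.h>

/* When false, the L0-cache is bypassed and the subsequent basic blocks
 * are fetched directly from the I-cache. */
static bool l0_access_enabled = true;

bool l0_access(void) { return l0_access_enabled; }

/* Called when a branch resolves.  A 'high confidence' branch that is
 * mispredicted behaves 'unusually': it steers execution toward an
 * infrequently executed path, so L0-cache access is disabled for the
 * following basic blocks.  Here, a correctly predicted 'high confidence'
 * branch re-enables it (one possible choice). */
void on_branch_resolved(bool high_confidence, bool predicted_correctly)
{
    if (high_confidence && !predicted_correctly)
        l0_access_enabled = false;
    else if (high_confidence && predicted_correctly)
        l0_access_enabled = true;
}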

3.2 DYNAMIC TECHNIQUES FOR L0-CACHE


MANAGEMENT

The dynamic techniques discussed in the following sections
select the basic blocks to be placed in the L0-cache.
There are five techniques for the management of the L0-cache:

1. Simple Method.

2. Static Method.

3. Dynamic Confidence Estimation Method.

4. Restrictive Dynamic Confidence Estimation Method.

5. Dynamic Distance Estimation Method.

The different dynamic techniques trade off energy reduction
against performance degradation.

3.3 SIMPLE METHOD

The confidence estimation mechanism is not used in the
simple method. The branch predictor can be used as a
stand-alone mechanism to provide insight into which portions
of the code are frequently executed and which are not. A
mispredicted branch is assumed to drive the thread of
execution to an infrequently executed part of the program.

The strategy used for selecting the basic blocks is as follows:
if a branch is mispredicted, the machine will access the
I-cache to fetch the instructions; if a branch is predicted
correctly, the machine will access the L0-cache.

On a misprediction, the pipeline is flushed and the machine
starts fetching the instructions from the correct address by
accessing the I-cache. The energy dissipation and the
execution time of the original configuration that uses no
L0-cache are taken as unity, and everything is normalized
with respect to that.
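
The selection rule of the simple method reduces to the predicate sketched below; the function name and interface are illustrative, and the returned flag plays the role of the access_l0 decision in the earlier fetch sketch.

#include <stdbool.h>

/* Simple method: the branch predictor alone regulates L0-cache access.
 * Returns true if the basic block following the resolved branch should be
 * fetched from the L0-cache. */
bool simple_method_access_l0(bool predicted_correctly)
{
    /* Correct prediction: stay on the frequently executed path and use
     * the L0-cache.  Misprediction: the pipeline is flushed and the
     * instructions at the correct target are fetched from the I-cache. */
    return predicted_correctly;
}
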
3.4 STATIC METHOD

The selection criteria adopted for the basic blocks are:

If a ‘high confidence’ branch was predicted incorrectly,
the I-cache is accessed for the subsequent basic blocks.

If more than n ‘low confidence’ branches have been decoded
in a row, the I-cache is accessed.

Therefore, the L0-cache will be bypassed when either of the
two conditions is satisfied. In any other case the machine will
access the L0-cache.

The first rule for accessing the I-cache is due to the fact that a
mispredicted ‘high confidence’ branch behaves ‘unusually’
and drives the program to an infrequently executed
portion of the code. The second rule is due to the fact that a
series of ‘low confidence’ branches will also suffer from the
same problem, since the probability that they are all predicted
correctly is low.
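
The two rules can be sketched as follows. The threshold n and the interface are illustrative, and the same decision logic also applies to the dynamic confidence estimation method of the next section, with the confidence bit supplied at run time by a hardware confidence estimator.

#include <stdbool.h>

#define N_LOW_CONF 3          /* illustrative value of the threshold n */

static int low_conf_run;      /* successive 'low confidence' branches seen */

/* Decide, at each resolved branch, whether the following basic blocks are
 * fetched from the L0-cache (true) or from the I-cache (false). */
bool static_method_access_l0(bool high_confidence, bool predicted_correctly)
{
    if (high_confidence) {
        low_conf_run = 0;
        if (!predicted_correctly)
            return false;     /* rule 1: mispredicted high-confidence branch */
    } else {
        if (++low_conf_run > N_LOW_CONF)
            return false;     /* rule 2: more than n low-confidence branches in a row */
    }
    return true;              /* otherwise, access the L0-cache */
}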

3.5 DYNAMIC CONFIDENCE ESTIMATION METHOD

The dynamic confidence estimation method is a dynamic
version of the static method. The confidence of each branch is
estimated at run time, and the I-cache is accessed if:

a ‘high confidence’ branch is mispredicted, or

more than n successive ‘low confidence’ branches are
encountered.

The dynamic confidence estimation mechanism is slightly
better in terms of energy reduction than the simple or
static method. Since the confidence estimator can adapt
dynamically to the temporal behavior of the code, it is more
accurate in characterizing a branch and, hence, in regulating
the access of the L0-cache.

3.6 RESTRICTIVE DYNAMIC CONFIDENCE ESTIMATION


METHOD

The methods described in the previous sections tend to place
a large number of basic blocks in the L0-cache, thus degrading
performance. The restrictive dynamic scheme is a more
selective scheme in which only the really important basic
blocks are selected for the L0-cache.

The selection mechanism is slightly modified as follows:
the L0-cache is accessed only if a ‘high confidence’
branch is predicted correctly; the I-cache is accessed in any
other case.

This method selects some of the most frequently executed
basic blocks, yet it misses some others. It has much lower
performance degradation, at the expense of lower energy
reduction. It is probably preferable in a system where
performance is more important than energy.
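
The modified rule is a one-line predicate; as before, the interface is illustrative and the returned flag drives the L0-cache/I-cache choice at fetch time.

#include <stdbool.h>

/* Restrictive dynamic confidence estimation: the L0-cache is accessed only
 * after a 'high confidence' branch that was predicted correctly; every
 * other case falls back to the I-cache. */
bool restrictive_method_access_l0(bool high_confidence, bool predicted_correctly)
{
    return high_confidence && predicted_correctly;
}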

3.7 DYNAMIC DISTANCE ESTIMATION METHOD

The dynamic distance estimation method is based on the fact
that a mispredicted branch often triggers a series of successive
mispredicted branches. The method works as follows:

all branches within a certain distance of a mispredicted
branch are tagged as ‘low confidence’, otherwise as ‘high
confidence’. The basic blocks after a ‘low confidence’ branch
are fetched from the I-cache. The net effect is that a branch
misprediction causes a series of fetches from the I-cache.

A counter is used to measure the distance of a branch
from the previous mispredicted branch. This scheme is also
very selective in storing instructions in the L0-cache, even
more so than the restrictive dynamic confidence estimation
method.
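
The distance-based rule can be sketched with a single counter, as below. The distance threshold d (DISTANCE_D here) and the interface are illustrative choices; the flag again plays the role of the access_l0 decision at fetch time.

#include <stdbool.h>

#define DISTANCE_D 4          /* illustrative distance threshold d */

/* Branches decoded since the last mispredicted branch. */
static int distance_from_mispredict = DISTANCE_D;

/* A misprediction restarts the counter; branches within a distance of
 * DISTANCE_D from it are treated as 'low confidence', and the basic
 * blocks that follow them are fetched from the I-cache. */
bool distance_method_access_l0(bool predicted_correctly)
{
    if (!predicted_correctly)
        distance_from_mispredict = 0;
    else if (distance_from_mispredict < DISTANCE_D)
        distance_from_mispredict++;

    return distance_from_mispredict >= DISTANCE_D;   /* far enough: use the L0-cache */
}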

CHAPTER FOUR

COMPARISONS

4.0 COMPARISON OF DYNAMIC TECHNIQUES

The energy reduction and delay increase are a function of
the algorithm used for the regulation of the L0-cache access,
the size of the L0-cache, its block size and its
associativity. For example, a larger block size causes a larger
hit ratio in the L0-cache.

This results in smaller performance overhead and greater
energy efficiency, since the I-cache does not need to be
accessed as often.

On the other hand, if the block size increase does not have a
large impact on the hit ratio, the energy dissipation may go
up, since a cache with a larger block size is less energy
efficient than a cache of the same size but with a smaller
block size.

The static method and the dynamic confidence estimation
method make the assumption:

The less frequently executed basic blocks usually follow less
predictable branches, which are mispredicted.

The simple method and the restrictive dynamic confidence
estimation method address the problem from another angle.
They make the assumption:

The most frequently executed basic blocks usually follow
highly predictable branches.

The dynamic distance estimation method is the most
successful in reducing the performance overhead, but the
least successful in energy reduction.

This method has a stricter requirement for a basic block
to be selected for the L0-cache than the original dynamic
confidence estimation method.

A larger block size and higher associativity have a beneficial
effect on both energy and performance. The hit rate of a small
cache is more sensitive to variations of the block size and the
associativity.

CHAPTER SIX
FUTURE SCOPE

When the Web was first emerging onto the scene, it was simple. Individual web pages
were self-contained static blobs of text with, if you were lucky, maybe an image or two.
The HTTP protocol was designed to be "dumb". It knew nothing of the relationship
between an HTML page and the images it contained. There was no need to. Every
request for a URI (web page, image, download, etc.) was a completely separate request.
That kept everything simple, and made it very fault tolerant. A server never sat around
waiting for a browser to tell it "OK, I'm done!"

Much e-ink has been spilled (can you even do that?) already discussing the myriad of
ways in which the web is different today, mostly in the context of either HTML5 or web
applications (or both). Most of it is completely true, although there's plenty of hyperbole
to go around. One area that has not gotten much attention at all, though, is HTTP.

Well, that's not entirely true. HTTP is actually a fairly large spec, with a lot of exciting
moving parts that few people think about because browsers offer no way to use them
from HTML or just implement them very very badly. (Did you know that there is a
PATCH command defined in HTTP? Really.) A good web services implementation (like
we're trying to bake into Drupal 8 as part of the Web Services and Context Core
Initiative </shamelessplug>) should leverage those lesser-known parts, certainly, but the
modern web has more challenges than just using all of a decades-old spec.

Most significantly, HTTP still treats all URIs as separate, only coincidentally-related
resources.

Which brings us to an extremely important challenge of the modern web that is


deceptively simple: Caching.

Caching is broken

The web naturally does a lot of caching. When you request a page from a server, rarely is
it pulled directly off of the hard drive at the other end. The file, assuming it is actually a
file (this is important), may get cached by the operating system's file system cache, by a
reverse proxy cache such as Varnish, by a Content Delivery Network, by an intermediary
server somewhere in the middle, and finally by your browser. On a subsequent request,
the layer closest to you with an unexpired cache will respond with its cached version.

In concept that's great, as it means the least amount of work is done to get what you
want. In practice, it doesn't work so well for a variety of reasons.

For one, that model was built on the assumption of a mostly-static web. All URIs are just
physical files sitting on disk that change every once in a while. Of course, we don't live in
that web anymore. Most web "pages" are dynamically generated from a content
management system of some sort.

For another, that totally sucks during development. Who remembers the days of telling
your client "no, really, I did upload a new version of the file. You need to clear your
browser cache. Hold down shift and reload. Er, wait, that's the other browser. Hit F5
twice. No, really fast. Faster." Yeah, it sucked. There's ways to configure the HTTP
headers to not cache files, but that is a pain (how many web developers know how to
mess with Apache .htaccess files?), and you have to remember to turn that off for
production or you totally hose performance. Even now, Drupal appends junk characters
to the end of CSS URLs just to bypass this sort of caching.

Finally, there's the browsers. Their handling of HTTP cache headers (which are
surprisingly complex) has historically not been all that good. What's more, in many cases
the browser will simply bypass its own cache and still check the network for a new
version.

Now, normally, that's OK. The HTTP spec says, and most browsers obey, that when
requesting a resource that a browser already has an older cached copy of it should
include the last updated date of its version of the file in the request, saying in essence "I
want file foo.png, my copy is from October 1st." The server can then respond with either
a 304 Not Modified ("Yep, that's still the right one") or a 200 OK ("Dude, that's so
old, here's the new one"). The 304 response saves resending the file, but doesn't help
with the overhead of the HTTP request itself. That request is not cheap, especially on
high-latency mobile networks, and especially when browsers refuse to have more than 4-
6 requests outstanding.

CHAPTER SEVEN
CONCLUSION

7.0 CONCLUSION

This report presents methods for the dynamic selection of
basic blocks for placement in the L0-cache. It explains the
functionality of the branch prediction and confidence
estimation mechanisms in high-performance processors.

Finally, five different dynamic techniques were discussed for
the selection of the basic blocks. These techniques try to
capture the execution profile of the basic blocks by using the
branch statistics that are gathered by the branch predictor.

The experimental evaluation demonstrates the applicability
of the dynamic techniques for the management of the
L0-cache. Different techniques can trade off energy against
delay by regulating the way the L0-cache is accessed.

REFERENCES

[1] J. Diguet, S. Wuytack, F. Catthoor, and H. De Man,
“Formalized methodology for data reuse exploration in
hierarchical memory mappings,” in Proceedings of the
International Symposium on Low Power Electronics and
Design, pp. 30-35, Aug. 1997.

[2] J. Kin, M. Gupta, and W. Mangione-Smith, “The Filter
Cache: An energy efficient memory structure,” in
Proceedings of the International Symposium on
Microarchitecture, pp. 184-193, Dec. 1997.

[3] N. Bellas, I. Hajj, C. Polychronopoulos, and G. Stamoulis,
“Architectural and compiler support for energy reduction
in the memory hierarchy of high performance
microprocessors,” in Proceedings of the International
Symposium on Low Power Electronics and Design,
pp. 70-75, Aug. 1998.

[4] S. Manne, D. Grunwald, and A. Klauser, “Pipeline gating:
Speculation control for energy reduction,” in Proceedings
of the International Symposium on Computer
Architecture, pp. 132-141, 1998.

[5] SpeedShop User’s Guide. Silicon Graphics, Inc., 1996.

[6] S. Wilson and N. Jouppi, “An enhanced access and cycle
time model for on-chip caches,” Tech. Rep. 93/5, DEC
WRL, July 1994.

[7] Nikolaos Bellas, Ibrahim Hajj, and Constantine
Polychronopoulos, “Using dynamic cache management
techniques to reduce energy in a high-performance
processor,” Department of Electrical & Computer
Engineering and the Coordinated Science Laboratory,
University of Illinois at Urbana-Champaign, 1308 West
Main Street, Urbana, IL 61801, USA.

