Stall-Time Fair Memory Access Scheduling
Onur Mutlu and Thomas Moscibroda
Computer Architecture Group
Microsoft Research
Multi-Core Systems
[Figure: a multi-core chip with four cores (CORE 0 to CORE 3), each with its own L2 cache, attached to a shared DRAM memory system with banks up to DRAM Bank 7. Label on the figure: unfairness.]
[Figure: organization of a DRAM bank. Labels: rows, columns, row address, column address, column decoder, data. Row states shown include Row 0, Row 1, and Empty.]
DRAM Controllers
[Rixner, ISCA00]
Outline
The Problem
Experimental Evaluation
Conclusions
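The Problem
Streaming threads: threads that keep on accessing the same row
[Figure: a DRAM bank with a request buffer, row decoder, row buffer, and column decoder. The request buffer holds requests from T0 to Row 0 and requests from T1 to other rows (including Row 16 and Row 111); the row buffer holds Row 0.]
Row size: 8KB, cache block size: 64B, so one open row holds 8KB / 64B = 128 cache blocks
Row-hit requests of T0 are serviced before T1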
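The effect of a streaming thread on a row-hit-first scheduler can be illustrated with a small simulation. This is a minimal sketch, not the controller from the talk: the `Request` record, the function `fr_fcfs_order`, and the assumption that all requests are already waiting in the request buffer are hypothetical simplifications. Only the row size and cache block size come from the slide.

```python
from collections import namedtuple

# Hypothetical request record: issuing thread, target row, arrival order.
Request = namedtuple("Request", ["thread", "row", "arrival"])

ROW_SIZE_BYTES = 8 * 1024   # row size from the slide
BLOCK_SIZE_BYTES = 64       # cache block size from the slide
BLOCKS_PER_ROW = ROW_SIZE_BYTES // BLOCK_SIZE_BYTES


def fr_fcfs_order(requests, open_row):
    """Return the service order under a row-hit-first, then oldest-first rule."""
    pending = list(requests)
    order = []
    while pending:
        # Requests to the currently open row are preferred over all others.
        hits = [r for r in pending if r.row == open_row]
        candidates = hits if hits else pending
        chosen = min(candidates, key=lambda r: r.arrival)
        open_row = chosen.row  # a row conflict opens the requested row
        order.append(chosen)
        pending.remove(chosen)
    return order


# T1's single request to row 16 arrives first; T0 then streams through row 0.
requests = [Request("T1", 16, 0)]
requests += [Request("T0", 0, i + 1) for i in range(BLOCKS_PER_ROW)]

order = fr_fcfs_order(requests, open_row=0)
t1_position = next(i for i, r in enumerate(order) if r.thread == "T1")
print(BLOCKS_PER_ROW, t1_position)  # prints "128 128"
```

Under these assumptions the older request from T1 waits behind every row-hit request from T0, which is the unfairness the slide describes.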
Consequences of Unfairness in DRAM
DRAM is the only shared resource
[Figure: memory slowdown of co-running threads; values shown: 7.74, 4.72, 1.85, 1.05]
Security07]
Row-buffer locality
Bank parallelism
Memory-slowdown = Tshared/Talone
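The slowdown ratio above can be computed directly. The sketch below also computes an unfairness value as the ratio of the largest to the smallest slowdown; that definition is an assumption made here for illustration, as are the function names and the stall-time numbers.

```python
def memory_slowdown(t_shared, t_alone):
    """Memory-slowdown = Tshared / Talone, as on the slide."""
    if t_alone <= 0:
        raise ValueError("t_alone must be positive")
    return t_shared / t_alone


def unfairness(slowdowns):
    """Assumed definition: largest slowdown divided by smallest slowdown."""
    return max(slowdowns) / min(slowdowns)


# Hypothetical stall times (in cycles) for two threads.
s0 = memory_slowdown(t_shared=150, t_alone=100)
s1 = memory_slowdown(t_shared=300, t_alone=100)
print(s0, s1, unfairness([s0, s1]))  # prints "1.5 3.0 2.0"
```

A value of 1.0 under this assumed definition would mean all threads are slowed down equally.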
Tracks Tshared
Estimates Talone
If unfairness <
If unfairness
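The two conditions above are truncated in the source, so the following is only a sketch under stated assumptions: that unfairness is compared against a threshold (called `alpha` here, a hypothetical name), that a baseline policy is used below the threshold, and that the most slowed-down thread is favored otherwise. The function name, return values, and the max/min unfairness definition are also assumptions.

```python
def select_policy(slowdowns, alpha):
    """Pick a scheduling mode from per-thread slowdowns.

    slowdowns: dict mapping thread name to Tshared / Talone.
    alpha: hypothetical unfairness threshold.
    """
    if not slowdowns:
        raise ValueError("no threads to schedule")
    # Assumed definition of unfairness: max slowdown over min slowdown.
    unfairness = max(slowdowns.values()) / min(slowdowns.values())
    if unfairness < alpha:
        # Assumption: below the threshold, keep the baseline scheduler.
        return "baseline", None
    # Assumption: otherwise favor the thread that is slowed down the most.
    slowest = max(slowdowns, key=slowdowns.get)
    return "fairness", slowest


print(select_policy({"T0": 1.0, "T1": 1.05}, alpha=1.10))
print(select_policy({"T0": 1.0, "T1": 1.5}, alpha=1.10))
```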
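[Figure: worked scheduling example with a request buffer (Row 16, Row 0, Row 111) and the row buffer. Values shown for T1 Slowdown: 1.14, 1.03, 1.06, 1.08, 1.11, 1.00. Values shown for Unfairness: 1.06, 1.04, 1.03, 1.00, 1.05.]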
Implementation
Tracking Tshared
Relatively easy
Estimating Talone
Estimating Tinterference(1)
Estimate the row that would have been in the row buffer if the thread were running alone
Estimate the extra bank access latency the request incurs
Tinterference(C) +=
Estimating Tinterference(2)
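The bookkeeping behind these estimates can be sketched as follows. The sketch assumes that Talone is obtained by subtracting the accumulated interference estimate from Tshared; the slides do not spell out that relation here, and the update rule for Tinterference is truncated in the source. The latency constants, class name, and helper names are hypothetical.

```python
ROW_HIT_LATENCY = 50        # hypothetical latency in cycles
ROW_CONFLICT_LATENCY = 150  # hypothetical latency in cycles


def extra_bank_latency(request_row, open_row_shared, open_row_alone):
    """Estimate extra latency caused by other threads changing the open row.

    open_row_alone is the row that would have been in the row buffer if the
    thread were running alone. The result can be negative if the shared row
    buffer happens to hold the requested row when the alone case would not.
    """
    actual = ROW_HIT_LATENCY if request_row == open_row_shared else ROW_CONFLICT_LATENCY
    alone = ROW_HIT_LATENCY if request_row == open_row_alone else ROW_CONFLICT_LATENCY
    return actual - alone


class ThreadStallTracker:
    """Per-thread counters for Tshared and the estimated Tinterference."""

    def __init__(self):
        self.t_shared = 0
        self.t_interference = 0

    def record_stall(self, cycles):
        self.t_shared += cycles

    def record_interference(self, extra_cycles):
        self.t_interference += extra_cycles

    def t_alone(self):
        # Assumption: Talone = Tshared - Tinterference.
        return self.t_shared - self.t_interference

    def slowdown(self):
        return self.t_shared / self.t_alone()


tracker = ThreadStallTracker()
tracker.record_stall(1000)
# The thread wanted row 5, which it would have found open if running alone,
# but another thread left row 0 open in the shared system.
tracker.record_interference(extra_bank_latency(5, open_row_shared=0, open_row_alone=5))
print(tracker.t_alone())  # prints "900"
```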
Hardware Cost
Evaluation Methodology
Benchmarks
SPEC CPU2006 and some Windows Desktop applications
256, 32, and 3 benchmark combinations for the 4-, 8-, and 16-core experiments, respectively
Baseline FR-FCFS
FCFS
FR-FCFS+Cap: static cap on how many younger row-hits can bypass older accesses
Unfairly penalizes non-intensive threads
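The static cap can be illustrated with a short simulation. This is a sketch of one plausible reading of the FR-FCFS+Cap rule, not the evaluated implementation: the `Request` record, the function name, the per-oldest-request bypass counter, and the cap value are all assumptions.

```python
from collections import namedtuple

# Hypothetical request record: issuing thread, target row, arrival order.
Request = namedtuple("Request", ["thread", "row", "arrival"])


def fr_fcfs_cap_order(requests, open_row, cap):
    """Row-hit-first scheduling where at most `cap` younger row-hits
    may bypass the oldest waiting request."""
    pending = sorted(requests, key=lambda r: r.arrival)
    order = []
    bypasses = 0  # younger row-hits serviced ahead of the current oldest request
    while pending:
        oldest = pending[0]
        hits = [r for r in pending if r.row == open_row]
        if hits and (hits[0] is oldest or bypasses < cap):
            chosen = hits[0]
            # Servicing the oldest request resets the count; otherwise
            # a younger row-hit has bypassed it.
            bypasses = 0 if chosen is oldest else bypasses + 1
        else:
            chosen = oldest
            bypasses = 0
        open_row = chosen.row
        order.append(chosen)
        pending.remove(chosen)
    return order


# T1's request to row 16 is oldest; T0 streams eight requests to open row 0.
requests = [Request("T1", 16, 0)] + [Request("T0", 0, i + 1) for i in range(8)]
order = fr_fcfs_cap_order(requests, open_row=0, cap=4)
print([r.thread for r in order])
# prints ['T0', 'T0', 'T0', 'T0', 'T1', 'T0', 'T0', 'T0', 'T0']
```

With a cap of 4, T1 is serviced after four bypassing row-hits instead of waiting for the whole stream. Because the cap is static, it applies regardless of how much each thread has actually been slowed down.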
[Figure: comparison of the schedulers; annotated factors: 1.27X, 1.81X, 1.26X]
System Performance
[Figure: system performance results; annotated values: 5.8%, 4.1%, 4.6%]
[Figure: annotated values: 9.5%, 11.2%]
Conclusions
Stall-Time Fair Memory Access Scheduling
Backup
[Figure: annotated values: 9.1%, 7.8%]
A Case Study
[Figure: memory slowdown per thread; values shown: 7.28, 2.07, 2.08, 1.87, 1.27]
Unfairness:
Effect of