PIPP: Promotion/Insertion Pseudo-Partitioning of Multi-Core Shared Caches
Yuejian Xie, Gabriel H. Loh (Georgia Institute of Technology)

Presentation transcript:

Yuejian Xie, Gabriel H. Loh

[Figure: Core0 and Core1, each with private IL1 and DL1 caches, share a Last Level Cache (LLC) that holds both Core0's data and Core1's data.]

Capacity Management
– Cores differ in how much cache space they need; allocate the proper amount of space to each core.
– Guo-MICRO07, Kim-PACT04, Srikantaiah-ASPLOS09, Qureshi-MICRO06 (UCP), …
Dead Time Management
– Evict dead lines (blocks with no reuse) sooner.
– Kaxiras-ISCA01, Qureshi-ISCA07, Jaleel-PACT07 (TADIP), …

[Figure: one cache set divided between Core0 and Core1.]
Core 0 gets 5 ways.
Core 1 gets 3 ways.

[Figure: recency stack from MRU to LRU with an incoming block.]

[Figure: recency stack from MRU to LRU.]
Occupies one cache block for a long time with no benefit!

[Figure: recency stack from MRU to LRU with an incoming block.]

[Figure: recency stack from MRU to LRU.]
Useless block: evicted at next eviction.
Useful block: moved to MRU position.

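The dead-time argument above can be checked with a tiny simulation. This is only a sketch under stated assumptions: a single set, a block that is never reused, no hits by other blocks, and every later miss inserting at MRU. The function name `dead_block_lifetime` and the block labels are illustrative, not from the slides.

```python
from collections import deque


def dead_block_lifetime(ways, insert_at_mru):
    """Count the later misses needed to evict a never-reused block.

    The set is modeled as a deque with the LRU block on the left and
    the MRU block on the right. Assumption: no other block is ever hit,
    and every later miss evicts the LRU block and inserts at MRU.
    """
    stack = deque(f"old{i}" for i in range(ways - 1))
    if insert_at_mru:
        stack.append("dead")       # classic LRU insertion
    else:
        stack.appendleft("dead")   # insertion at the LRU position
    misses = 0
    while "dead" in stack:
        stack.popleft()                   # evict the LRU block
        stack.append(f"new{misses}")      # newcomer goes to MRU
        misses += 1
    return misses
```

Under these assumptions, a dead block inserted at MRU in an 8-way set survives until the eighth later miss, while one inserted at LRU is gone at the very next eviction.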

Eviction
– When replacing a block in a set, which block should be evicted?
Insertion
– For a new block, where should it be inserted?
Promotion
– When there is a hit in the cache, how should the block's position/priority be adjusted?
PIPP: a novel scheme for promotion and insertion.

What's PIPP?
– Promotion/Insertion Pseudo-Partitioning
– Achieves both capacity and dead-time management.
Eviction
– The LRU block is the victim.
Insertion
– The core's quota worth of blocks away from LRU.
Promotion
– Toward MRU by only one position.
[Figure: recency stack from MRU to LRU. The LRU block is evicted, a hit block is promoted, and a new block is inserted at position 3 (the target allocation).]
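The three rules above can be sketched for a single cache set as follows. This is a minimal illustration, not the authors' implementation. The class name `PippSet` and its methods are hypothetical, and one indexing choice is an assumption: a core with quota q inserts so that its new block becomes the q-th block counted from the LRU end.

```python
class PippSet:
    """One cache set with PIPP-style eviction, insertion and promotion.

    The stack is ordered from LRU (index 0) to MRU (last index).
    """

    def __init__(self, ways, quotas):
        self.ways = ways        # associativity of the set
        self.quotas = quotas    # core id -> target allocation in blocks
        self.stack = []         # list of (tag, core), LRU first

    def tags(self):
        """Return the tags in the set, ordered from LRU to MRU."""
        return [tag for tag, _ in self.stack]

    def access(self, tag, core):
        """Return True on a hit, False on a miss."""
        for i, (t, _) in enumerate(self.stack):
            if t == tag:
                # Promotion: move one position toward MRU only.
                if i + 1 < len(self.stack):
                    self.stack[i], self.stack[i + 1] = (
                        self.stack[i + 1],
                        self.stack[i],
                    )
                return True
        # Eviction: the LRU block is the victim once the set is full.
        if len(self.stack) == self.ways:
            self.stack.pop(0)
        # Insertion (assumed convention): the new block becomes the
        # quota-th block from the LRU end, capped by current occupancy.
        pos = min(self.quotas[core] - 1, len(self.stack))
        self.stack.insert(pos, (tag, core))
        return False
```

A core with a small quota therefore inserts near the LRU end, so its blocks are evicted quickly unless hits promote them one step at a time.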

[Figure sequence (six animation steps): PIPP example on one cache set ordered from MRU to LRU, with Core0 quota: 5 blocks and Core1 quota: 3 blocks. The legend distinguishes Core0's blocks, Core1's blocks, and the current request. Labels recoverable per step:
– Step 1: blocks 1, A, B, C; request D; Core1's quota = 3.
– Step 2: blocks 1, A, D, B; request 6; Core0's quota = 5.
– Step 3: blocks 1, A, D, B; Core0's quota label.
– Step 4: blocks 1, A, D; request D.
– Step 5: blocks 1, A, D, 3; request E; Core1's quota = 3.
– Step 6: blocks 1, A, D, 3; request E.]

        Core0  Core1  Core2  Core3
Quota   6      4      4      2

[Figure: recency stack from MRU to LRU.]
Insert closer to LRU position.

Strict Partition
[Figure: Core0 quota: 5 blocks; Core1 quota: 3 blocks. Each core has its own recency stack (MRU 0 to LRU 0 for Core0, MRU 1 to LRU 1 for Core1). Labels: Core0's block, Core1's block, request, new block.]
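For contrast with pseudo-partitioning, victim selection under a strict partition might look like the sketch below. The slide only names the scheme; the specific rule here (a core under its quota takes a block from another core, otherwise it replaces its own least recently used block) is an assumption based on how way partitioning is commonly described, and `strict_partition_victim` is a hypothetical helper.

```python
def strict_partition_victim(stack, core, quotas):
    """Pick the index of the block to evict on a miss by `core`.

    `stack` is a list of (tag, owner) pairs ordered from LRU (index 0)
    to MRU. Assumed rule: a core below its quota takes the LRU-most
    block owned by another core; a core at or above its quota replaces
    its own LRU-most block.
    """
    owned = sum(1 for _, owner in stack if owner == core)
    if owned < quotas[core]:
        for i, (_, owner) in enumerate(stack):
            if owner != core:
                return i
    # At or over quota, or no foreign block exists: evict own LRU block.
    for i, (_, owner) in enumerate(stack):
        if owner == core:
            return i
    raise ValueError("cannot pick a victim from an empty set")
```

Under this rule a core's occupancy never grows past its quota, which is the strictness that pseudo-partitioning relaxes.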

Pseudo Partition
[Figure: Core0 quota: 5 blocks; Core1 quota: 3 blocks. Both cores share a single recency stack from MRU to LRU. Labels: Core0's block, Core1's block, request, new block.]

[Figure: two recency stacks from MRU to LRU, each with a new block.]
Directly to MRU (TADIP)
Promote By One (PIPP)
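The two promotion policies differ only in how far a hit block moves. A minimal sketch, with the stack held as a Python list ordered from LRU (index 0) to MRU; both function names are illustrative.

```python
def promote_to_mru(stack, i):
    """On a hit at index i, move the block straight to the MRU end."""
    stack.append(stack.pop(i))


def promote_by_one(stack, i):
    """On a hit at index i, swap the block with its MRU-side neighbour."""
    if i + 1 < len(stack):
        stack[i], stack[i + 1] = stack[i + 1], stack[i]
```

With promote-by-one, a block needs repeated hits to climb the stack, so a block hit only once stays close to the LRU end.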

Algorithm   Capacity Management   Dead-time Management   Note
LRU         No                    No                     Baseline, no explicit management
UCP         Yes                   No                     Strict partitioning
TADIP       No                    Yes                    Insert at LRU and promote to MRU on hit
PIPP        Yes                   Yes                    Pseudo-partitioning and incremental promotion

Simulation environment
– SimpleScalar-Zesto, out-of-order, Intel Core 2-like
– 32KB 8-way DL1 and IL1, 4MB 16-way LLC, 1.6GHz DDR2
Workload classification
– UCP2-5: UCP-friendly, 2-core, 5th workload
– DIP4-3: TADIP-friendly, 4-core, 3rd workload

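[Figure: results chart with workloads grouped as TADIP-friendly and UCP-friendly. An annotation on the chart reads: "PIPP is too cautious here."]
PIPP outperforms LRU by 19.0%, UCP by 10.6%, and TADIP by 10.1%.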

[Figure: results chart with workloads grouped as TADIP-friendly and UCP-friendly.]
PIPP outperforms LRU by 21.9%, UCP by 12.1%, and TADIP by 17.5%.

Occupancy Control
Insertion Behavior
– On average, TADIP inserts no-reuse lines at position 1.7, while PIPP inserts them at position 1.3 (the LRU position is 0).
Pseudo-Partition Benefit

Novel proposal on insertion and promotion
A single unified mechanism provides both capacity and dead-time management
Outperforms prior UCP and TADIP
In the full paper:
– Special version of PIPP for streaming applications
– Reducing hardware overhead
– Sensitivity analysis


E.g., Target Partition {5,3} – Actual Occupancy {6,2} = 1
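The slide does not define how the difference between the target partition and the actual occupancy is reduced to a single number. One reading that reproduces the value 1 for this example is to count the blocks held in excess of quota. That interpretation, and the helper name `occupancy_deviation`, are assumptions.

```python
def occupancy_deviation(target, actual):
    """Count blocks that cores hold beyond their target allocation.

    Assumed metric: the sum over cores of max(0, actual - target).
    Both sequences must describe the same number of cores and blocks.
    """
    if len(target) != len(actual):
        raise ValueError("target and actual must cover the same cores")
    if sum(target) != sum(actual):
        raise ValueError("target and actual must cover the same blocks")
    return sum(max(0, a - t) for t, a in zip(target, actual))
```

For a target of (5, 3) and an occupancy of (6, 2), Core0 holds one block more than its quota, which matches the 1 shown on the slide.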


Streaming Application
Detection
– #Accesses, #Misses, MissRate > threshold
Insertion
– At a fixed position (independent of quota)
– #Streaming Apps blocks away from the LRU position
Promotion
– Promote by 1 with probability p_stream
– p_stream « 1
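The streaming rules can be sketched as below. Several details are assumptions rather than statements from the slides: that all three counters must exceed their thresholds, the threshold parameter names, and the convention that "k blocks away from LRU" means the k-th block counted from the LRU end. All function names are hypothetical.

```python
import random


def is_streaming(accesses, misses, min_accesses, min_misses, min_miss_rate):
    """Classify an application as streaming from its cache counters.

    Assumption: accesses, misses and miss rate must all exceed
    their (hypothetical) thresholds.
    """
    if accesses == 0:
        return False
    return (
        accesses > min_accesses
        and misses > min_misses
        and misses / accesses > min_miss_rate
    )


def streaming_insert_index(num_streaming_apps, occupancy):
    """Index (LRU = 0) at which a streaming core inserts a new block.

    Independent of the core's quota; capped by the current occupancy.
    """
    return min(num_streaming_apps - 1, occupancy)


def maybe_promote(stack, i, p_stream, rng=random):
    """On a hit at index i, promote by one with probability p_stream."""
    if i + 1 < len(stack) and rng.random() < p_stream:
        stack[i], stack[i + 1] = stack[i + 1], stack[i]
        return True
    return False
```

Because p_stream is much smaller than 1, a streaming application's blocks rarely climb away from the LRU end, which limits how much of the set they can occupy.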


[Figure: two panels, "Promotion Prob for General App" and "Promotion Prob for Streaming App".]
