1 JILP Workshop on Architecture Competitions JWAC-3 Memory Scheduling Championship (MSC) 9 th June 2012 Organizers: Rajeev Balasubramonian, Utah Niladrish.

Slides:



Advertisements
Similar presentations
TWO STEP EQUATIONS 1. SOLVE FOR X 2. DO THE ADDITION STEP FIRST
Advertisements

All Programmable FPGAs, SoCs, and 3D ICs
1 SIMGRID_Scheduler A Simulated Scheduling System for GRID Environment Marcello CASTELLANO, Giacomo PISCITELLI and Nicola SIMEONE Department of Electrical.
Slide 1 Insert your own content. Slide 2 Insert your own content.
Pricing for Utility-driven Resource Management and Allocation in Clusters Chee Shin Yeo and Rajkumar Buyya Grid Computing and Distributed Systems (GRIDS)
1 Copyright © 2013 Elsevier Inc. All rights reserved. Chapter 3 CPUs.
UNITED NATIONS Shipment Details Report – January 2006.
and 6.855J Cycle Canceling Algorithm. 2 A minimum cost flow problem , $4 20, $1 20, $2 25, $2 25, $5 20, $6 30, $
Chapter 3: Top-Down Design with Functions Problem Solving & Program Design in C Sixth Edition By Jeri R. Hanly & Elliot B. Koffman.
Jeopardy Q 1 Q 6 Q 11 Q 16 Q 21 Q 2 Q 7 Q 12 Q 17 Q 22 Q 3 Q 8 Q 13
Jeopardy Q 1 Q 6 Q 11 Q 16 Q 21 Q 2 Q 7 Q 12 Q 17 Q 22 Q 3 Q 8 Q 13
Title Subtitle.
Multiplication Facts Review. 6 x 4 = 24 5 x 5 = 25.
Multiplying binomials You will have 20 seconds to answer each of the following multiplication problems. If you get hung up, go to the next problem when.
0 - 0.
2 pt 3 pt 4 pt 5 pt 1 pt 2 pt 3 pt 4 pt 5 pt 1 pt 2 pt 3 pt 4 pt 5 pt 1 pt 2 pt 3 pt 4 pt 5 pt 1 pt 2 pt 3 pt 4 pt 5 pt 1 pt Time Money AdditionSubtraction.
ALGEBRAIC EXPRESSIONS
DIVIDING INTEGERS 1. IF THE SIGNS ARE THE SAME THE ANSWER IS POSITIVE 2. IF THE SIGNS ARE DIFFERENT THE ANSWER IS NEGATIVE.
MULTIPLYING MONOMIALS TIMES POLYNOMIALS (DISTRIBUTIVE PROPERTY)
ADDING INTEGERS 1. POS. + POS. = POS. 2. NEG. + NEG. = NEG. 3. POS. + NEG. OR NEG. + POS. SUBTRACT TAKE SIGN OF BIGGER ABSOLUTE VALUE.
SUBTRACTING INTEGERS 1. CHANGE THE SUBTRACTION SIGN TO ADDITION
MULT. INTEGERS 1. IF THE SIGNS ARE THE SAME THE ANSWER IS POSITIVE 2. IF THE SIGNS ARE DIFFERENT THE ANSWER IS NEGATIVE.
FACTORING Think Distributive property backwards Work down, Show all steps ax + ay = a(x + y)
Addition Facts
CS4026 Formal Models of Computation Running Haskell Programs – power.
Welcome to Who Wants to be a Millionaire
C1 Sequences and series. Write down the first 4 terms of the sequence u n+1 =u n +6, u 1 =6 6, 12, 18, 24.
Predicting Performance Impact of DVFS for Realistic Memory Systems Rustam Miftakhutdinov Eiman Ebrahimi Yale N. Patt.
Re-examining Instruction Reuse in Pre-execution Approaches By Sonya R. Wolff Prof. Ronald D. Barnes June 5, 2011.
1 Lecture 2: Metrics to Evaluate Performance Topics: Benchmark suites, Performance equation, Summarizing performance with AM, GM, HM Video 1: Using AM.
SE-292 High Performance Computing
Mehdi Naghavi Spring 1386 Operating Systems Mehdi Naghavi Spring 1386.
Improving DRAM Performance by Parallelizing Refreshes with Accesses
ABC Technology Project
1 Overview Assignment 4: hints Memory management Assignment 3: solution.
Virtual Memory 1 Computer Organization II © McQuain Virtual Memory Use main memory as a cache for secondary (disk) storage – Managed jointly.
Bypass and Insertion Algorithms for Exclusive Last-level Caches
O X Click on Number next to person for a question.
© S Haughton more than 3?
Application-aware Memory System for Fair and Efficient Execution of Concurrent GPGPU Applications Adwait Jog 1, Evgeny Bolotin 2, Zvika Guz 2,a, Mike Parker.
25 July, 2014 Martijn v/d Horst, TU/e Computer Science, System Architecture and Networking 1 Martijn v/d Horst
1 Directed Depth First Search Adjacency Lists A: F G B: A H C: A D D: C F E: C D G F: E: G: : H: B: I: H: F A B C G D E H I.
Linking Verb? Action Verb or. Question 1 Define the term: action verb.
Energy & Green Urbanism Markku Lappalainen Aalto University.
CONTROL VISION Set-up. Step 1 Step 2 Step 3 Step 5 Step 4.
© 2012 National Heart Foundation of Australia. Slide 2.
Lets play bingo!!. Calculate: MEAN Calculate: MEDIAN
Past Tense Probe. Past Tense Probe Past Tense Probe – Practice 1.
GG Consulting, LLC I-SUITE. Source: TEA SHARS Frequently asked questions 2.
Addition 1’s to 20.
25 seconds left…...
Test B, 100 Subtraction Facts
11 = This is the fact family. You say: 8+3=11 and 3+8=11
Week 1.
We will resume in: 25 Minutes.
1 Ke – Kitchen Elements Newport Ave. – Lot 13 Bethesda, MD.
SE-292 High Performance Computing Memory Hierarchy R. Govindarajan
1 Unit 1 Kinematics Chapter 1 Day
O X Click on Number next to person for a question.
PSSA Preparation.
Application-to-Core Mapping Policies to Reduce Memory System Interference Reetuparna Das * Rachata Ausavarungnirun $ Onur Mutlu $ Akhilesh Kumar § Mani.
3-Software Design Basics in Embedded Systems
Chapter 3 General-Purpose Processors: Software
A Case for Refresh Pausing in DRAM Memory Systems
The Compact Memory Scheduling Maximizing Row Buffer Locality Young-Suk Moon, Yongkee Kwon, Hong-Sik Kim, Dong-gun Kim, Hyungdong Hayden Lee, and Kunwoo.
Thread Criticality Predictors for Dynamic Performance, Power, and Resource Management in Chip Multiprocessors Abhishek Bhattacharjee Margaret Martonosi.
Buffer-On-Board Memory System 1 Name: Aurangozeb ISCA 2012.
Efficiently Prefetching Complex Address Patterns Manjunath Shevgoor, Sahil Koladiya, Rajeev Balasubramonian University of Utah Chris Wilkerson, Zeshan.
Staged-Reads : Mitigating the Impact of DRAM Writes on DRAM Reads
Manjunath Shevgoor, Rajeev Balasubramonian, University of Utah
Presentation transcript:

1 JILP Workshop on Architecture Competitions JWAC-3 Memory Scheduling Championship (MSC) 9 th June 2012 Organizers: Rajeev Balasubramonian, Utah Niladrish Chatterjee, Utah Zeshan Chishti, Intel

2 Introduction to the USIMM Simulation Infrastructure N. Chatterjee, R. Balasubramonian, M. Shevgoor, S. Pugsley, A. Udipi, A. Shafiee, K. Sudan, M. Awasthi, Z. Chishti

3 USIMM Goals Portable Trace-based Simple processor model and detailed memory model Plug-in scheduler algorithm

4 USIMM Overview INPUT TRACE FILE

5 USIMM Overview 25 R 0x81a5aae8 0x2eb6c137 3 W 0x81a4ab00 0 R 0x81a5ab28 0x2eb6c137 INPUT TRACE FILE Instruction PCCache line address

6 USIMM Overview 25 R 0x81a5aae8 0x2eb6c137 3 W 0x81a4ab00 0 R 0x81a5ab28 0x2eb6c137 INPUT TRACE FILE ROB

7 USIMM Overview 25 R 0x81a5aae8 0x2eb6c137 3 W 0x81a4ab00 0 R 0x81a5ab28 0x2eb6c137 INPUT TRACE FILE ROB MC READ Q MC WRITE Q

8 USIMM Overview 25 R 0x81a5aae8 0x2eb6c137 3 W 0x81a4ab00 0 R 0x81a5ab28 0x2eb6c137 INPUT TRACE FILE ROB MC READ Q MC WRITE Q

9 USIMM Overview 25 R 0x81a5aae8 0x2eb6c137 3 W 0x81a4ab00 0 R 0x81a5ab28 0x2eb6c137 INPUT TRACE FILE ROB MC READ Q MC WRITE Q PRECHG PWR-UP/DN REFRESH

10 USIMM Overview 25 R 0x81a5aae8 0x2eb6c137 3 W 0x81a4ab00 0 R 0x81a5ab28 0x2eb6c137 INPUT TRACE FILE ROB MC READ Q MC WRITE Q PRECHG PWR-UP/DN REFRESH Scheduler.c

11 USIMM Overview 25 R 0x81a5aae8 0x2eb6c137 3 W 0x81a4ab00 0 R 0x81a5ab28 0x2eb6c137 INPUT TRACE FILE ROB MC READ Q MC WRITE Q PRECHG PWR-UP/DN REFRESH Scheduler.c

12 USIMM Overview 25 R 0x81a5aae8 0x2eb6c137 3 W 0x81a4ab00 0 R 0x81a5ab28 0x2eb6c137 INPUT TRACE FILE ROB MC READ Q MC WRITE Q PRECHG PWR-UP/DN REFRESH Scheduler.c DRAIN

13 USIMM Overview 25 R 0x81a5aae8 0x2eb6c137 3 W 0x81a4ab00 0 R 0x81a5ab28 0x2eb6c137 INPUT TRACE FILE ROB MC READ Q MC WRITE Q PRECHG PWR-UP/DN REFRESH Scheduler.c FORCED REFRESH

14 Workloads 10 programs, 10 workloads, 18 experiments (1 & 4 channels) comm2 comm1 comm1 comm1 comm2 comm2 MT0-canneal MT1-canneal MT2-canneal MT3-canneal fluid swapt comm2 comm2 face face ferret ferret black black freq freq stream stream fluid fluid swapt swapt comm2 comm2 ferret ferret fluid fluid swapt swapt comm2 comm2 ferret ferret black black freq freq comm1 comm1 stream stream 8 programs, 8 workloads, 14 experiments (1 & 4 channels) tigr libq libq libq mummer mummer leslie leslie MT0-fluid MT1-fluid MT2-fluid MT3-fluid comm4 comm4 comm5 comm5 comm3 comm3 comm3 comm3 libq libq libq mummer mummer mummer tigr tigr Each trace runs on its own core, with a private 512KB LLC.

15 Workloads 10 programs, 10 workloads, 18 experiments (1 & 4 channels) comm2 comm1 comm1 comm1 comm2 comm2 MT0-canneal MT1-canneal MT2-canneal MT3-canneal fluid swapt comm2 comm2 face face ferret ferret black black freq freq stream stream fluid fluid swapt swapt comm2 comm2 ferret ferret fluid fluid swapt swapt comm2 comm2 ferret ferret black black freq freq comm1 comm1 stream stream 8 programs, 8 workloads, 14 experiments (1 & 4 channels) tigr libq libq libq mummer mummer leslie leslie MT0-fluid MT1-fluid MT2-fluid MT3-fluid comm4 comm4 comm5 comm5 comm3 comm3 comm3 comm3 libq libq libq mummer mummer mummer tigr tigr 5 billion instr traces million instr traces (Simpoint) Each trace runs on its own core, with a private 512KB LLC. Commercial trans-processing, PARSEC, SPEC2k6, Biobench

16 Configurations Two main system configs: 1 and 4 channels Each uses a different address mapping policy, retire width, ROB size, write queue size More traces larger address space (4GB/trace) larger DRAM chips (and corresponding power model)

17 Metrics and Tracks Storage must not exceed 68KB, implementable logic Performance Track: sum of execution times of all programs (87) in all 32 workloads EDP Track: delay is time for last program to finish, energy uses a detailed memory power model (Micron power calculator) and a simple system power model (constant power, plus core power with clock gating, plus memory power) (memory contributes 15-35% of system power) PFP Track: perf is sum of all execution times, fairness is the average of max slowdown in each workload