CISC 879 : Advanced Parallel Programming Vaibhav Naidu, Dept. of Computer & Information Sciences, University of Delaware. Dark Silicon and the End of Multicore Scaling, by Hadi Esmaeilzadeh, Emily Blem, Renée St. Amant, Karthikeyan Sankaralingam, and Doug Burger.

CISC 879 : Advanced Parallel Programming Outline Introduction Motivation Models: 1. Device Scaling Model 2. Core Scaling Model 3. Multicore Scaling Model Model Combinations Summary Conclusion

CISC 879 : Advanced Parallel Programming Introduction Moore's Law: the number of transistors on a chip doubles roughly every 18 months. Dennard scaling: as transistors get smaller, their power density stays constant. Moore's Law coupled with Dennard scaling has delivered a commensurate exponential increase in performance.
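The reasoning behind constant power density can be made explicit with a short worked derivation (the textbook form of the argument, with kappa the linear scaling factor per generation, roughly 1.4):

```latex
P = C V^2 f \quad \text{(dynamic power per transistor)}
\qquad C \to \frac{C}{\kappa},\quad V \to \frac{V}{\kappa},\quad f \to \kappa f,\quad A \to \frac{A}{\kappa^2}

P \;\to\; \frac{C}{\kappa}\cdot\frac{V^2}{\kappa^2}\cdot\kappa f \;=\; \frac{P}{\kappa^2}
\qquad\Longrightarrow\qquad
\frac{P}{A} \;\to\; \frac{P/\kappa^2}{A/\kappa^2} \;=\; \frac{P}{A}
```

Power per transistor and transistor area both shrink by kappa squared, so power density stays flat. The argument depends on the supply voltage continuing to drop each generation, which is exactly what stopped happening, as the next slide notes.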

CISC 879 : Advanced Parallel Programming Introduction In recent years, Dennard scaling has broken down. Because of this failure (supply voltage no longer scales down with feature size), core-count scaling may be in jeopardy. Researchers project that, at the 8 nm technology node, the amount of dark silicon (transistors that must be left unpowered) may reach 50%-80%.

CISC 879 : Advanced Parallel Programming Motivation In 2024, will processors have 32 times the performance of processors from 2008, as a doubling of performance with each technology generation would predict?

CISC 879 : Advanced Parallel Programming Models Three models and their combinations are discussed: 1. Device Scaling Model 2. Core Scaling Model 3. Multicore Scaling Model

CISC 879 : Advanced Parallel Programming Models

CISC 879 : Advanced Parallel Programming Device Scaling Model Uses two projections for device scaling: the ITRS 2010 technology roadmap and conservative scaling (Borkar's predictions). Each provides area, power, and frequency scaling factors at future technology nodes (45 nm down to 8 nm).
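A minimal sketch of how such a device-scaling model might be represented and applied, assuming a simple table of per-node multipliers; the node list matches the slides, but every numeric factor below is an illustrative placeholder, not an ITRS or Borkar value:

```python
# Per-node scaling factors, normalized to 45 nm.
# (frequency multiplier, power multiplier, area multiplier) -- placeholders.
DEVICE_SCALING = {
    "45nm": (1.00, 1.00, 1.00),
    "32nm": (1.10, 0.70, 0.50),
    "22nm": (1.25, 0.50, 0.25),
    "16nm": (1.40, 0.35, 0.13),
    "11nm": (1.55, 0.25, 0.06),
    "8nm":  (1.70, 0.18, 0.03),
}

def project_core(perf, power, area, node):
    """Project a 45 nm core design to a future node: performance is assumed
    to track frequency, while power and area shrink by the node's factors."""
    f_freq, f_power, f_area = DEVICE_SCALING[node]
    return perf * f_freq, power * f_power, area * f_area

# Example: a 45 nm core with performance 10 (arbitrary units), 20 W, 25 mm^2.
print(project_core(10.0, 20.0, 25.0, "8nm"))
```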

CISC 879 : Advanced Parallel Programming Device Scaling Model

CISC 879 : Advanced Parallel Programming Core Scaling Model The core-level model provides the maximum performance that a single core can sustain within any given area or power budget. Pareto-optimal frontiers for single-core area/performance and power/performance are constructed from a large set of real processors.

CISC 879 : Advanced Parallel Programming Core Scaling Model Power/performance frontier at 45 nm; area/performance frontier at 45 nm.
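The frontiers referenced above can be thought of as the lower envelope of a scatter of real processor design points. The paper fits curves to those frontier points; the simple dominance filter below (a hedged sketch, with made-up design points) only illustrates what Pareto-optimal means here:

```python
def pareto_frontier(points):
    """Given (performance, cost) points, where cost is power or area, keep the
    non-dominated points: no other point offers at least the same performance
    at lower or equal cost."""
    frontier = []
    # Sort by cost ascending, breaking ties by higher performance first.
    for perf, cost in sorted(points, key=lambda p: (p[1], -p[0])):
        if not frontier or perf > frontier[-1][0]:
            frontier.append((perf, cost))
    return frontier

# Hypothetical (performance, watts) design points, for illustration only.
designs = [(8, 5), (12, 12), (11, 15), (20, 40), (18, 45), (25, 90)]
print(pareto_frontier(designs))   # -> [(8, 5), (12, 12), (20, 40), (25, 90)]
```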

CISC 879 : Advanced Parallel Programming Multi-Core Scaling Model Two mainstream classes of multicore organization are modeled: multicore CPUs and many-thread GPUs, which represent the two extremes of the threads-per-core spectrum. The model determines the area, power, and performance of any application on "any" chip topology, for both CPU-like and GPU-like multicores.
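A small sketch of how such a chip organization could be parameterized; the field names and example values below are illustrative assumptions, not the paper's parameters:

```python
from dataclasses import dataclass

@dataclass
class MulticoreConfig:
    style: str                 # "CPU-like" (few fat, latency-optimized cores)
                               # or "GPU-like" (many thin, multithreaded cores)
    cores: int                 # number of cores on the chip
    threads_per_core: int      # hardware threads per core
    area_per_core_mm2: float
    power_per_core_w: float

# Illustrative extremes of the threads-per-core spectrum.
cpu_like = MulticoreConfig("CPU-like", cores=8,   threads_per_core=2,
                           area_per_core_mm2=20.0, power_per_core_w=15.0)
gpu_like = MulticoreConfig("GPU-like", cores=512, threads_per_core=32,
                           area_per_core_mm2=0.5,  power_per_core_w=0.3)
```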

CISC 879 : Advanced Parallel Programming Multi-Core Scaling Model Two models are presented: 1. Amdahl’s Law Upper Bounds 2. Realistic Performance Model

CISC 879 : Advanced Parallel Programming Amdahl's law Amdahl's law gives the theoretical maximum speedup obtainable with multiple processors. The law is extended to describe symmetric, asymmetric, dynamic, and composed multicore topologies. This model gives an upper bound on parallel performance.
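For reference, the baseline form of the law, where f is the parallelizable fraction of the work and n is the number of cores:

```latex
\text{Speedup}(f, n) \;=\; \frac{1}{(1 - f) + \dfrac{f}{n}},
\qquad
\lim_{n \to \infty} \text{Speedup}(f, n) \;=\; \frac{1}{1 - f}
```

Even with unlimited cores, a 99% parallel workload tops out at 100x, which is why the serial fraction dominates the upper bounds that follow.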

CISC 879 : Advanced Parallel Programming Amdahl's law Multicore topologies: 1. Symmetric multicore: multiple copies of one core, all operating at the same voltage and frequency setting. 2. Asymmetric multicore: one large monolithic core plus many identical small cores.

CISC 879 : Advanced Parallel Programming Amdahl's law Multicore topologies: 3. Dynamic multicore: during parallel code portions the large core is shut down and the small cores run; during serial portions the small cores are shut down and the large core runs. 4. Composed multicore: a collection of small cores that can logically fuse together to compose a single high-performance large core.
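A minimal sketch of upper-bound formulas for these topologies, in the spirit of Hill and Marty's multicore corollaries to Amdahl's law, which this paper further constrains with the area and power budgets from the other two models. The perf(r) = sqrt(r) assumption (a core built from r base-core units delivers sqrt(r) times the performance) comes from that earlier work and is an assumption here, not the paper's fitted Pareto frontiers:

```python
import math

def perf(r):
    """Assumed performance of a core built from r base-core resource units
    (Pollack's-rule-style square-root scaling)."""
    return math.sqrt(r)

def symmetric(f, n, r):
    """n resource units split into n/r identical cores of r units each."""
    return 1.0 / ((1 - f) / perf(r) + f * r / (perf(r) * n))

def asymmetric(f, n, r):
    """One large r-unit core plus (n - r) single-unit small cores; the large
    core also contributes during the parallel phase."""
    return 1.0 / ((1 - f) / perf(r) + f / (perf(r) + n - r))

def dynamic(f, n, r):
    """Serial phase on a large r-unit core, parallel phase on all n units as
    small cores (the large core is shut down). A composed topology would look
    like this with a performance penalty on the fused large core."""
    return 1.0 / ((1 - f) / perf(r) + f / n)

# Example: 99% parallel code, a budget of 256 base-core units, and a large
# core built from 16 of them.
f, n, r = 0.99, 256, 16
print(symmetric(f, n, r), asymmetric(f, n, r), dynamic(f, n, r))
```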

CISC 879 : Advanced Parallel Programming Amdahl’s law

CISC 879 : Advanced Parallel Programming Realistic Model The Amdahl's-law model above does not consider microarchitectural features or workload behavior. The realistic model formulates the performance of a multicore in terms of chip organization, frequency, CPI, cache hierarchy, and memory bandwidth. It also accounts for application behavior, including the degree of thread-level parallelism.
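A toy sketch of the kind of throughput model described, just to make the ingredients concrete: a per-core instruction rate capped by off-chip memory bandwidth, combined Amdahl-style with the serial fraction. Every parameter name and number below is an illustrative assumption, not the paper's formulation:

```python
def multicore_perf(n_cores, freq_ghz, cpi, f_parallel,
                   mem_bw_gbps, bytes_per_instr, miss_rate):
    """Very rough instructions-per-second estimate for a homogeneous chip.
    bytes_per_instr = memory bytes touched per instruction;
    miss_rate = fraction of that traffic that goes off-chip."""
    core_rate = freq_ghz * 1e9 / cpi                             # instr/s of one core
    bw_rate = mem_bw_gbps * 1e9 / (bytes_per_instr * miss_rate)  # off-chip cap
    parallel_rate = min(n_cores * core_rate, bw_rate)            # bandwidth-limited
    serial_rate = core_rate
    # Amdahl-style combination of the serial and parallel phases.
    return 1.0 / ((1 - f_parallel) / serial_rate + f_parallel / parallel_rate)

# Example: 16 cores at 3 GHz, CPI 1.2, 95% parallel code, 200 GB/s of memory
# bandwidth, 4 bytes of traffic per instruction, 5% of it going off-chip.
print(multicore_perf(16, 3.0, 1.2, 0.95, 200, 4, 0.05) / 1e9, "GIPS")
```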

CISC 879 : Advanced Parallel Programming Realistic Model: Model Validation

CISC 879 : Advanced Parallel Programming Model Combination 1. Device Scaling x Core Scaling: Based on the ITRS roadmap predictions, scaling a microarchitecture from 45 nm to 8 nm yields a 3.9x performance improvement and an 88% reduction in power consumption. Under conservative scaling, performance improves by only 44%, with a 74% reduction in power consumption.

CISC 879 : Advanced Parallel Programming Model Combination 2. Device Scaling x Core Scaling x Multicore Scaling

CISC 879 : Advanced Parallel Programming Model Combination 2. Device Scaling x Core Scaling x Multicore Scaling: The geometric mean of the speedups across the benchmarks is obtained as shown in the table.
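For completeness, a short sketch of how such a geometric mean is computed; the per-benchmark speedup values below are made up for illustration, not the paper's results:

```python
import math

def geomean(xs):
    """Geometric mean: the n-th root of the product of n values."""
    return math.exp(sum(math.log(x) for x in xs) / len(xs))

# Hypothetical per-benchmark speedups at some future node.
print(geomean([3.1, 9.8, 5.4, 7.2, 6.0]))
```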

CISC 879 : Advanced Parallel Programming Summary As depicted, due to power and parallelism limitations, a significant gap exists between what multicore scaling can deliver and what Moore's Law leads us to expect.

CISC 879 : Advanced Parallel Programming Conclusion The amount of dark silicon increases as the technology node scales down. The expected 32x speedup is achieved under neither ITRS nor conservative scaling. The most optimistic speedup that can be achieved is only 7.9x.

CISC 879 : Advanced Parallel Programming Questions?

CISC 879 : Advanced Parallel Programming Thank you