SCALING THE SPEEDUP OF MULTI-CORE CHIPS BASED ON AMDAHL'S LAW
A.V. Bogdanov, Kyaw Zaya
DUBNA, 2012

SPEEDUP IN PARALLELISM
Factors that determine parallel speedup:
- Architecture of the machine
- Operating system
- Memory and processor resources
- Number of processors
- Problem size
- Choice of algorithms

- Amdahl's law assumes constant problem-size scaling, independent of the number of processors.
- Using Amdahl's law as an argument against massively parallel processing is therefore not valid.
- The serial percentage of a fixed-size problem is not practically obtainable at scale.
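This argument is usually formalized as Gustafson's scaled speedup, which the slide appears to paraphrase (a hedged addition; the deck itself does not show the formula). With α the serial fraction of the scaled workload and N processors:

\[ S_{scaled}(N) = N - \alpha\,(N - 1) \]

so speedup grows almost linearly in N when the problem size scales with the machine.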

[Formula image not preserved.] Here N is the number of processors, α is the proportion of sequential (non-parallelizable) calculation, and β and γ are parameters characterizing the network.
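Since the original formula image is lost, a plausible reconstruction consistent with these definitions and with common network-aware extensions of Amdahl's law (an assumption, not recovered from the slide) is

\[ S(N) = \frac{1}{\alpha + \dfrac{1-\alpha}{N} + \beta N^{\gamma}} \]

where the extra term \( \beta N^{\gamma} \) models communication overhead that grows with the number of processors.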

TYPES OF MULTI-CORE CHIPS
1. Symmetric multi-core chip
2. Asymmetric multi-core chip
3. Dynamic multi-core chip

SYMMETRIC MULTI-CORE CHIP
[Formula image not preserved.] Here f is the fraction of the software that is parallelizable, n is the total number of BCEs (base core equivalents) on the chip, r is the number of BCEs per core, and Perf(r) is the performance of a core built from r BCEs.
[Diagram: processor, memory, I/O]
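The definitions match the symmetric-chip speedup of Hill and Marty (2008), whose taxonomy this deck follows, so that formula is reproduced here in place of the lost image (with the normalization Perf(1) = 1):

\[ S_{sym}(f, n, r) = \frac{1}{\dfrac{1-f}{\mathrm{Perf}(r)} + \dfrac{f \cdot r}{\mathrm{Perf}(r)\cdot n}} \]

The sequential part runs on one r-BCE core at speed Perf(r); the parallel part runs on all n/r cores at combined speed Perf(r)·n/r.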

ASYMMETRIC MULTI-CORE CHIP
[Diagram: processors with private memories sharing common memory and I/O]
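Again the formula image is missing; Hill and Marty's asymmetric-chip speedup, under the same definitions, is

\[ S_{asym}(f, n, r) = \frac{1}{\dfrac{1-f}{\mathrm{Perf}(r)} + \dfrac{f}{\mathrm{Perf}(r) + n - r}} \]

One large core built from r BCEs runs the sequential part; in the parallel part it is joined by the n − r remaining single-BCE cores.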

DYNAMIC MULTI-CORE CHIP
[Diagram: the same chip reconfigured between a sequential mode (one powerful core) and a parallel mode (many simple cores)]
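The corresponding dynamic-chip speedup from Hill and Marty, where the chip devotes all n BCEs to one powerful core in sequential mode and to n simple cores in parallel mode, is

\[ S_{dyn}(f, n, r) = \frac{1}{\dfrac{1-f}{\mathrm{Perf}(r)} + \dfrac{f}{n}} \]

A minimal Python sketch comparing the three models (an illustrative addition, not from the deck; it assumes Hill and Marty's usual choice Perf(r) = sqrt(r)):

```python
import math

def perf(r):
    # Usual Hill & Marty assumption: an r-BCE core is sqrt(r) times
    # faster than a single-BCE core.
    return math.sqrt(r)

def speedup_symmetric(f, n, r):
    # n/r identical cores; sequential part on one core, parallel on all.
    return 1.0 / ((1 - f) / perf(r) + f * r / (perf(r) * n))

def speedup_asymmetric(f, n, r):
    # One r-BCE core plus (n - r) single-BCE cores.
    return 1.0 / ((1 - f) / perf(r) + f / (perf(r) + n - r))

def speedup_dynamic(f, n, r):
    # Sequential mode: one r-BCE core; parallel mode: n simple cores.
    return 1.0 / ((1 - f) / perf(r) + f / n)

if __name__ == "__main__":
    f, n = 0.95, 256          # 95% parallelizable code, 256 BCEs on the chip
    for r in (1, 4, 16, 64, 256):
        print(f"r={r:3d}  sym={speedup_symmetric(f, n, r):6.1f}  "
              f"asym={speedup_asymmetric(f, n, r):6.1f}  "
              f"dyn={speedup_dynamic(f, n, r):6.1f}")
```

As expected, the asymmetric and dynamic designs dominate the symmetric one for large r, which is the central result of Hill and Marty's paper.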

Advantages of multi-core chips
- Better throughput: performance improves because segments of a program can be processed in parallel.
- Better reliability: the extra cores provide a built-in backup. If one CPU breaks down, the others automatically take over the complete workload until repairs are made.
- Better utilization of resources: in addition to the CPUs, all the other devices of the computer system are used more efficiently.
Disadvantages of multi-core chips
- A large main memory is required.
- A sophisticated operating system is required to schedule, balance, and coordinate the input, output, and processing activities of multiple CPUs.

SPECIFICATIONS OF THE T-PLATFORMS CLUSTER
T-Platforms T-EDGE96 HPC cluster:
CPU: 2x Intel Xeon E5335 (2.0 GHz) per node
Interconnect: InfiniBand, 20 Gb/s
Disk memory (per node): 160 GB
GPU: none
RAM (per node): 16 GB
Total RAM: 768 GB
Total: 48 nodes, 384 cores
Peak performance: 3.07 TFlops
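The headline numbers are internally consistent (a hedged check, assuming 4 double-precision floating-point operations per cycle per core, as on Core-microarchitecture Xeons):

\[ 48 \times 16\,\mathrm{GB} = 768\,\mathrm{GB}, \qquad 384 \times 2.0\,\mathrm{GHz} \times 4\,\tfrac{\mathrm{flops}}{\mathrm{cycle}} = 3072\,\mathrm{GFLOPS} \approx 3.07\,\mathrm{TFlops} \]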

Figure. Speedup testing on the T-Platforms cluster. [Plot image not preserved.]
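The measured curve survives only in the lost image; as a point of comparison, a short sketch (an illustrative addition, with an assumed serial fraction α = 0.05, which the deck does not state) of the theoretical Amdahl curve up to the cluster's 384 cores:

```python
def amdahl_speedup(alpha, n):
    # Classic Amdahl's law: alpha is the serial fraction,
    # n the number of processors.
    return 1.0 / (alpha + (1.0 - alpha) / n)

if __name__ == "__main__":
    alpha = 0.05  # assumed serial fraction
    for n in (1, 2, 4, 8, 16, 32, 64, 128, 256, 384):
        print(f"{n:4d} cores -> theoretical speedup {amdahl_speedup(alpha, n):6.2f}")
```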

BENCHMARKING THE T-PLATFORMS CLUSTER
Test results of the NAS Parallel Benchmarks (NPB) LU benchmark, Class C. [Results image not preserved.]

CONCLUSION
- New technologies and architectures
- Clustering problems
- Modifications for new types of processors

THANK YOU