Multi-core systems: System Architecture (COMP25212). Daniel Goodman, Advanced Processor Technologies Group.



Multi-Cores are Coming (here?)
- Many processors in normal desktops/laptops are ‘dual core’ or ‘quad core’
  - What does this mean?
  - Why is it happening?
  - How are they different?
  - Where are they going?
  - Do they change anything?

Moore’s Law
45 nm fun facts:
- A human hair = 90,000 nm
- Bacteria = 2,000 nm
- Silicon atom = 0.24 nm

The need for Multi-Core
- For over 30 years the performance of processors has doubled every 2 years
- Driven mainly by shrinkage of circuits
- Smaller circuits give:
  - more transistors per chip
  - shorter connections
  - lower capacitance
- Smaller circuits go faster
- In the early 2000s the rate started to decrease
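As a rough worked figure for the first bullet: doubling every 2 years for 30 years is 15 doublings. The sketch below just does that arithmetic; `growth_factor` is a hypothetical helper name, not something from the slides, and it assumes the doubling period stays constant over the whole span.

```python
def growth_factor(years, doubling_period_years=2.0):
    """Cumulative speed-up after `years` if performance doubles every period.

    Assumption: a constant doubling period, which is an idealisation.
    """
    return 2.0 ** (years / doubling_period_years)


if __name__ == "__main__":
    # 30 years at one doubling per 2 years = 15 doublings = 2**15
    print(growth_factor(30))
```

Under that idealised assumption, 30 years of doubling compounds to a factor of 2^15 = 32,768, which is why even a modest slowdown in the rate matters.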

Motivation

Is cooling a problem?
- Intel Nehalem: when not all of the cores are being used, the unused cores can be shut down, allowing the remaining cores to use the spare power and thermal headroom to speed up.

The Memory Wall
- Processor utilization (15%-25%)
- Memory speed is failing to keep up with processor speed. Why?

The End of “Good Times”
- Slowdown for several reasons
  - Power density increasing (more watts per unit area) - cooling is a serious problem
  - Small transistors have less predictable characteristics
  - Architectural innovation hitting design complexity problems (limited ILP)
  - Memory does not get faster at the same rate as processors

A solution is replication
- Put multiple CPUs (cores) on a single integrated circuit (chip)
- Use them in parallel to achieve higher performance
- Simpler to design than a more complex single processor
- Need more computing power – just add more cores?

How to Connect Them?
- Could have independent processor/store pairs with an interconnection network
- At the software level, the majority opinion is that shared memory is the right answer for a general purpose processor
- But when we consider more than a few cores, shared memory becomes more difficult to implement

Can We Use Multiple Cores?
- Small numbers of cores can be used for separate tasks – e.g. run a virus checker on one core and Word on another
- If we want increased performance on a single application, we need to move to parallel programming
- General purpose parallel programming is known to be hard – the consensus is that new approaches are needed
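To make "increased performance on a single application" concrete, here is a minimal sketch of splitting one CPU-bound job across cores using Python's standard `concurrent.futures` module. The task (counting primes) and all function names are assumptions chosen for illustration; nothing here comes from the slides.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor
import os


def is_prime(n):
    """Trial division; deliberately CPU-bound so that extra cores can help."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def count_primes(bounds):
    """Serial kernel: count primes in the half-open range [lo, hi)."""
    lo, hi = bounds
    return sum(1 for n in range(lo, hi) if is_prime(n))


def split_range(limit, parts):
    """Cut [0, limit) into at most `parts` contiguous chunks."""
    step = max(1, -(-limit // parts))  # ceiling division, at least 1
    return [(lo, min(lo + step, limit)) for lo in range(0, limit, step)]


def count_primes_parallel(limit, workers=None, executor_cls=ProcessPoolExecutor):
    """Run one chunk per worker and add up the partial counts.

    executor_cls defaults to processes (one per core). ThreadPoolExecutor
    can be passed instead, e.g. for quick checks, but CPython threads will
    not speed up CPU-bound work.
    """
    workers = workers or os.cpu_count() or 1
    with executor_cls(max_workers=workers) as pool:
        return sum(pool.map(count_primes, split_range(limit, workers)))
```

A script would typically call `count_primes_parallel(200_000)` from inside an `if __name__ == "__main__":` block. Even this easy case hints at why the general problem is hard: the chunks are equal in size but not in cost (larger numbers take longer to test), so some cores finish early and sit idle.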

There Are Problems
- We don’t know how to engineer extensible memory systems
- We don’t know how to write general purpose parallel programs
- If we develop new approaches to parallel programming, do they fit with existing serial processor designs?

Intel Core i7 (Nehalem)
- Simultaneous Multi-Threading (SMT): 2 hardware threads per core

Traditional Structure: “Historical View” (Processor, Front Side Bus, North Bridge, South Bridge)
[Diagram: Processor and Cache (single die/chip, SRAM) connected by the Front Side Bus to the North Bridge Chip (Memory Controller) on the Motherboard. The North Bridge links Main Memory (DRAM), the Graphics Card and the South Bridge Chip; the South Bridge drives the I/O Buses (PCIe, USB, Ethernet, SATA HD) …]

Typical Multi-core Structure
[Diagram. On chip: cores, each with L1 Inst and L1 Data caches; L2 Cache; L3 Shared Cache; Memory Controller attached to Main Memory (DRAM). On the Motherboard: Input/Output Hub reached over QPI or HT, Graphics Card on PCIe, and Input/Output Controller for the I/O Buses (PCIe, USB, Ethernet, SATA HD) …]

Simplified Multi-Core Structure
[Diagram. On chip: four cores, each with L1 Inst and L1 Data caches, joined by a Shared Bus to a Level 2 Cache. Off chip: Main Memory.]

Nehalem Caches
- Private L1: split D$ & I$, 32KB each; 4-way set associative I$ and 8-way set associative D$; approx. LRU replacement; block size 64 bytes; write-back & write-allocate
- Private L2: 8-way set associative; otherwise as above
- Shared L3: 16-way set associative; otherwise as above
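The L1 figures above fix the cache geometry: number of sets = size / (ways × block size). The sketch below works that out for both L1 caches and shows how an address would split into tag, index and offset. `CacheGeometry` and the example address are hypothetical, and the code assumes all sizes are powers of two and that the cache is indexed by the low-order address bits.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CacheGeometry:
    size_bytes: int
    ways: int
    block_bytes: int

    @property
    def sets(self):
        # Each set holds `ways` blocks of `block_bytes` bytes.
        return self.size_bytes // (self.ways * self.block_bytes)

    def split(self, address):
        """Return (tag, set index, block offset) for a byte address.

        Assumes power-of-two sizes, so each field is a plain bit slice.
        """
        offset_bits = self.block_bytes.bit_length() - 1
        index_bits = self.sets.bit_length() - 1
        offset = address & (self.block_bytes - 1)
        index = (address >> offset_bits) & (self.sets - 1)
        tag = address >> (offset_bits + index_bits)
        return tag, index, offset


# L1 parameters taken from the slide: 32KB, 64-byte blocks.
l1_data = CacheGeometry(size_bytes=32 * 1024, ways=8, block_bytes=64)
l1_inst = CacheGeometry(size_bytes=32 * 1024, ways=4, block_bytes=64)

if __name__ == "__main__":
    print(l1_data.sets, l1_inst.sets)   # 64 sets and 128 sets
    print(l1_data.split(0x12345))       # (tag, index, offset) = (18, 13, 5)
```

So the 8-way data cache has 64 sets (6 index bits) and the 4-way instruction cache has 128 sets (7 index bits), both with 6 offset bits for the 64-byte block.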

Cache Coherence?
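One way to see why the question arises: with private write-back caches and no coherence mechanism, two cores can hold different values for the same address. The toy model below is only a sketch under that assumption; the `Core` class and the address 0x40 are made up for illustration, and it does not model Nehalem's actual protocol.

```python
class Core:
    """A core with a private write-back cache and NO coherence protocol."""

    def __init__(self, memory):
        self.memory = memory  # shared main memory: address -> value
        self.cache = {}       # private cache: address -> value

    def read(self, addr):
        if addr not in self.cache:          # miss: fetch from memory
            self.cache[addr] = self.memory[addr]
        return self.cache[addr]             # hit: use the private copy

    def write(self, addr, value):
        self.cache[addr] = value            # write-back: memory not updated

    def flush(self, addr):
        self.memory[addr] = self.cache[addr]  # write the block back


memory = {0x40: 1}
core0, core1 = Core(memory), Core(memory)

core0.read(0x40)
core1.read(0x40)                    # both cores now cache the value 1
core0.write(0x40, 2)                # core0 updates only its own copy
stale_before_flush = core1.read(0x40)

core0.flush(0x40)                   # memory now holds 2 ...
stale_after_flush = core1.read(0x40)  # ... but core1 still hits its old copy

if __name__ == "__main__":
    print(stale_before_flush, stale_after_flush, memory[0x40])
```

Even after core0 writes the block back, core1 keeps reading its old copy because nothing invalidated or updated it. Keeping such copies consistent is what a coherence mechanism is for.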

Summary
- Multi-core systems are here to stay
  - Physical limitations
  - Design costs
- The industry did not want to go multi-core, but there is no current alternative
- One of the biggest changes for our field
  - General purpose parallel programming must be made tractable
- For further reading: Patterson and Hennessy, 4th Edition, Chapter 1