Chip&Core Architecture


Chip Architecture
- 4 multi-threaded processor cores
- 4 MB multi-bank L2 cache
- Non-blocking crossbar switch between the cores and the L2 cache, scalable from 1 to 4 cores
- Directory-based L2 cache coherency
- Thread scheduler
- FB-DIMM memory controller (possibly)
- Reconfigurable crypto-coprocessor supporting encryption and decryption with symmetric algorithms (DES, AES, RCx, GDES, TDES, CRYPT, etc.)

Core Architecture
- MIPS ISA
- 4 threads, coarse-grained multithreading: only one thread runs at a time
- 16 KB L1 instruction cache, 8 KB (or 16 KB) L1 data cache
- 5-8 stage pipeline
- Floating-point unit
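
As a rough illustration of why a multi-bank L2 behind a non-blocking crossbar helps, the sketch below maps addresses to banks by cache-line interleaving. The slides give only the 4 MB total size; the line size, the bank count and the names `l2_bank` and `bank_conflicts` are assumptions made for this example, not details of the actual design.

```python
LINE_SIZE = 64   # bytes per cache line (assumed, not stated on the slides)
NUM_BANKS = 4    # number of L2 banks (assumed, not stated on the slides)


def l2_bank(addr: int, num_banks: int = NUM_BANKS, line_size: int = LINE_SIZE) -> int:
    """Return the bank that holds `addr` under simple line interleaving."""
    return (addr // line_size) % num_banks


def bank_conflicts(addrs) -> int:
    """Count requests that target a bank already claimed by an earlier request."""
    claimed = set()
    conflicts = 0
    for addr in addrs:
        bank = l2_bank(addr)
        if bank in claimed:
            conflicts += 1      # this request would have to wait for the bank
        else:
            claimed.add(bank)   # this request can be served in parallel
    return conflicts


if __name__ == "__main__":
    # Four cores touching consecutive lines hit four different banks.
    print(bank_conflicts([0, 64, 128, 192]))
    # Four cores touching addresses one full bank stride apart all hit bank 0.
    print(bank_conflicts([0, 256, 512, 768]))
```

Under these assumptions, requests from different cores to neighbouring lines can be served at the same time, while requests that land in the same bank are serialized.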

Why a Multi-thread Core

To reduce the effect of the memory access bottleneck: when a thread must wait for memory, the core switches to another thread, which hides the memory latency. With a smaller effective penalty for memory misses, branch prediction can be made less aggressive, which means easier development and a smaller, simpler core.

We also planned to have a hardware thread scheduler that balances the overall workload by dispatching threads to appropriate cores. This part will be coupled with a commercial OS such as Linux.

Each MIPS processor will be connected efficiently to the reconfigurable crypto-coprocessor through a dedicated network.
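
The latency-hiding argument can be checked with a toy model of coarse-grained multithreading. In this sketch every thread alternates a fixed compute burst with a memory miss, and the core switches threads only when the running thread stalls. The function name `simulate` and all cycle counts (burst length, miss latency, switch penalty) are hypothetical values chosen for illustration, not figures from the slides.

```python
def simulate(num_threads, bursts, compute=20, miss_latency=60, switch_penalty=2):
    """Return (total_cycles, busy_cycles) for one core.

    Each thread executes `bursts` compute bursts of `compute` cycles; every
    burst ends in a memory miss that takes `miss_latency` cycles to resolve.
    Only one thread runs at a time, and the core switches on a stall.
    """
    remaining = [bursts] * num_threads   # bursts left per thread
    ready_at = [0] * num_threads         # cycle at which each thread may run again
    cycle = 0
    busy = 0
    current = None
    while any(r > 0 for r in remaining):
        runnable = [t for t in range(num_threads)
                    if remaining[t] > 0 and ready_at[t] <= cycle]
        if not runnable:
            # Every unfinished thread is waiting on memory: the core idles
            # until the earliest outstanding miss returns.
            cycle = min(ready_at[t] for t in range(num_threads) if remaining[t] > 0)
            continue
        # Stay on the current thread if it can run, otherwise switch.
        nxt = current if current in runnable else runnable[0]
        if current is not None and nxt != current:
            cycle += switch_penalty
        current = nxt
        cycle += compute
        busy += compute
        remaining[current] -= 1
        ready_at[current] = cycle + miss_latency
    return cycle, busy


if __name__ == "__main__":
    for n in (1, 2, 4):
        total, busy = simulate(n, bursts=3)
        print(f"{n} thread(s): utilization = {busy / total:.2f}")
```

With these assumed numbers a single thread leaves the pipeline idle during every miss, while four threads keep it busy for most cycles, at the price of a small switch penalty.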

Multicore architecture

Topics: control; computing; TLP and RPU; reconfigurable interconnections; complexity management; embedded reliability; energy management; advanced technologies; reconfigurable coprocessors; OS scheduling, …

[Diagram labels: switch and interconnection; six RISC cores, each with an L1 cache; two L2 caches; an RPU fg core and an RPU cg core, each with an L1 cache.]

1st issue: Parallelism management

L1 cache design
- High-performance cache
- Full-custom cache design

Memory management
- Memory hierarchy
- Multibank memory

System design
- Considering both software and hardware aspects at the same time
- Embedded thread controller
- QoS functionalities
- Shared resources

[Diagram labels: thread controller; MIPS core with I$/D$; multi-bank L2 cache; M2.]
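
One simple policy a thread controller could use to balance workload across cores is greedy least-loaded dispatch, sketched below as a software model. The slides do not specify the scheduling policy; the function `dispatch`, the per-thread cost estimates and the tie-breaking by core index are all assumptions made for this example.

```python
import heapq


def dispatch(thread_costs, num_cores=4):
    """Assign each thread to the core with the smallest accumulated load.

    `thread_costs` is a list of estimated costs, one per thread (hypothetical
    units). Returns (assignment, loads), where `assignment` maps a thread
    index to a core index and `loads` is the final load of each core.
    """
    heap = [(0, core) for core in range(num_cores)]   # (current load, core id)
    heapq.heapify(heap)
    assignment = {}
    loads = [0] * num_cores
    for tid, cost in enumerate(thread_costs):
        load, core = heapq.heappop(heap)              # least-loaded core
        assignment[tid] = core
        loads[core] = load + cost
        heapq.heappush(heap, (loads[core], core))
    return assignment, loads


if __name__ == "__main__":
    assignment, loads = dispatch([5, 3, 8, 2, 4, 6], num_cores=4)
    print(assignment)
    print(loads)
```

A hardware scheduler would work from different signals (for example stall counters or queue occupancy), so this only illustrates the balancing idea, not the planned implementation.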

2nd issue: Interconnections

High-performance interconnection networks
- Reconfigurable and dynamic configuration
- Asynchronous
- Advanced design

I/O management
- Hardware and embedded software design

[Diagram labels: switch and interconnection; I/O; thread controller; MIPS core with I$/D$; multi-bank L2 cache; M2.]

3rd issue: Reconfigurable cryptoprocessor

High-performance cryptoprocessor
- For high data rates
- Symmetric cryptography
- Multi-mode cryptography
- On-the-fly reconfiguration
- Highly interconnected to processor, memory and I/O

[Diagram labels: switch and interconnection; thread controller; I/O; MIPS core with I$/D$; R. Crypto Coprocessor; multi-bank L2 cache; M2.]

3 R&D areas
- Parallelism management
- Interconnections
- Reconfigurable cryptoprocessor

[Diagram labels: parallelism management; thread controller; switch and interconnection; OS; R. Crypto Coprocessor; MIPS core with I$/D$; multi-bank L2 cache; I/O.]