CAPS project-team Compilation et Architectures pour Processeurs Superscalaires et Spécialisés.

History of CAPS project-team
- Project-team created in 1994: “Compiler Parallel Architectures and Systems”
- Common focus: high performance through optimizing the memory hierarchy
- Comes from a supercomputer architecture group:
  - Involved in the design of the Marie mini-supercomputer
  - Late participation in ACRI

CAPS: Compiler and Architecture for Superscalar and Special purpose processors
- Two interacting activities:
  - Microprocessor architecture (A. Seznec, P. Michaud)
    - High performance
    - Migrating high performance concepts to embedded systems
  - Performance oriented compilation (F. Bodin)
    - Embedded processors
- Recently: worst case execution time analysis (I. Puaut)

CAPS « missions »
- Defining the tradeoffs between:
  - what should be done through hardware
  - what can be done by the compiler
- for maximum performance, or for minimum cost, or for minimum size, power, ..

Issues on high performance processor architecture
- Memory hierarchy management: 1 cycle L1, 10 cycles L2, 30 cycles L3, 200 cycles memory
- Branch prediction: 30 cycles penalty x N instructions per cycle
- Single cycle next instruction block address generation?
- Complexity quadratic with issue width: register file, bypass network, issue logic
- Single chip hardware thread parallelism is available: how do we exploit it?
- Power/temperature
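
As a rough illustration of why these figures matter, the sketch below computes an average memory access time and the issue slots lost to one branch misprediction. The latencies and the 30-cycle penalty are the ones quoted above; the distribution of accesses across levels and the 4-wide issue width are hypothetical values chosen only for the example.

```python
# Access latencies in cycles, as quoted on the slide.
LATENCY = {"L1": 1, "L2": 10, "L3": 30, "memory": 200}

# Hypothetical fraction of accesses served by each level (must sum to 1).
EXAMPLE_SERVED = {"L1": 0.90, "L2": 0.06, "L3": 0.03, "memory": 0.01}


def average_access_cycles(served):
    """Average latency, weighting each level by the fraction of accesses it serves."""
    if abs(sum(served.values()) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    return sum(served[level] * LATENCY[level] for level in LATENCY)


def misprediction_slots(issue_width, penalty_cycles=30):
    """Issue slots lost to one misprediction: penalty x N instructions per cycle."""
    return issue_width * penalty_cycles


if __name__ == "__main__":
    print("average access cycles:", average_access_cycles(EXAMPLE_SERVED))
    print("slots lost on a 4-wide machine:", misprediction_slots(4))
```

With this assumed distribution, the 1% of accesses that reach memory contribute about as much to the average as all other accesses combined, which is one way to read the emphasis on memory hierarchy management.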

Issues on code generation/software environments for embedded processors
- ILP and caches are entering the embedded processor world
  - Code generation must manage them
- Binary compatibility is not critical, time-to-market is critical
  - Retargetable platforms are wanted: ISAs, architecture
- Performance is not the only ultimate goal:
  - Code size/performance
  - Power/performance
  - System cost/performance

Recent scientific contributions (1): processor architecture
- Global history branch predictors and instruction fetch front-end
  - 2bcgskew used in Compaq EV8
  - Pipelining the I-fetch front end
- Limiting hardware complexity on superscalar processors
  - Dataflow prescheduling: instruction window
  - WSRS architecture: register file, bypass network and issue logic
- Thread parallelism and single chip parallelism
  - CASH: CMP and SMT hybrid
  - Execution migration: single thread on a multicore, to use all the cache space
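
To make "global history branch predictor" concrete, here is a minimal gshare-style sketch. It is not the 2bcgskew predictor named above, which is considerably more elaborate; it only shows the basic idea of indexing a table of 2-bit counters with the branch address combined with recent branch outcomes. The class name, table size and method names are assumptions made for this example.

```python
class GsharePredictor:
    """Toy global-history predictor: PC XOR history indexes 2-bit counters."""

    def __init__(self, index_bits=10):
        self.mask = (1 << index_bits) - 1
        self.history = 0                          # most recent outcomes, 1 = taken
        self.counters = [1] * (1 << index_bits)   # start at weakly not-taken

    def _index(self, pc):
        return (pc ^ self.history) & self.mask

    def predict(self, pc):
        """Return True if the branch at pc is predicted taken."""
        return self.counters[self._index(pc)] >= 2

    def update(self, pc, taken):
        """Train the selected counter, then shift the outcome into the history."""
        i = self._index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)
        self.history = ((self.history << 1) | int(taken)) & self.mask


if __name__ == "__main__":
    predictor = GsharePredictor()
    correct = 0
    for k in range(200):
        outcome = (k % 2 == 0)                    # alternating taken / not taken
        correct += predictor.predict(0x40) == outcome
        predictor.update(0x40, outcome)
    print("correct predictions:", correct, "of 200")
```

Because the history distinguishes "last outcome was taken" from "last outcome was not taken", the alternating branch in the usage example becomes predictable after warm-up, which a per-address 2-bit counter alone would not achieve.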

Recent scientific contributions (2): architecture/compiler interaction
- ISA simulation:
  - ABSCISS: ISA and architecture retargetable high speed simulator for VLIW processors
  - IATO: simulation of out-of-order execution IA64
- Low power and architecture configurability:
  - Cache reconfiguration at software level on a phase basis
  - Hardware/software speculative management of data path and register file width
- SWARP: retargetable C-to-C preprocessor to enhance multimedia instruction use

Recent scientific contributions (3): compiler and software environments
- Artificial intelligence in performance tuning
  - CAHT: case based reasoning for assisting performance tuning
  - Automatic derivation of compiler heuristics: using machine learning to derive compiler heuristics
- Performance/code size tradeoffs:
  - Iterative compilation
  - Mixing interpretation on compressed code and native execution

“New-CAPS” objectives (1)
- High-end microprocessor architecture:
  - From “ultimate performance” to “maintaining performance to cheaper”
- Migrating “high-end” concepts to embedded processors:
  - (limited) O-O-O execution
- Compiler/architecture power management

“New-CAPS” research objectives (2)
- Embedded systems are more and more complex: performance often comes with unpredictability and instability
  - Dimensioning a system?
  - Real time constraints?
- Research on performance predictability and stability:
  - Predictable/stable performance oriented code generation
  - Predictable/stable performance oriented architecture

“New-CAPS” research objectives (3)
- On-chip thread parallelism is a new opportunity:
  - Homogeneous: SMT/CMP
    - Tradeoffs, sharing, synchronization
  - Heterogeneous:
    - Single ISA: power, performance
    - Multiple ISAs (e.g. SoC)
  - Thread extraction

What can we bring to SCIPARC at the architecture level?

CAPS pipeline background
- « Ancient » background in hardware management of ILP: both research and implementation
  - Decoupled pipeline architectures
  - Involved in the design of the Marie mini-supercomputer, 86-88
  - OPAC, a hardware matrix floating-point coprocessor, 1991: 300 ICs, a VLSI sequencer, ..

CAPS background in microarchitecture
- Solid knowledge in microprocessor architecture:
  - Technological watch on microprocessors
  - Research on processor architecture
  - A. Seznec worked at the Alpha Development Group in 1999-2000: defined the EV8 branch predictor
  - P. Michaud worked at Intel (2001-2002)

Background in memory hierarchy
- Interleaved memories for vector supercomputers (research)
  - A. Seznec participated in the Tarantula project: vector extension to Compaq EV8
- International CAPS visibility in cache architecture:
  - Skewed associative cache
  - Decoupled sectored cache
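
The idea behind a skewed associative cache can be sketched as follows: each way is indexed by a different function of the block address, so blocks that collide in one way usually land in different lines of the other. The two index functions below (low bits, and low bits XOR the next bits) and the fill/eviction policy are simplifications assumed for this example, not the functions used in the published design.

```python
class SkewedCache:
    """Toy 2-way skewed-associative cache holding block addresses only."""

    def __init__(self, index_bits=4):
        self.index_bits = index_bits
        self.mask = (1 << index_bits) - 1
        # One direct-mapped array per way; None marks an empty line.
        self.ways = [[None] * (1 << index_bits) for _ in range(2)]

    def _indices(self, block):
        """Return one line index per way, each from a different hash."""
        low = block & self.mask
        high = (block >> self.index_bits) & self.mask
        return (low, low ^ high)

    def access(self, block):
        """Return True on a hit; on a miss, install the block and return False."""
        indices = self._indices(block)
        for way, i in enumerate(indices):
            if self.ways[way][i] == block:
                return True
        for way, i in enumerate(indices):
            if self.ways[way][i] is None:
                self.ways[way][i] = block
                return False
        # Both candidate lines are occupied: evict from way 0 (arbitrary choice).
        self.ways[0][indices[0]] = block
        return False


if __name__ == "__main__":
    cache = SkewedCache()
    blocks = [0x00, 0x10, 0x20]     # all share the same low index bits
    print([cache.access(b) for b in blocks])   # first pass: cold misses
    print([cache.access(b) for b in blocks])   # second pass
```

In a conventional 2-way set-associative cache with 16 sets, these three blocks would compete for the same set; here they share a line only in way 0 and occupy three different lines in way 1, so all three can be resident at once.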

Our expertise may help to define the next machine in SCIPARC
- Bring pipeline definition expertise
- Bring memory hierarchy definition expertise
- Help to remain simple
- Help to enlarge possible application domains