Research: Past, Present and Future


Research: Past, Present and Future Philip Sweany 10/27/06

Past
- Retargetable Compilers
- Register Assignment
- Instruction Scheduling

Retargetable Compilation
- Rocket C compiler, written in C++
- Retargetable for ILP computers
- Single machine description file
- Development 1989-2000
- Gnu
- Scale

Instruction Scheduling
- Local (Basic Block)
- Global: Dominator-Path Scheduling
- Software Pipelining
- Heuristic search
- Genetic algorithms
- Simulated Annealing
- Integer-Linear programming
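The local (basic block) scheduling item can be illustrated with a minimal list-scheduling sketch. This is a generic textbook-style heuristic, not the scheduler used in Rocket; the function name `list_schedule`, the issue width, and the operation names and latencies are all made up for illustration.

```python
from collections import defaultdict

def list_schedule(latency, deps, width=2):
    """Greedy cycle-by-cycle list scheduling of one basic block.

    latency: {op: cycles until its result is available}
    deps:    {op: set of ops it must wait for} (assumed acyclic)
    width:   ops that may issue per cycle (hypothetical machine)
    Returns {op: issue cycle}.
    """
    # Build successor sets so priorities can be computed bottom-up.
    succs = defaultdict(set)
    for op, preds in deps.items():
        for p in preds:
            succs[p].add(op)

    # Priority = longest latency path from the op to the end of the block.
    priority = {}
    def critical_path(op):
        if op not in priority:
            priority[op] = latency[op] + max(
                (critical_path(s) for s in succs[op]), default=0)
        return priority[op]
    for op in latency:
        critical_path(op)

    issue = {}
    cycle = 0
    while len(issue) < len(latency):
        # An op is ready once all predecessors have issued and finished.
        ready = [op for op in latency
                 if op not in issue
                 and all(p in issue and issue[p] + latency[p] <= cycle
                         for p in deps.get(op, ()))]
        # Highest priority first; the name breaks ties deterministically.
        ready.sort(key=lambda op: (-priority[op], op))
        for op in ready[:width]:
            issue[op] = cycle
        cycle += 1
    return issue

# Hypothetical basic block: two loads feed a multiply, then add, then store.
latency = {"load_a": 2, "load_b": 2, "mul": 3, "add": 1, "store": 1}
deps = {"mul": {"load_a", "load_b"}, "add": {"mul"}, "store": {"add"}}
schedule = list_schedule(latency, deps, width=2)
print(schedule)
```

The critical-path priority is one common choice; the search-based approaches listed on the slide (genetic algorithms, simulated annealing, integer-linear programming) explore orderings that a greedy pass like this can miss.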

Register Assignment
- Early vs. Late
- CRAIG
- Clustering
- Our algorithm degrades 10% over ideal
- Next best degrades 19% over ideal

Partitioned Register Banks
[Figure: functional units F1 to F8; register banks labeled Register A and Register B]

Current: Compilation
- Hybrid Architectures
- Multithreading
- Memory Optimization
  - Scratch-pad memory
  - Tradeoff cache, scratch-pad
- Architectural Support
  - Function Reuse
  - Split Cache

Hybrid Computing
- Tradeoffs of performance, power, flexibility
- Heterogeneous processors on single chip
- “CPU”, FPGA, ASIC
- N “CPU”s, M FPGAs, K ASICs

Generic Hybrid Computer
[Figure: CPU 1, CPU 2, ..., CPU m (Multi-CPU) and FPGA 1, FPGA 2, ..., FPGA n (Multi-FPGA) with Shared Memory]

Hy-C
[Figure labels: System Specification, Source Code, Partitioning, CPU Compiler, FPGA, Power-Performance Model]
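As a rough illustration of how a partitioning step might consult power-performance models, the sketch below assigns each task to the target with the lowest weighted cost. It is not Hy-C's algorithm: the function `partition`, the cost formula, the weight `alpha`, and every task name and estimate are hypothetical.

```python
def partition(estimates, alpha=0.5):
    """Pick a target for each task from per-target (seconds, watts) estimates.

    estimates: {task: {target: (seconds, watts)}}  # made-up model output
    alpha:     weight on run time; (1 - alpha) weights energy (seconds * watts).
    Returns {task: chosen target}.
    """
    assignment = {}
    for task, per_target in estimates.items():
        def cost(target):
            seconds, watts = per_target[target]
            return alpha * seconds + (1 - alpha) * seconds * watts
        # sorted() makes ties resolve by target name, deterministically.
        assignment[task] = min(sorted(per_target), key=cost)
    return assignment

# Hypothetical estimates: a filter kernel suits the FPGA, control code the CPU.
estimates = {
    "fir_filter": {"cpu": (4.0, 2.0), "fpga": (1.0, 3.0)},
    "ui_logic":   {"cpu": (1.0, 2.0), "fpga": (3.0, 3.0)},
}
assignment = partition(estimates)
print(assignment)
```

Treating tasks independently ignores communication through shared memory and limited FPGA area, which a real partitioner would have to account for.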

Multithreading
- Identify threads from SSA
- SDF – Scheduled Dataflow
  - Multithreaded
  - Decouple memory access, execution
- Clusters for scalability
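The idea of decoupling memory access from execution can be sketched as a thread split into a load phase, a register-only compute phase, and a store phase. The three-phase split, the function `run_decoupled`, and the dictionary-based thread layout are assumptions made for illustration; the slide does not spell out SDF's actual mechanism.

```python
def run_decoupled(memory, thread):
    """Run one thread with memory access kept separate from computation."""
    # Phase 1 (memory access): copy every input into a private register set.
    regs = {name: memory[addr] for name, addr in thread["loads"].items()}
    # Phase 2 (execution): compute from registers only, never touching memory.
    results = thread["compute"](regs)
    # Phase 3 (memory access): write the named results back to memory.
    for name, addr in thread["stores"].items():
        memory[addr] = results[name]
    return memory

# Hypothetical thread: z = x*x + y*y, with x, y, z at addresses 0, 1, 2.
memory = {0: 3, 1: 4, 2: 0}
thread = {
    "loads": {"x": 0, "y": 1},
    "compute": lambda r: {"z": r["x"] * r["x"] + r["y"] * r["y"]},
    "stores": {"z": 2},
}
run_decoupled(memory, thread)
print(memory[2])
```

Because the compute phase never stalls on memory, the memory phases of one thread could in principle overlap with the compute phase of another, which is the motivation for separating them.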

SDF Clusters
[Figure labels: SP, SP, …, EP, EP, EP, EP; clusters C0 to Cn-1]

Future
- Automatic Code Generation (I don’t believe in software)
- Visual Programming of Components