Trends toward Spatial Computing Architectures Dr. André DeHon BRASS Project University of California at Berkeley.

Presentation transcript:

Trends toward Spatial Computing Architectures Dr. André DeHon BRASS Project University of California at Berkeley

How do we build programmable VLSI computing devices in the era of Gλ² to Tλ² silicon die capacity (billions of transistors)?
– Capacity available: 1000× to 100,000×
– Opens up architectural space
– Spatial architectures become viable and beneficial

Back to Basics
What is a computation?
$Y = Ax^2 + bx + c$

Basics
How do we implement a computation?
– Perform operations
– Communicate among operations

Implement Computation
Perform operations
– universal computational modules: NAND, ALU, lookup table
– specialized operators: multiply, add, FP-divide
Communicate among operations
– spatially: network
– temporally: memory

Implementation
Choice in implementation:
– How many compute elements?
– How much sequentialization?

Serial Implementation
Single operator
Reuse in time
Store instructions
Store intermediates
Communication across time
One cycle per operation

Spatial Implementation
One operator for every operation
Instruction per operator
Communication in space
Computation in single cycle
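The contrast between the serial and spatial slides can be sketched in Python using the earlier example computation Y = Ax² + bx + c. This is only an illustration: the graph encoding, the names `GRAPH`, `run_serial`, and `run_spatial`, and the cycle accounting are assumptions made for the sketch, not something taken from the slides.

```python
# Y = A*x*x + b*x + c written as a small dataflow graph.
# Each entry maps a result name to (operator, input names).
GRAPH = {
    "t1": ("mul", ("x", "x")),
    "t2": ("mul", ("A", "t1")),
    "t3": ("mul", ("b", "x")),
    "t4": ("add", ("t2", "t3")),
    "Y": ("add", ("t4", "c")),
}

OPS = {"mul": lambda a, b: a * b, "add": lambda a, b: a + b}


def run_serial(inputs):
    """Serial model: one shared operator, reused in time.

    Intermediates are stored in memory and one operation issues per cycle.
    Returns (value, cycles, operators used).
    """
    memory = dict(inputs)
    cycles = 0
    # Insertion order of GRAPH is already a valid schedule.
    for name, (op, args) in GRAPH.items():
        memory[name] = OPS[op](*(memory[a] for a in args))
        cycles += 1
    return memory["Y"], cycles, 1


def run_spatial(inputs):
    """Spatial model: one operator per operation, values travel on wires.

    The whole graph is counted as a single (long) cycle.
    Returns (value, cycles, operators used).
    """
    wires = dict(inputs)
    for name, (op, args) in GRAPH.items():
        wires[name] = OPS[op](*(wires[a] for a in args))
    return wires["Y"], 1, len(GRAPH)


if __name__ == "__main__":
    values = {"A": 2, "x": 3, "b": 4, "c": 5}
    print(run_serial(values))   # same value, 5 cycles, 1 operator
    print(run_spatial(values))  # same value, 1 cycle, 5 operators
```

Both functions compute the same value; they differ only in how many operators exist and how many cycles are charged, which is the trade-off the two slides describe.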

Some Numbers
Binary operator with interconnect: 500Kλ² to 1Mλ²
– (e.g. ALU bit, LUT (gate), …)
Instruction (including interconnect): 80Kλ²
Memory bit (SRAM): 1 to 2Kλ²
Fully sequential: N × 80Kλ² + S × 1Kλ² + 1Mλ²
Fully spatial: N × 1Mλ²
⇒ Temporal is N× slower, 12× smaller
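The arithmetic behind the "12× smaller" figure can be checked with a short Python sketch. It uses the slide's upper-end operator area (1Mλ²) and lower-end memory-bit area (1Kλ²); the function and constant names are illustrative only.

```python
# Areas in units of lambda^2, taken from the "Some Numbers" slide.
OPERATOR_AREA = 1_000_000   # binary operator with interconnect (upper figure)
INSTR_AREA = 80_000         # one stored instruction, including interconnect
MEM_BIT_AREA = 1_000        # one SRAM bit (lower figure)


def sequential_area(n_ops, state_bits):
    """Fully sequential: N instructions, S state bits, one shared operator."""
    return n_ops * INSTR_AREA + state_bits * MEM_BIT_AREA + OPERATOR_AREA


def spatial_area(n_ops):
    """Fully spatial: one operator per operation."""
    return n_ops * OPERATOR_AREA


if __name__ == "__main__":
    n = 1000
    ratio = spatial_area(n) / sequential_area(n, state_bits=0)
    # The ratio approaches 1M / 80K = 12.5 as N grows, the slide's "12x".
    print(f"spatial / sequential area for N={n}: {ratio:.2f}")
```

With no state bits the ratio tends toward 12.5 for large N; adding state bits lowers it, so 12× is best read as the limiting case.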

Programmable Device: 50Mλ²
Sample die: 7 mm × 7 mm, 2.0 µm
Spatial: bit operators
– 2 32b adders? small bit-serial datapath?
Sequential: 600+ instructions (data)
– kernel on chip?

Programmable Device: 100Gλ²
16 mm × 16 mm, 0.1 µm
Spatial: 100,000 bit operators
– even bit-parallel, can support kernels with 1000s of operators
Sequential: 1.2M instructions (data)
– entire applications (and data?) fit on chip
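The die capacities on these two slides follow from die size and feature size. The Python sketch below reproduces them under one stated assumption: λ is taken to be half the minimum feature size, the usual scalable-design-rule convention, which the slides do not spell out. Function names are illustrative.

```python
OPERATOR_AREA = 1_000_000  # lambda^2 per bit operator (upper figure from the slides)
INSTR_AREA = 80_000        # lambda^2 per stored instruction


def die_capacity_lambda2(side_mm, feature_um):
    """Area of a square die in lambda^2.

    Assumption: lambda = feature size / 2.
    """
    lam_um = feature_um / 2
    side_in_lambda = side_mm * 1000 / lam_um
    return side_in_lambda ** 2


def fits(capacity):
    """Return (bit operators if fully spatial, instructions if fully sequential)."""
    return int(capacity // OPERATOR_AREA), int(capacity // INSTR_AREA)


if __name__ == "__main__":
    for side_mm, feature_um in [(7, 2.0), (16, 0.1)]:
        cap = die_capacity_lambda2(side_mm, feature_um)
        ops, instrs = fits(cap)
        print(f"{side_mm} mm die at {feature_um} um: {cap:.3g} lambda^2, "
              f"{ops} operators or {instrs} instructions")
```

Under these assumptions the 7 mm die comes to about 49Mλ² (roughly 600 instructions) and the 16 mm die to about 100Gλ² (roughly 100,000 operators or 1.2M instructions), in line with the slide figures.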

Density Advantage
Why implement spatially?
For these extremes, spatial has:
– operators/cycle at 50Mλ²
– 100,000× operators/cycle at 100Gλ²
Conventional word architectures
– 32b → 2-3× at 50Mλ²
– 4×64b → 400× at 100Gλ²

Empirical Raw Density Comparison

Spatial Advantages
10× raw density advantage over processors
Potential for fine-grained (bit-level) control → can offer another order of magnitude benefit versus SIMD/word architectures
Demonstrated on select applications
With 1000s of operators per chip today:
– substantial problems fit spatially on die

Spatial Drawbacks
Lower instruction density
– 12× at bit-controlled extremes
– 12 × 32 ≈ 400× where SIMD-word ops apply
Unused (infrequently used) operators waste space when not in use

Example: FIR Filtering

Architecture Space
Broad space between sequential and spatial extremes
– 1 to 100,000 operators
– Microprocessors: 4 × 64 = 256
Navigate space to design most efficient architectures

Computing Device
Composition
– Bit processing elements
– Interconnect: space, time
– Instruction memory

Compute Model
Use model to estimate area implied by architectural parameters:
$A_{bitop} = A_{op} + A_{instr}(c,w) + A_{interconnect}(p,w,N) + A_{data}(d)$
Use areas to compare density and efficiency:
$\mathrm{Efficiency} = \dfrac{\mathrm{Area}(\text{best matched architecture})}{\mathrm{Area}(\text{evaluation architecture})}$
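A minimal Python sketch of how the model's two formulas combine. The slide does not give the functional forms of the instruction, interconnect, and data terms, so the sketch takes each component area as an already-evaluated number; the class name, function name, and all numeric values below are hypothetical and chosen only to exercise the code.

```python
from dataclasses import dataclass


@dataclass
class BitOpArea:
    """Component areas (lambda^2) for one bit operator.

    Each field stands for an already-evaluated term of the slide's model:
    A_op, A_instr(c, w), A_interconnect(p, w, N), A_data(d).
    """
    op: float
    instr: float
    interconnect: float
    data: float

    def total(self):
        # A_bitop is the sum of the four component areas.
        return self.op + self.instr + self.interconnect + self.data


def efficiency(best_matched_area, evaluation_area):
    """Efficiency = Area(best matched architecture) / Area(evaluation architecture)."""
    return best_matched_area / evaluation_area


if __name__ == "__main__":
    # Hypothetical numbers, not taken from the slides.
    matched = BitOpArea(op=100_000, instr=80_000, interconnect=600_000, data=1_000)
    evaluated = BitOpArea(op=100_000, instr=800_000, interconnect=600_000, data=62_000)
    print(f"A_bitop (matched)   = {matched.total():,.0f}")
    print(f"A_bitop (evaluated) = {evaluated.total():,.0f}")
    print(f"efficiency          = {efficiency(matched.total(), evaluated.total()):.2f}")
```

An efficiency of 1.0 means the evaluated architecture is as compact as the best-matched one for the task; smaller values mean area spent on resources the task does not use.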

Peak Densities from Model

Processors and FPGAs
FPGA: c=d=1, w=1, k=4
“Processor”: c=d=1024, w=64, k=2

Hybrids: Processor+Array
Example: UCB GARP
– MIPS-II core
– array ↔ memory access
– on-chip config. cache
– LUTs
Also: PRISC, NAPA, OneChip, Chimaera, ...

Hybrids: Intermediates
E.g. multicontext FPGA: MIT DPGA
– on-chip space for a few instructions
– single-cycle instruction switch
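The idea of a multicontext logic element can be sketched in Python as a lookup table that holds a few configurations on chip and selects one per cycle. This is a toy model under assumptions: the class `MulticontextLUT`, its methods, and the 2-input, 2-context example are illustrative and do not describe the actual DPGA design.

```python
class MulticontextLUT:
    """A k-input lookup table storing several truth tables (contexts).

    Switching context only changes which stored table is read, modelling
    a single-cycle instruction switch.
    """

    def __init__(self, k, contexts):
        for table in contexts:
            if len(table) != 2 ** k:
                raise ValueError("each truth table needs 2**k entries")
        self.k = k
        self.contexts = [list(t) for t in contexts]
        self.active = 0

    def switch(self, context_id):
        """Select which stored configuration the next evaluation uses."""
        if not 0 <= context_id < len(self.contexts):
            raise IndexError("no such context")
        self.active = context_id

    def evaluate(self, *bits):
        """Look up the output for the given input bits in the active context."""
        if len(bits) != self.k:
            raise ValueError("wrong number of inputs")
        index = 0
        for bit in bits:  # first argument is the most significant bit
            index = (index << 1) | (bit & 1)
        return self.contexts[self.active][index]


if __name__ == "__main__":
    AND = [0, 0, 0, 1]
    XOR = [0, 1, 1, 0]
    lut = MulticontextLUT(k=2, contexts=[AND, XOR])
    print(lut.evaluate(1, 1))  # AND context
    lut.switch(1)
    print(lut.evaluate(1, 1))  # XOR context, same physical element
```

One physical element serves as two different operators on successive cycles, which places this design between the fully spatial and fully sequential extremes.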

Conclusions
Growth in silicon capacity makes spatial implementations viable
Spatial implementations offer density (performance) advantage
As silicon capacity grows
– more problems “fit” spatially
Richer architectural space available today
– worth rethinking how we build programmable computing systems