Computing Without Processors: Thesis Proposal. Mihai Budiu, July 30, 2001. This presentation uses TeXPoint by George Necula. Thesis Committee: Seth Goldstein (chair), Todd Mowry, Peter Lee, Babak Falsafi (ECE), Nevin Heintze (Agere Systems).

2 Four Types of Research: solve nonexistent problems, solve past problems, solve current problems, solve future problems.

3 The Law (source: Intel)

4 The Crossover Phenomenon (chart: technology vs. time).

5 Example Crossover (chart: access speed in ns vs. time; CPU vs. DRAM; the curves cross around 1980; no caches needed before the crossover, caches needed after).

6 Trouble Ahead for Microarchitecture

7 Signal Propagation (chart: mm vs. time; die size vs. distance covered in one clock; the curves cross around now, near 20 mm).

8 Reliability & Yield (chart: defects/chip vs. time; defects occurring in a new process vs. defects tolerable; crossing now).

9 Energy (chart: power vs. time; CPU consumption vs. thermal dissipation; crossing now, around 100W).

10 Instruction-Level Parallelism (ILP) (chart: instructions vs. time; fetched vs. committed; marked now).

11 Premises of this Research We will have lots of gates –Moore’s law continues –Nanotechnology Contemporary architectures do not scale

12 Outline Motivation ASH: Application-Specific Hardware The spatial model of computation CASH: Compiling for ASH Evolutionary path Conclusions Future work

13 ASH: Application-Specific Hardware (figure: an HLL program goes through the Compiler to a Circuit on reconfigurable hardware).

14 ASH: A Scalable Architecture -- Thesis Statement -- Application-specific hardware on a reconfigurable-hardware substrate is a solution for the smooth evolution of computer architecture. We can provide scalable compilers for translating high-level languages into hardware.

15 Example int f(void) { int i=0, j = 0; for (; i < 10; i++) j += i; return j; }
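A minimal sketch (not from the slide; the function name is illustrative) of how the compiler views this loop as a dataflow circuit: each operation becomes an operator, and the loop-carried values of i and j circulate on back edges.

int f_dataflow_view(void) {
    int i = 0, j = 0;        /* initial tokens injected into the loop body     */
    while (i < 10) {         /* comparator operator steers tokens: loop / exit */
        j = j + i;           /* adder producing the next j token               */
        i = i + 1;           /* incrementer producing the next i token         */
    }
    return j;                /* exit operator delivers the result token        */
}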

16 Outline Motivation ASH: Application-Specific Hardware The spatial model of computation CASH: Compiling for ASH Evolutionary path Conclusions Future work

17 ASH and Nanotechnology. Build reconfigurable hardware using nanotechnology: huge structures; low power (gates use less than 2 W); low cost (nanocents/gate); high density (10^5 x over CMOS). Figure: a nano-RAM cell; in yellow: a CMOS RAM cell.

18 A Limit Study of Performance. A graph of the whole program execution (legend: memory word, basic block, memory write, memory read, control-flow transfer).

19 Typical Program Graph (g721_e). Figure: control-flow transfers and memory reads connect a 100% memory cluster and a 100% code cluster; memcpy stands out as its own cluster.

20 Program Graph After Inlining (figure: the memcpy cluster now appears twice).

21 Application Slowdown

22 How Time Is Spent No caches: reads expensive No speculation

23 Lesson The spatial model of computation has different properties.

24 Outline Motivation ASH: Application-Specific Hardware The spatial model of computation CASH: Compiling for ASH Evolutionary path Future work

25 CASH: Compiling for ASH Memory partitioning Interconnection net Program to circuits

26 Compilation. 1. Program:
int reverse(int x) {
    int k, r = 0;
    for (k = 0; k < 32; k++) {
        r = (r << 1) | (x & 1);
        x = x >> 1;
    }
    return r;
}
2. Split-phase Abstract Machines (computations & local storage, split at unknown-latency ops.). 3. Configurations placed independently. 4. Placement on chip (reliability).

27 Split-phase Abstract Machines (figure: the CFG partitioned into SAM 1, SAM 2, SAM 3; callout: power).

28 Hyperblock => SAM Single-entry, multiple exit May contain loops

29 SAM => FSM (figure: states Start, Loop, Exit; accesses to local memory and remote memory).
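One plausible reading of the Start/Loop/Exit machine, sketched as C (the type and function names are assumptions, not from the slide): the SAM starts, iterates while its loop predicate holds, then exits and hands control to the next SAM.

typedef enum { SAM_START, SAM_LOOP, SAM_EXIT } sam_state;

sam_state sam_step(sam_state s, int loop_predicate) {
    switch (s) {
    case SAM_START: return SAM_LOOP;                   /* latch arguments, begin execution   */
    case SAM_LOOP:  return loop_predicate ? SAM_LOOP   /* keep iterating                     */
                                          : SAM_EXIT;  /* loop condition failed              */
    default:        return SAM_START;                  /* exit: pass control to the next SAM */
    }
}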

30 Implementing SAMs - interesting details -

31 The SAM FSM (figure: computation and predicates (control) implemented as combinational logic; start and exit signals; a register holds args/results).

32 Computation = Dataflow. Variables => wires + tokens. No token store; no token matching. Local communication only (signals). Figure, programs => circuits: x = a & 7; ... y = x >> 2; becomes an AND operator with inputs a and 7 driving the wire x, which feeds a shift-right-by-2 operator producing y.
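As a small illustration (mine, not the compiler's output; the names are made up), rewriting the fragment in single-assignment form shows the "variables => wires" idea: each name is driven by exactly one operator and only feeds its local consumers.

int dataflow_fragment(int a) {
    const int wire_x = a & 7;        /* AND operator: inputs are a and the constant 7 */
    const int wire_y = wire_x >> 2;  /* shift operator: its only input is the x wire  */
    return wire_y;
}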

33 Tokens & Synchronization. Tokens signal operation completion. Possible implementations: local (data + valid + ack), global (data + valid + reset), static (data + valid).
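A software model of the local "data + valid + ack" option, as one plausible reading of the slide (the struct and function names are assumptions): the producer publishes a value and raises valid, the consumer uses it and raises ack.

typedef struct {
    int data;
    int valid;   /* producer raises this when data carries a token      */
    int ack;     /* consumer raises this after it has consumed the data */
} token_channel;

void send_token(token_channel *c, int v) {
    c->data  = v;
    c->ack   = 0;
    c->valid = 1;     /* the token signals that the producing operation completed */
}

int receive_token(token_channel *c) {
    int v = c->data;  /* the caller is expected to check c->valid first */
    c->valid = 0;
    c->ack   = 1;     /* local handshake back to the producer */
    return v;
}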

34 Speculation. Example: if (x > 0) y = -x; else y = b*x; Figure: both the negation and the (slow) multiply b*x are computed speculatively; the computation and the predicates feed eager muxes that select y. This is Static Single Assignment implemented in hardware, and it exposes ILP.
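A minimal sketch of the speculative form (the function name is illustrative): both arms are computed unconditionally and an eager multiplexer, driven by the predicate, selects which definition of y survives.

int speculated_y(int x, int b) {
    int y_neg = -x;           /* cheap arm, computed speculatively        */
    int y_mul = b * x;        /* slow arm (the multiply), also computed   */
    int p     = (x > 0);      /* the predicate                            */
    return p ? y_neg : y_mul; /* mux selecting one of the two definitions */
}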

35 Predicates. Example: *q = 2; Predicates guard side-effects (memory accesses, procedure calls), control looping, decide the exit branch, and select variable definitions (x = ... / ... = x).
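A small sketch (names are illustrative) of the distinction: pure computation may run speculatively, but the side-effecting store stays guarded by its predicate.

void guarded_store(int p, int *q) {
    int v = 2;   /* pure value computation: safe to execute speculatively */
    if (p)       /* the predicate guards the memory side-effect           */
        *q = v;  /* the store *q = 2 happens only when p holds            */
}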

36 Computing Predicates. Correct for irreducible graphs; correct even when speculatively computed; can be eagerly computed.
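As an illustration of eager predicate computation (mine, for a simple if/else diamond; the variable names are assumptions), each block's predicate is ordinary boolean logic over its predecessors' predicates and the branch conditions, so it can be evaluated as soon as its inputs arrive.

void block_predicates(int c) {
    int pred_entry = 1;                        /* the entry block always executes    */
    int pred_then  = pred_entry &&  c;         /* taken edge of the branch           */
    int pred_else  = pred_entry && !c;         /* fall-through edge                  */
    int pred_join  = pred_then || pred_else;   /* join block: OR over incoming edges */
    (void)pred_join;                           /* predicates then guard side-effects */
}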

37 Loops + Dataflow. Example: for (i=0; i < 10; i++) a[i] += i; Figure: address computation (&a[0] + i), load, add, store, and the i+1 increment form a dataflow loop over a[0], a[1], a[2], a[3], ...; successive iterations overlap (pipelining).
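A sketch of why the loop pipelines (mine; the function name is illustrative): written as separate operators, the address and induction-variable updates of iteration i+1 do not depend on the store of iteration i, so successive iterations can be in flight at once.

void loop_as_dataflow(int a[10]) {
    int i = 0;                   /* induction value circulating on the back edge */
    while (i < 10) {
        int *addr = &a[0] + i;   /* address operator                             */
        int v = *addr;           /* load                                         */
        *addr = v + i;           /* add, then store                              */
        i = i + 1;               /* the next iteration's token can start early   */
    }
}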

38 Outline Motivation ASH: Application-Specific Hardware The spatial model of computation CASH: Compiling for ASH Evolutionary path Conclusions Future work

39 Evolutionary Path (figure: from microprocessors to ASH). The problem with ASH: resources.

40 Virtualization

41 CPU+ASH (figure: ASH runs the core computation; the CPU runs the support computation + OS + VM; both connect to memory).

42 Outline Motivation ASH: Application-Specific Hardware The spatial model of computation CASH: Compiling for ASH Evolutionary path Conclusions Future work

43 ASH Benefits. Problem => Solution: Reliability => configuration around defects; Power => only "useful" gates switching; Signals => localized computation; ILP => statically extracted.

44 Scalable Performance (chart: performance vs. time, CPU vs. ASH, marked now).

45 Summary Contemporary CPU architecture faces lots of problems Application-Specific Hardware (ASH) provides a scalable technology Compiling HLL into hardware dataflow machines is an effective solution

46 Timeline, 06/01 to 12/02 (milestones at 09/01, 12/01, 04/02, 06/02, 09/02; marked now): CASH core, memory partitioning, hw/sw partitioning (ASH + CPU), cost models, ASH simulation, loop parallelization, exploring architectural/compiler trade-offs, writing the thesis.

47 Extras Related work Reconfigurable hardware Other cross-over phenomena A CPU + ASH study More about predicates

48 Related Work Hardware synthesis from HLL Reconfigurable hardware Predicated execution Dataflow machines Speculative execution Predicated SSA back

49 Reconfigurable Hardware. Universal gates and/or storage elements; interconnection network; programmable switches. back back to presentation

50 Main RH Ingredient: the RAM Cell. A switch is controlled by a 1-bit RAM cell. A universal gate = RAM: the inputs (a0, a1) address the stored bits, and the selected data bit drives the output, so the cell can be configured to compute, for example, the AND of its inputs. Figure labels: data in, control. back
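A software model of the "universal gate = RAM" idea (type and names are assumptions, not from the slide): the two inputs address a 4-bit configuration memory, and the stored bit at that address drives the output, so the same cell can be programmed as AND, OR, XOR, and so on.

typedef struct { unsigned char config[4]; } lut2;   /* one stored bit per input combination */

int lut2_eval(const lut2 *g, int a0, int a1) {
    return g->config[((a1 & 1) << 1) | (a0 & 1)];   /* the inputs act as the RAM address */
}

/* the same cell configured two different ways */
static const lut2 gate_and = { {0, 0, 0, 1} };
static const lut2 gate_or  = { {0, 1, 1, 1} };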

51 Reconfigurable Computing Back to ENIAC-style computation Synthesize one machine to solve one problem back back to “extras”

52 Efficiency (chart: hardware resources vs. time, used vs. idle, marked now).

53 Manufacturing Cost (chart: $ vs. time, cost vs. affordable cost, around 3x10^9 $ now).

54 Complexity (chart: transistors vs. time, available vs. manageable, marked now).

55 CAD Tools (chart: manual interventions vs. time, necessary vs. feasible, marked now). back

56 ASH Benefits. Problem => Solution: Reliability => configuration around defects; Power => only "useful" gates switching; Signals => localized computation; ILP => statically extracted; Complexity => hierarchy of abstractions; CAD => compiler + local place & route; Efficiency => circuit customized to application; Cost => no masks, no physics, same substrate; Performance => scalable. back

57 CPU+ASH Study Reconfigurable functional unit on processor pipeline Adapted SimpleScalar 3.0 ASH & CPU use the same memory hierarchy (incl. L1) ASH can access CPU registers CPU pipeline interlocked with ASH Results pending back

58 Simplifying Predicates. Shared implementations; control equivalence (figure: blocks a, b, c).

59 Deep Speculation. Example: if (p) if (q) x = a; else x = b; else x = c; Figure: a, b, and c all feed a multiplexer for x, selected by the path predicates p&q, p&!q, and !p respectively.
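A minimal sketch of the flattened, fully speculative form (function and variable names are mine): all three definitions of x are computed eagerly and a multiplexer selects among them using the path predicates.

int deep_speculation(int p, int q, int a, int b, int c) {
    int x_a = a, x_b = b, x_c = c;   /* all arms evaluated speculatively         */
    int sel_a = p &&  q;             /* path predicate for x = a                 */
    int sel_b = p && !q;             /* path predicate for x = b; !p selects c   */
    return sel_a ? x_a : (sel_b ? x_b : x_c);
}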

60 Predicates & Tokens. Figure: the store *q = 2 has both a ready (token) input and a safe (predicate) input, e.g. ~x as predicate; predicated tokens (ready & safe combined) eliminate speculation, and merging the predicate with its token (P & P_ready instead of separate P and P_ready wires) eliminates wires. back