Computing Without Processors Thesis Proposal Mihai Budiu July 30, 2001 This presentation uses TeXPoint by George Necula Thesis Committee: Seth Goldstein,


1 Computing Without Processors. Thesis Proposal, Mihai Budiu, July 30, 2001. This presentation uses TeXPoint by George Necula. Thesis Committee: Seth Goldstein (chair), Todd Mowry, Peter Lee, Babak Falsafi (ECE), Nevin Heintze (Agere Systems).

2 Four Types of Research: solve nonexistent problems; solve past problems; solve current problems; solve future problems.

3 The Law (source: Intel)

4 The Crossover Phenomenon. Figure axes: time, technology.

5 Example Crossover. Figure: access speed (ns) over time for DRAM and CPU; labels: 1980, caches, no caches, 200.

6 Trouble Ahead for Microarchitecture

7 Signal Propagation. Figure: die size and distance covered in 1 clock (mm) over time; labels: now, 20.

8 Reliability & Yield. Figure: defects/chip over time, tolerable vs. occurring; labels: new process, now.

9 Energy. Figure: power over time, CPU consumption vs. thermal dissipation; labels: now, 100W.

10 Instruction-Level Parallelism (ILP). Figure: instructions over time, fetch vs. commit; label: now.

11 Premises of this Research. We will have lots of gates (Moore's law continues; nanotechnology). Contemporary architectures do not scale.

12 Outline: Motivation; ASH: Application-Specific Hardware; The spatial model of computation; CASH: Compiling for ASH; Evolutionary path; Conclusions; Future work.

13 ASH: Application-Specific Hardware. Diagram labels: reconfigurable hardware, HLL program, compiler, circuit.

14 ASH: A Scalable Architecture. Thesis statement: Application-specific hardware on a reconfigurable-hardware substrate is a solution for the smooth evolution of computer architecture. We can provide scalable compilers for translating high-level languages into hardware.

15 Example: int f(void) { int i = 0, j = 0; for (; i < 10; i++) j += i; return j; }

16 Outline: Motivation; ASH: Application-Specific Hardware; The spatial model of computation; CASH: Compiling for ASH; Evolutionary path; Conclusions; Future work.

17 ASH and Nanotechnology. Build reconfigurable hardware using nanotechnology. Huge structures. Low power: 10^10 gates use less than 2 W. Low cost: nanocents/gate. High density: 10^5 x over CMOS. Figure: a nano-RAM cell; in yellow, a CMOS RAM cell.

18 A Limit Study of Performance. A graph of the whole program execution. Legend: memory word, basic block, memory write, memory read, control-flow transfer.

19 Typical Program Graph (g721_e). Figure labels: control-flow transfer, 100% memory cluster, memory reads, 100% code cluster, memcpy.

20 Program Graph After Inlining memcpy. Figure label: memcpy.

21 Application Slowdown

22 How Time Is Spent. No caches: reads expensive. No speculation.

23 Lesson: The spatial model of computation has different properties.

24 Outline: Motivation; ASH: Application-Specific Hardware; The spatial model of computation; CASH: Compiling for ASH; Evolutionary path; Future work.

25 CASH: Compiling for ASH. Memory partitioning; interconnection net; program to circuits.

26 Compilation. 1. Program: int reverse(int x) { int k, r = 0; for (k = 0; k [...] >> 1; r = r << 1; } } 2. Split-phase Abstract Machines. 3. Configurations placed independently. 4. Placement on chip. Diagram labels: unknown-latency ops; computations & local storage. Side label: Reliability.

27 Split-phase Abstract Machines. Diagram labels: SAM 1, SAM 2, SAM 3, CFG. Side label: Power.

28 Hyperblock => SAM. Single-entry, multiple-exit. May contain loops.

29 SAM => FSM. Diagram labels: Start, Loop, Exit, remote memory, local memory.

30 Implementing SAMs (interesting details)

31 The SAM FSM. Diagram labels: computation, predicates (control), combinational logic, start, exit, register, args, results.

32 Computation = Dataflow. Variables => wires + tokens. No token store; no token matching. Local communication only. Programs: x = a & 7; ... y = x >> 2; Circuits: operator nodes & and >> with labels a, 7, 2, x. Side label: Signals.
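
The "variables => wires + tokens" idea on this slide can be illustrated with a small software model. This is only a sketch under assumptions: the names `wire_t`, `constant`, `empty_wire`, `and_node`, `shr_node`, and `circuit_y` are hypothetical and do not come from the presentation, and a single `valid` bit stands in for a token. Each operator fires only when all of its inputs carry a token, so no token store or matching is needed.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: a wire carries a value plus a one-bit token
 * saying whether the value has been produced yet. */
typedef struct {
    int  value;
    bool valid;
} wire_t;

/* A constant source always carries a token. */
wire_t constant(int v) {
    wire_t w = { v, true };
    return w;
}

/* A wire on which no token has arrived yet. */
wire_t empty_wire(void) {
    wire_t w = { 0, false };
    return w;
}

/* Bitwise-AND node: produces a token only when both inputs have one. */
wire_t and_node(wire_t a, wire_t b) {
    wire_t out = { 0, false };
    if (a.valid && b.valid) {
        out.value = a.value & b.value;
        out.valid = true;
    }
    return out;
}

/* Shift-right node: same firing rule as and_node. */
wire_t shr_node(wire_t a, wire_t b) {
    wire_t out = { 0, false };
    if (a.valid && b.valid) {
        assert(b.value >= 0 && b.value < 32); /* keep the shift well defined */
        out.value = a.value >> b.value;
        out.valid = true;
    }
    return out;
}

/* The slide's program, x = a & 7; y = x >> 2;, wired as two nodes.
 * The variable x becomes the wire between them. */
wire_t circuit_y(wire_t a) {
    wire_t x = and_node(a, constant(7));
    return shr_node(x, constant(2));
}
```

Calling `circuit_y(constant(29))` yields a valid token, while `circuit_y(empty_wire())` yields none, because the missing input token keeps both nodes from firing.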

33 Tokens & Synchronization. Tokens signal operation completion. Possible implementations: local (data, valid, ack); global (data, valid, reset); static (data, valid).

34 Speculation. Program: if (x > 0) y = -x; else y = b*x; Circuit labels: *, x, b, 0, y, !, slow; computation; predicates; eager muxes. Static single assignment implemented in hardware. Side label: ILP.
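
A rough software analogue of the speculation on this slide: both arms of the conditional are computed unconditionally, and a multiplexer driven by the predicate selects the result. The function names (`mux2`, `y_speculative`, `y_reference`) are hypothetical, and the model ignores timing, so it does not capture whatever makes the slide's muxes "eager". It assumes values small enough that neither arm overflows.

```c
#include <stdbool.h>

/* Two-input multiplexer: the predicate picks one of two values
 * that have both already been computed. */
int mux2(bool pred, int if_true, int if_false) {
    return pred ? if_true : if_false;
}

/* Speculative form: both arms are evaluated regardless of the predicate.
 * This is safe here only because neither arm has a side effect. */
int y_speculative(int x, int b) {
    int  neg  = -x;      /* arm taken when x > 0 */
    int  prod = b * x;   /* arm taken otherwise (the slow multiply) */
    bool p    = x > 0;   /* predicate, computed alongside the arms */
    return mux2(p, neg, prod);
}

/* The slide's program as written, for comparison. */
int y_reference(int x, int b) {
    int y;
    if (x > 0) y = -x; else y = b * x;
    return y;
}
```

For any inputs within range, `y_speculative(x, b)` and `y_reference(x, b)` return the same value; the difference is only that the speculative form never waits on the predicate before starting either computation.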

35 Predicates. Guard side-effects (memory access, procedure calls), e.g. *q = 2; Control looping. Decide exit branch. Select variable definition (x = ... / ... = x).
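
Side effects cannot be undone, so unlike pure arithmetic they cannot simply be executed speculatively. A minimal sketch of a predicate guarding the slide's store `*q = 2` follows; `guarded_store` is a hypothetical name, not something taken from the presentation.

```c
#include <stdbool.h>

/* The store happens only when its guarding predicate is true.
 * With a false predicate, memory is left untouched. */
void guarded_store(bool pred, int *q, int value) {
    if (pred) {
        *q = value;
    }
}
```

With `pred` false the target cell keeps its old contents; with `pred` true, `guarded_store(true, q, 2)` behaves like `*q = 2`.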

36 Computing Predicates. Correct for irreducible graphs. Correct even when speculatively computed. Can be eagerly computed. Diagram labels: s, t, b.

37 Loops + Dataflow = Pipelining. Program: for (i=0; i < 10; i++) a[i] += i; Circuit labels: load, store, +, &a[0], 1, i, 0, a[0], a[1], a[2], a[3].
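
To make the operators in this slide's circuit explicit, the loop can be rewritten so that each statement corresponds to one node: an address stream starting at `&a[0]`, a load, an add, a store, and the induction-variable increment. This is a sequential sketch only; the function names are hypothetical, and the pipelining itself (several iterations in flight at once) is a property of the hardware that plain C does not show.

```c
#define N 10

/* The slide's loop as written. */
void loop_reference(int a[N]) {
    for (int i = 0; i < N; i++)
        a[i] += i;
}

/* The same loop with one statement per dataflow operator. */
void loop_dataflow(int a[N]) {
    int  i    = 0;        /* induction variable, seeded with constant 0 */
    int *addr = &a[0];    /* address stream, seeded with &a[0] */
    while (i < N) {       /* loop predicate */
        int loaded = *addr;       /* load node */
        int sum    = loaded + i;  /* + node on the data path */
        *addr      = sum;         /* store node */
        addr       = addr + 1;    /* + 1 node on the address path */
        i          = i + 1;       /* + 1 node on the induction variable */
    }
}
```

Because the address and induction-variable updates do not depend on the load or the store, a circuit can start the next iteration's address computation while the current load is still outstanding, which is one way to read the slide's "= Pipelining".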

38 Outline: Motivation; ASH: Application-Specific Hardware; The spatial model of computation; CASH: Compiling for ASH; Evolutionary path; Conclusions; Future work.

39 Evolutionary Path. Diagram labels: Microprocessors, ASH. The problem with ASH: resources.

40 Virtualization

41 CPU+ASH. Diagram labels: core computation; support computation + OS + VM; CPU; ASH; memory.

42 Outline: Motivation; ASH: Application-Specific Hardware; The spatial model of computation; CASH: Compiling for ASH; Evolutionary path; Conclusions; Future work.

43 ASH Benefits (problem: solution). Reliability: configuration around defects. Power: only "useful" gates switching. Signals: localized computation. ILP: statically extracted.

44 Scalable Performance. Figure: performance over time for CPU and ASH; label: now.

45 Summary. Contemporary CPU architecture faces lots of problems. Application-Specific Hardware (ASH) provides a scalable technology. Compiling HLL into hardware dataflow machines is an effective solution.

46 Timeline. Dates: 06/01, 09/01, 12/01, 04/02, 06/02, 09/02, 12/02; marker: now. Tasks: CASH core; write thesis; hw/sw partitioning (ASH + CPU); cost models; ASH simulation; loop parallelization; explore architectural/compiler trade-offs; memory partitioning.

47 Extras: related work; reconfigurable hardware; other cross-over phenomena; a CPU + ASH study; more about predicates.

48 Related Work: hardware synthesis from HLL; reconfigurable hardware; predicated execution; dataflow machines; speculative execution; predicated SSA.

49 Reconfigurable Hardware. Universal gates and/or storage elements; interconnection network; programmable switches.

50 Main RH Ingredient: RAM Cell. Switch controlled by a 1-bit RAM cell. Universal gate = RAM. Diagram labels: a0, a1, data, a1 & a2, 0, data in, control, 00010001.
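
The "universal gate = RAM" idea can be sketched as a lookup table: the gate's inputs form an address, and the bit stored at that address is the output, so rewriting the stored bits changes which gate it is. The encoding below is an assumption for illustration (four 1-bit cells packed into the low bits of a byte, with bit k holding the cell at address k); `lut2` and the `LUT_*` constants are hypothetical names, not taken from the slides.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed encoding: bit k of config is the RAM cell at address k,
 * where the address is formed as (a1 << 1) | a0. */
int lut2(uint8_t config, int a0, int a1) {
    assert((a0 == 0 || a0 == 1) && (a1 == 0 || a1 == 1));
    int address = (a1 << 1) | a0;   /* inputs select one RAM cell */
    return (config >> address) & 1; /* the cell's content is the output */
}

/* Example configurations under that encoding. */
enum {
    LUT_AND = 0x8, /* cells at addresses 0..3 hold 0,0,0,1 */
    LUT_OR  = 0xE, /* cells at addresses 0..3 hold 0,1,1,1 */
    LUT_XOR = 0x6  /* cells at addresses 0..3 hold 0,1,1,0 */
};
```

For example, `lut2(LUT_AND, 1, 1)` returns 1 and `lut2(LUT_AND, 1, 0)` returns 0; swapping in `LUT_XOR` turns the same hardware into an exclusive-or gate without touching the wiring.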

51 Reconfigurable Computing. Back to ENIAC-style computation. Synthesize one machine to solve one problem.

52 Efficiency. Figure: hardware resources over time, idle vs. used; label: now.

53 Manufacturing Cost. Figure: cost vs. affordable cost over time; labels: 3x10^9 $, now.

54 Complexity. Figure: transistors over time, manageable vs. available; labels: 10^9, 10^8, 10, now.

55 CAD Tools. Figure: manual interventions over time, feasible vs. necessary; label: now.

56 ASH Benefits (problem: solution). Reliability: configuration around defects. Power: only "useful" gates switching. Signals: localized computation. ILP: statically extracted. Complexity: hierarchy of abstractions. CAD: compiler + local place & route. Efficiency: circuit customized to application. Cost: no masks, no physics, same substrate. Performance: scalable.

57 CPU+ASH Study. Reconfigurable functional unit on processor pipeline. Adapted SimpleScalar 3.0. ASH & CPU use the same memory hierarchy (incl. L1). ASH can access CPU registers. CPU pipeline interlocked with ASH. Results pending.

58 Simplifying Predicates. Shared implementations. Control equivalence. Diagram labels: a, b, c.

59 Deep Speculation. Program: if (p) if (q) x = a; else x = b; else x = c; Circuit labels: x; a, b, c; !p, p&!q, p&q.
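
The three predicates on this slide (!p, p&!q, p&q) are mutually exclusive and cover every case, so the nested conditional flattens into a single selection among values that are all available up front. The sketch below pairs each predicate with the definition it guards in the slide's program (p&q with a, p&!q with b, !p with c); the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Flattened form: one select signal per definition of x. */
int x_speculative(bool p, bool q, int a, int b, int c) {
    bool sel_a = p && q;    /* x = a */
    bool sel_b = p && !q;   /* x = b */
    bool sel_c = !p;        /* x = c */

    /* Exactly one select is true for any p, q. */
    assert((int)sel_a + (int)sel_b + (int)sel_c == 1);

    /* One-hot mux: unselected inputs contribute 0 to the OR. */
    return (sel_a ? a : 0) | (sel_b ? b : 0) | (sel_c ? c : 0);
}

/* The slide's program as written, for comparison. */
int x_reference(bool p, bool q, int a, int b, int c) {
    int x;
    if (p) {
        if (q) x = a; else x = b;
    } else {
        x = c;
    }
    return x;
}
```

Both functions agree for all four combinations of `p` and `q`; the flattened form just removes the nesting, so `a`, `b`, and `c` can all be computed before either predicate is known.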

60 Predicates & Tokens. Predicated tokens: eliminate speculation. Eliminate wires: P and P_ready combine into P & P_ready. Diagram examples: *q = 2 and ~x, each with ready and safe signals (combined as ready & safe).

