© 2008 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice Alviso Rick McGeer (HP) Erik Rubow.


2 © 2008 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice Alviso Rick McGeer (HP) Erik Rubow (Ericsson) Stephen Lonergan (U Vic) Amin Vahdat (UCSD)

3 Outline Motivation A Quick Tour of Alviso The Problem of Unrestricted Communication in Parallel Systems Lessons from Hardware: Restricted Communication between modules Alviso: A Synchronous Language Restricted combinational communication −Motivation −The Mutex statement −Strict priority on processes −Recovering maximal parallelism Status and Conclusions

4 Alviso Motivation Make NetFPGA programming accessible to network designers NetFPGA: FPGA-based 4-port switch board Key to building high-speed software-defined networks Typical NetFPGA designer knows software routers, not VLSI design

5 NetFPGA Basic building block is a Xilinx FPGA (Virtex-II) Programming tools are Verilog (simulator-based HDL), Synopsys/Cadence/Xilinx synthesis tools for FPGA Problems −Verilog very low-level design tool −Many details of hardware design must be mastered by designer −No high-level network-based design environment Why is this interesting? −Software on GP processors still can’t keep up with modern switching equipment −Modern high-performance software routers require very substantial hardware (typically, GPGPUs)

6 Outline Motivation A Quick Tour of Alviso The Problem of Unrestricted Communication in Parallel Systems Lessons from Hardware: Restricted Communication between modules Alviso: A Synchronous Language Restricted combinational communication −Motivation −The Mutex statement −Strict priority on processes −Recovering maximal parallelism Status and Conclusions

7 Alviso C-like language whose modules can be easily realized as either hardware or software Some restrictions −No memory allocation −No functions or recursion −Built-in parallelism −No forks

8 Alviso Elements Module: Basic unit of design −Roughly equivalent to an Object in software design −Container of processes (see below) and variables −No shared variables across module boundaries! −All communications into/out of modules through “ports” (similar to software parameters) −Exactly equivalent to hardware ports

9 Alviso Elements Process: Basic element of computation Roughly equivalent to a thread in software Equivalent to a block of logic in hardware Begins immediately on load Runs to completion

10 Alviso Elements Port: Variable explicitly written or read by a process in a module Sole means of communication into/out of a module Equivalent to a hardware port −Always latched (see below) Roughly equivalent (in software) to a public object variable with a get/set method (read = get, write = set)

11 Outline Motivation A Quick Tour of Alviso The Problem of Unrestricted Communication in Parallel Systems Lessons from Hardware: Restricted Communication between modules Alviso: A Synchronous Language Restricted combinational communication −Motivation −The Mutex statement −Strict priority on processes −Recovering maximal parallelism Status and Conclusions

12 The Problem of Parallel Design The central assumption of design: the finite state model of computation Every variable is a little FSM −Quiescent unless explicitly perturbed by an instruction −But parallel design breaks this model for shared variables proc thread1() { x=2; x=x+1; } x = 3…right? proc thread2() { x=x*100; } Value of x is indeterminate
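The indeterminacy on this slide can be made concrete with a small Python sketch (not Alviso). It enumerates every interleaving of the two threads and collects the final values of x. It assumes each statement executes atomically and that x starts at 0; neither assumption comes from the slide, and the helper names are illustrative.

```python
def interleavings(a, b):
    """Yield every merge of sequences a and b that keeps each one's own order."""
    if not a or not b:
        yield list(a) + list(b)
        return
    for rest in interleavings(a[1:], b):
        yield [a[0]] + rest
    for rest in interleavings(a, b[1:]):
        yield [b[0]] + rest

# Each statement is modelled as a function from the old x to the new x.
thread1 = [lambda x: 2, lambda x: x + 1]   # x=2; x=x+1;
thread2 = [lambda x: x * 100]              # x=x*100;

def final_values(x0=0):
    """Return the set of final x values over all schedules (x0 is an assumed initial value)."""
    results = set()
    for schedule in interleavings(thread1, thread2):
        x = x0
        for step in schedule:
            x = step(x)
        results.add(x)
    return results

print(sorted(final_values()))
```

Under these assumptions the three possible schedules give three different answers, so thread1 alone cannot guarantee x = 3.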

13 All the Problems in Parallel Design Break Down into solving this How do we recover a semantically-consistent deterministic model of design with efficient communications? A key to efficient multicore programming, hardware/software codesign,…. There are other problems, but without solving this one they are all built on a house of sand…

14 Historical Answer: Restrict Communications Problem is fundamentally one of communication Unrestricted asynchronous communication breaks design model Solution 1: No shared variables between threads −Inefficient: effectively, every thread is in its own address space Solution 2: Locks and semaphores: restrict ability of other threads to play with state during computation −Deadlock! −Locks themselves become a nondeterministic, asynchronous communication channel….

15 Requirements of our Solution Semantics independent of external systems (e.g., a thread scheduler) Efficient communication between threads Designs fully implementable in either hardware or software Module behavior identical independent of hardware or software realization – semantics independent of implementation −A caveat: mixed hardware/software systems will vary in behavior, depending on mix of hardware/software components −Hardware components are much faster than software components

16 Outline Motivation A Quick Tour of Alviso The Problem of Unrestricted Communication in Parallel Systems Lessons from Hardware: Restricted Communication between modules Synchronous Languages A Practical Realization Restricted combinational communication −Motivation −The Mutex statement −Strict priority on processes −Recovering maximal parallelism Status and Conclusions

17 Lessons From Hardware Hardware Design is… −Highly parallel −Efficient −Deterministic −Independent of mysteries such as thread scheduling…. How did those guys do that? −And, more to the point, how can we?

18 Classic Hardware Design Banks of acyclic “combinational” logic, separated by clocked latches Logic Latch Logic Latch Data flows unidirectionally in logic, latches update at clock edge

19 Means… Acyclic logic: logic banks compute in fixed time – length of longest path through the circuit Latches update only on clock edges: value of logic inputs stable during computation Computation divided into “cycles” of fixed length: no communication between logic blocks during computation
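A minimal Python sketch of the latch/logic discipline described above, under the assumption that each logic block can be modelled as a function from the current latch values to the values it wants to write. The names `clock_edge` and `swap_blocks` are illustrative, not part of any tool.

```python
def clock_edge(latches, logic_blocks):
    """Advance one clock cycle and return the new latch values."""
    # Every block reads the same stable snapshot of the latches.
    outputs = [block(latches) for block in logic_blocks]
    # Latches update together, only after all blocks have finished computing.
    updated = dict(latches)
    for out in outputs:
        updated.update(out)
    return updated

# Two blocks that exchange a and b. Neither sees the other's write mid-cycle.
swap_blocks = [
    lambda s: {"a": s["b"]},
    lambda s: {"b": s["a"]},
]

state = clock_edge({"a": 1, "b": 2}, swap_blocks)
print(state)
```

Because the blocks do not communicate during the cycle, evaluating them in the opposite order produces the same result.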

20 Mapping Alviso to Hardware Logic Latch Logic Latch Process(es)Variables Ports

21 Outline Motivation A Quick Tour of Alviso The Problem of Unrestricted Communication in Parallel Systems Lessons from Hardware: Restricted Communication between modules Alviso: A Synchronous Language Restricted combinational communication −Motivation −The Mutex statement −Strict priority on processes −Recovering maximal parallelism Status and Conclusions

22 Adapting Hardware to Languages Shared Variables == Latches Logic Blocks == Threads Threads run for fixed block of time, then “wait” for next cycle of computation Shared variables only update when all threads are waiting No interrupts, no locks, no semaphores….

23 Alviso Synchronous/Reactive Language −Computation in “zero” time, communication takes time “one” −Means: no communication while computing −Follows: Esterel, Lustre, ReactiveC, Signal, V++, SMV C-like syntax Major new innovation: “wait” statement “wait”: halt computation and wait for variables to update Each thread must execute a wait statement within a fixed period of time Means: each cyclic computation graph (aka, loop) must contain a wait statement

24 A quick example proc thread1 { x = 2; while(true) { x++; wait; } } proc thread2 { wait; while(true) { x <<= 1; wait; } } x = 3 x = 4 But what about after the wait?

25 Answer: Deterministic Priority What happens with conflicts on shared variable updates? −No effect on computation: updates only visible after wait −But x can only have one value…which should we choose? −Answer: priority. Processes have deterministic priority (total order on processes). In the event of conflict, higher-priority process wins
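The wait-and-priority rule can be simulated in Python (not Alviso) with generators, where `yield` stands in for `wait`. This is a sketch under several assumptions not stated on the slides: x starts at 0, a process sees its own pending writes within a cycle, and priority is given by list position (first is highest). `Ports` and `run` are hypothetical helper names.

```python
class Ports:
    """One process's view of the shared variables during a cycle."""
    def __init__(self, latched):
        self.latched = latched   # values committed at the last cycle boundary
        self.pending = {}        # this process's own writes in the current cycle
    def __getitem__(self, key):
        return self.pending.get(key, self.latched[key])
    def __setitem__(self, key, value):
        self.pending[key] = value

def run(processes, initial, cycles):
    """processes: generator functions, highest priority first (assumed convention)."""
    latched = dict(initial)
    views = [Ports(latched) for _ in processes]
    running = [proc(view) for proc, view in zip(processes, views)]
    history = []
    for _ in range(cycles):
        for gen in running:
            next(gen, None)              # compute until the next wait
        for view in reversed(views):     # commit lowest priority first,
            latched.update(view.pending) # so higher priority overwrites it
            view.pending = {}
        history.append(dict(latched))
    return history

def thread1(v):
    v["x"] = 2
    while True:
        v["x"] += 1
        yield        # wait

def thread2(v):
    yield            # wait
    while True:
        v["x"] <<= 1
        yield        # wait

print(run([thread1, thread2], {"x": 0}, 3))
print(run([thread2, thread1], {"x": 0}, 3))
```

With thread1 at higher priority the committed values of x are 3, 4, 5; swapping the priorities gives 3, 6, 12. Either way the outcome is fixed by the priority order rather than by a scheduler.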

26 Alviso Computational Graph wait statements lead to a computation graph that is a forest of DAGs −Roots of the DAGs: initial statements of processes and statements immediately following wait statements −Leaves: final statements of processes and wait statements Computation terminates at a leaf on each cycle Starts on the next cycle at the subsequent root Computation in cycle is traversal of the DAG from root to leaf
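One way to check that property is sketched below in Python: cut the control-flow graph at every wait statement and verify that what remains is acyclic, which is the same as requiring every loop to contain a wait. The graph encoding (a dict from statement label to successor labels) and the function name are assumptions made for illustration.

```python
def every_loop_has_wait(cfg, waits):
    """Return True if removing the out-edges of wait nodes leaves cfg acyclic."""
    WHITE, GREY, BLACK = 0, 1, 2
    color = {node: WHITE for node in cfg}

    def acyclic_from(node):
        color[node] = GREY
        # A wait ends the cycle's computation, so do not follow its successors.
        successors = [] if node in waits else cfg[node]
        for nxt in successors:
            if color[nxt] == GREY:
                return False             # back edge: a loop with no wait
            if color[nxt] == WHITE and not acyclic_from(nxt):
                return False
        color[node] = BLACK
        return True

    return all(acyclic_from(n) for n in cfg if color[n] == WHITE)

# thread1 from the quick example: x = 2; while(true) { x++; wait; }
good = {"x=2": ["x++"], "x++": ["wait"], "wait": ["x++"]}
# The same loop with the wait removed.
bad = {"x=2": ["x++"], "x++": ["x++"]}

print(every_loop_has_wait(good, {"wait"}), every_loop_has_wait(bad, set()))
```

In the accepted graph, the DAG roots are "x=2" (the initial statement) and "x++" (the statement following the wait), matching the definition of roots above.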

27 Outline Motivation A Quick Tour of Alviso The Problem of Unrestricted Communication in Parallel Systems Lessons from Hardware: Restricted Communication between modules Alviso: A Synchronous Language Restricted combinational communication −Motivation −The Mutex statement −Strict priority on processes −Recovering maximal parallelism Status and Conclusions

28 Interprocess Zero-Delay Signaling Sometimes, you just have to break the rules Occasionally, processes need to signal each other in the same cycle −To gain exclusive access to a shared variable, for example −Multi-cycle locking too inefficient Almost every S/R language eventually incorporates some form of zero-delay interprocess signaling −Exceptions: V++, ReactiveC −Almost always makes hash of the semantics −Question: How can we do interprocess zero-delay signaling without making a mess?

29 Answer: Go Back to Hardware Zero-delay signaling is OK: what makes a mess is zero-delay loops −Hardware: run zero-delay wires in only one direction −Software: impose a priority order on processes High-priority processes execute “first” Higher-priority processes can signal lower-priority processes (but not vice-versa) Concrete realization: Mutex

30 Mutex Single-bit shared variable −Two states: “locked” and “unlocked” mutex foo; if (foo.lock()) { …execute guarded code… } lock() operation − Only succeeds (returns 1) if mutex is unlocked − Prevents any subsequent lock on foo from succeeding until unlock() is executed − unlock() releases lock at the beginning of next cycle − So, e.g., if (foo.lock()) foo.unlock() holds lock for this cycle

31 Implementing Mutex Safely Hardware: no issue −Arrange blocks of logic corresponding to processes in priority order −Mutex signals flow from high-priority to low-priority process −Arbitration on variable write works the same way Software: same idea −Run processes in priority order −High-priority processes run before lower-priority processes −Mutex locks in high-priority process automatically visible to lower-priority process −But price is very high: conceptually, serialized a parallel computation
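The serialized software realization can be sketched in Python as follows. This is a model of the described semantics, not Alviso code: the `Mutex` class, its `begin_cycle` hook, and the `run` driver are hypothetical names, and the sketch assumes any process may call `unlock()`.

```python
class Mutex:
    """Single-bit lock whose release takes effect at the start of the next cycle."""
    def __init__(self):
        self.locked = False
        self.release_pending = False

    def lock(self):
        if self.locked:
            return False          # an earlier lock this cycle already succeeded
        self.locked = True
        return True

    def unlock(self):
        self.release_pending = True   # deferred: the lock is held until the cycle ends

    def begin_cycle(self):
        if self.release_pending:
            self.locked = False
            self.release_pending = False

def run(priority_order, cycles):
    """Run the named processes in priority order; record who wins foo each cycle."""
    foo = Mutex()
    winners = []
    for _ in range(cycles):
        foo.begin_cycle()
        won = []
        for name in priority_order:   # highest priority first
            if foo.lock():            # mirrors: if (foo.lock()) foo.unlock()
                won.append(name)
                foo.unlock()
        winners.append(won)
    return winners

print(run(["high", "low"], 2))
```

Running processes one after another makes the higher-priority lock visible to the lower-priority process within the same cycle, at the cost of serializing the computation.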

32 Recovering Parallelism With Mutexes Recall: Each process defines a forest of DAGs −Call each such DAG a fiber Each Mutex defines a partial order among fibers −F_A > F_B iff Fiber A and Fiber B both lock the Mutex and A is higher priority than B At every cycle, exactly one fiber per process will run −For this cycle, choose any schedule consistent with partial orders on runnable fibers −Optimization: locked mutexes don’t affect schedule (all lock operations in cycle will fail, and only successful locks introduce dependency) −Therefore: disregard partial orders imposed by locked mutexes
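A possible Python sketch of this scheduling rule, using the standard library's `graphlib.TopologicalSorter` (Python 3.9+). It groups the runnable fibers into waves that could run in parallel. The input encoding (fiber name plus the set of mutexes it locks, listed highest priority first) and the function name `parallel_waves` are assumptions for illustration.

```python
from graphlib import TopologicalSorter

def parallel_waves(fibers, locked):
    """fibers: list of (name, mutexes_it_locks), highest priority first.
    locked: mutexes already locked at the start of this cycle."""
    deps = {name: set() for name, _ in fibers}
    for i, (high, high_mutexes) in enumerate(fibers):
        for low, low_mutexes in fibers[i + 1:]:
            # Only a shared mutex that is currently unlocked orders two fibers.
            if (high_mutexes & low_mutexes) - locked:
                deps[low].add(high)
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())   # fibers with no unmet dependency
        waves.append(ready)
        sorter.done(*ready)
    return waves

fibers = [("A", {"m"}), ("B", {"m"}), ("C", {"n"}), ("D", set())]
print(parallel_waves(fibers, locked=set()))
print(parallel_waves(fibers, locked={"m"}))
```

When mutex m is unlocked, B must follow A, giving two waves; when m is already locked, both lock attempts fail, the dependency disappears, and all four fibers can run together.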

33 Outline Motivation A Quick Tour of Alviso The Problem of Unrestricted Communication in Parallel Systems Lessons from Hardware: Restricted Communication between modules Alviso: A Synchronous Language Restricted combinational communication −Motivation −The Mutex statement −Strict priority on processes −Recovering maximal parallelism Status and Conclusions

34 Alviso Status And Conclusion Hardware synthesis chain written and tested on a few sample designs −Need for zero-delay intermodule communication noted −Arbitration on memory interface Software interpreter written and tested XML intermediate form under development Planned first release April 2011 Is it perfect? Far from it… −Need users to help us figure out how to make it better −Contact: erik.rubow@ericsson.com, rick.mcgeer@hp.com


