1
**U of Houston – Clear Lake**

Adaptive Latency-Aware Parallel Resource Mapping: Task Graph Scheduling Heterogeneous Network Topology

Liwen Shih, Ph.D., Computer Engineering, U of Houston – Clear Lake

2
**ADAPTIVE PARALLEL TASK TO NETWORK TOPOLOGY MAPPING**

- Latency-adaptive: topology, traffic, bandwidth, workload, system hierarchy
- Thread partition: coarse, medium, fine

3
**Fine-Grained Mapping System [Shih 1988]**

- Parallel mapping: compile-time vs. run-time
- Task migration: vertical vs. horizontal
- Domain decomposition: data vs. function
- Execution order: eager data-driven vs. lazy demand-driven

4
**PRIORITIZE TASK DFG NODES**

Task priority factors:

- Level depth
- Critical paths
- In/out degree

Data-flow partial order: {(n7→n5), (n7→n4), (n6→n4), (n6→n3), (n5→n1), (n4→n2), (n3→n2), (n2→n1)}

Total task priority order: {n1 > n2 > n4 > n3 > n5 > n6 > n7}

- P2 thread: {n1 > n2 > n4 > n3 > n6}
- P3 thread: {n5 > n7}
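The slide does not spell out how the factors combine, so the following is only a minimal sketch under stated assumptions: each pair in the partial order is read as (producer, consumer), a node's level is its longest-path distance from any input node, and ties within a level are broken by the number of critical (longest) paths through the node. The function name `prioritize` and the tie-break rule are hypothetical, and in/out degree is not used here. Under these assumptions the sketch reproduces the total order listed on the slide.

```python
from collections import defaultdict

# Data-flow edges (producer, consumer), read off the slide's partial order.
EDGES = [(7, 5), (7, 4), (6, 4), (6, 3), (5, 1), (4, 2), (3, 2), (2, 1)]


def prioritize(edges):
    """Return (priority order, level per node, critical-path count per node)."""
    succ, pred = defaultdict(list), defaultdict(list)
    for u, v in edges:
        succ[u].append(v)
        pred[v].append(u)
    nodes = set(succ) | set(pred)

    # Topological order via Kahn's algorithm.
    indeg = {n: len(pred[n]) for n in nodes}
    ready = [n for n in nodes if indeg[n] == 0]
    topo = []
    while ready:
        n = ready.pop()
        topo.append(n)
        for m in succ[n]:
            indeg[m] -= 1
            if indeg[m] == 0:
                ready.append(m)

    def longest(order, before):
        """Longest-path length reaching each node, and how many paths attain it."""
        length, count = {}, {}
        for n in order:
            length[n] = max((length[p] + 1 for p in before[n]), default=0)
            count[n] = sum(count[p] for p in before[n]
                           if length[p] + 1 == length[n]) or 1
        return length, count

    level, n_down = longest(topo, pred)        # pass 1: measured from the inputs
    height, n_up = longest(topo[::-1], succ)   # pass 2: measured from the outputs
    critical = max(level[n] + height[n] for n in nodes)
    paths = {n: n_down[n] * n_up[n] if level[n] + height[n] == critical else 0
             for n in nodes}

    # Assumed rule: deeper level first, then more critical paths, then node id.
    order = sorted(nodes, key=lambda n: (-level[n], -paths[n], n))
    return order, level, paths


order, level, paths = prioritize(EDGES)
print(" > ".join(f"n{n}" for n in order))
```

Here n5 ranks below n4 and n3 although all three share a level, because the only path through n5 (n7, n5, n1) is shorter than the critical paths and so its critical-path count is zero.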

5
**SHORTEST-PATH NETWORK ROUTING**

Shortest latency and routes are updated after each task-processor allocation.
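The slide does not name the shortest-path algorithm or say how an allocation changes channel latency. As an illustration only, the sketch below uses Floyd-Warshall to build a latency table and a next-hop routing table, then rebuilds both after one channel's latency is raised. The function names, the ring topology, and the latency values are all hypothetical.

```python
import math


def shortest_paths(n, links):
    """Floyd-Warshall over n processors.

    links maps an undirected channel (a, b) to its latency.
    Returns (latency table, next-hop routing table).
    """
    lat = [[0 if i == j else math.inf for j in range(n)] for i in range(n)]
    nxt = [[j if i == j else None for j in range(n)] for i in range(n)]
    for (a, b), w in links.items():
        lat[a][b] = lat[b][a] = w
        nxt[a][b], nxt[b][a] = b, a
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if lat[i][k] + lat[k][j] < lat[i][j]:
                    lat[i][j] = lat[i][k] + lat[k][j]
                    nxt[i][j] = nxt[i][k]
    return lat, nxt


def route(nxt, src, dst):
    """Follow the next-hop table from src to dst (assumes dst is reachable)."""
    path = [src]
    while src != dst:
        src = nxt[src][dst]
        path.append(src)
    return path


# Hypothetical 4-processor ring 0-1-2-3-0 with unit channel latency.
links = {(0, 1): 1, (1, 2): 1, (2, 3): 1, (3, 0): 1}
lat, nxt = shortest_paths(4, links)
route_before = route(nxt, 0, 1)

# Assumed effect of an allocation: channel (0, 1) becomes loaded and slower.
links[(0, 1)] = 5
lat2, nxt2 = shortest_paths(4, links)
route_after = route(nxt2, 0, 1)
```

Recomputing the whole table is the simplest option for a small network. The slides describe updating only the table entries affected by the last allocation, which this sketch does not attempt.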

6
**Adaptive A* Parallel Processor Scheduler**

Given a directed, acyclic task DFG G(V, E) with task vertex set V connected by data-flow edge set E, and a processor network topology N(P, C) with processor node set P connected by channel link set C, find a processor assignment and schedule S: V(G) → P(N) such that S minimizes the total parallel computation time of G.

Exact scheduling is NP-hard; the A* heuristic mapping finds a schedule in polynomial time.

7
**Demand-Driven Task-Topology mapping**

- STEP 1 – Assign a level to each task node vertex in G.
- STEP 2 – Count the critical paths passing through each DFG edge and node with a two-pass (bottom-up, then top-down) graph traversal.
- STEP 3 – Initially load all deepest-level task nodes that produce outputs onto the working task node list, in priority order.
- STEP 4 – WHILE the working task node list is not empty, schedule the best processor for the top-priority task, and replace the task with its parent task nodes, inserted onto the working task node priority list.

8
**Demand-Driven Processor Scheduling**

STEP 4 – WHILE the working task node list is not empty:

BEGIN

- STEP 4.1 – Initialize (first time) or update the inter-processor shortest-path latency/routing table pair affected by the last task-processor allocation.
- STEP 4.2 – Assign a nearby capable processor that minimizes thread computation time to the highest-priority task node at the top of the remaining prioritized working list.
- STEP 4.3 – Remove the newly scheduled task node and replace it with its parent nodes, which are inserted or appended onto the working list (demand-driven) by priority. Priority is based on tie-breaker rules which, along with node level depth, estimate the time cost of the entire computation thread involved.

END {WHILE}
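The loop in STEP 4 can be sketched as a priority-queue traversal from the output tasks back toward the inputs. This is a plain greedy selection, not the A* search described in the slides, and the cost model is an assumption: a processor's cost is its current load plus the latency to the processors already holding the task's consumers. The names `demand_driven_map`, `RANK`, and `two_procs` are hypothetical, and the latency table is held fixed, so STEP 4.1 is omitted.

```python
import heapq


def demand_driven_map(edges, rank, latency, compute=1):
    """Greedy sketch of STEP 3 and STEP 4. Returns {task: processor}.

    edges:   (producer, consumer) pairs of the task DFG
    rank:    {task: rank}, lower rank is scheduled first
    latency: latency[p][q], shortest-path latency between processors
    """
    parents, children = {}, {}
    for u, v in edges:
        parents.setdefault(v, []).append(u)
        children.setdefault(u, []).append(v)
        parents.setdefault(u, [])
        children.setdefault(v, [])

    load = [0] * len(latency)
    placed = {}

    # STEP 3: start from the output tasks (tasks with no consumers).
    work = [(rank[t], t) for t in children if not children[t]]
    heapq.heapify(work)
    queued = {t for _, t in work}

    while work:  # STEP 4
        _, task = heapq.heappop(work)

        # STEP 4.2 (assumed cost model): load plus latency to placed consumers.
        def cost(p):
            comm = sum(latency[p][placed[c]]
                       for c in children[task] if c in placed)
            return load[p] + comm

        best = min(range(len(latency)), key=cost)
        placed[task] = best
        load[best] += compute

        # STEP 4.3: the scheduled task is replaced by its parents.
        for par in parents[task]:
            if par not in queued:
                queued.add(par)
                heapq.heappush(work, (rank[par], par))
    return placed


EDGES = [(7, 5), (7, 4), (6, 4), (6, 3), (5, 1), (4, 2), (3, 2), (2, 1)]
RANK = {1: 0, 2: 1, 4: 2, 3: 3, 5: 4, 6: 5, 7: 6}  # total order from slide 4


def two_procs(hop):
    """Latency table for two processors joined by one channel."""
    return [[0, hop], [hop, 0]]


cheap = demand_driven_map(EDGES, RANK, two_procs(1))
costly = demand_driven_map(EDGES, RANK, two_procs(100))
print(sorted(set(cheap.values())), sorted(set(costly.values())))
```

With a cheap channel the sketch spreads the tasks over both processors. With an expensive channel it keeps every task on one processor, which illustrates how a latency-aware cost pushes the mapping toward sequential processing.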

9
**QUANTIFY SW/HW MAPPING QUALITY**

- Example 1 – Latency-Adaptive Tree-Task to Tree-Machine Mapping
- Example 2 – Scaling to Larger Tree-to-Tree Mapping
- Example 3 – Select the Best Processor Topology Match for an Irregular Task Graph

10
**Example 1 – Latency-Adaptive Tree-Task to Tree-Machine Mapping**

K-th largest selection: will the tree algorithm [3] match the tree machine [4]?

11
**Example 1 – Latency-Adaptive Tree-Task to Tree-Machine Mapping**

Adaptive mapping moves toward sequential processing as the inter/intra communication latency ratio increases.

12
**Example 1 – Latency-Adaptive Tree-Task to Tree-Machine Mapping**

Adaptive Mapper allocates fewer processors and channels with fewer hops.

13
**Example 1 – Latency-Adaptive Tree-Task to Tree-Machine Mapping**

Adaptive Mapper consistently achieves higher speedups. (Bonus! Pipeline processing speedup can be extrapolated when the inter/intra communication latency ratio is < 1.)

14
**Example 1 – Latency-Adaptive Tree-Task to Tree-Machine Mapping**

Adaptive Mapper consistently achieves better efficiency. (Bonus! Percent pipeline processing efficiency can be extrapolated when the inter/intra communication latency ratio is < 1.)
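The slides do not define speedup or efficiency, so the sketch below assumes the conventional definitions: speedup is sequential time over parallel time, and efficiency is speedup divided by the number of processors allocated. The timings are hypothetical and are not measurements from the presentation.

```python
def speedup(t_sequential, t_parallel):
    """Ratio of single-processor time to parallel time."""
    return t_sequential / t_parallel


def efficiency(t_sequential, t_parallel, processors):
    """Speedup per allocated processor (1.0 means no processor is wasted)."""
    return speedup(t_sequential, t_parallel) / processors


# Hypothetical timings: two mappings with the same parallel time,
# one using 8 processors and one using 4.
t_seq, t_par = 12.0, 4.0
print(speedup(t_seq, t_par))        # same speedup for both mappings
print(efficiency(t_seq, t_par, 8))  # mapping on 8 processors
print(efficiency(t_seq, t_par, 4))  # mapping on 4 processors
```

Under these definitions, a mapping that reaches the same speedup with fewer processors has proportionally higher efficiency.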

15
**Example 2 – Scaling to Larger Tree-to-Tree Mapping**

As tree sizes scale up, Adaptive Mapper achieves sub-optimal speedups that still closely trail the fixed tree-to-tree mapping.

16
**Example 2 – Scaling to Larger Tree-to-Tree Mapping**

Adaptive Mapper is always more cost-efficient, using fewer resources while achieving sub-optimal speedups comparable to fixed tree-to-tree mapping as tree sizes scale.

17
**Example 3 – Select the Best Processor Topology Match for an Irregular Task Graph**

Lack of matching topology clues for the irregularly shaped Robot Elbow Manipulator [5]:

- 105 task nodes
- 161 data-flow edges
- 29 node levels

18
**Example 3 – Select the Best Processor Topology Match for an Irregular Task Graph**

- Candidate topologies
- Compare schedules for each topology
- Farther processors may not be selected
- Figure panels: Linear Array, Tree

19
**Example 3 – Select the Best Processor Topology Match for an Irregular Task Graph**

Best network topology performers (# channels):

- Complete (28)
- Mesh (12)
- Chordal ring (16)
- Systolic array (16)
- Cube (12)

20
**Example 3 – Select the Best Processor Topology Match for an Irregular Task Graph**

Fewer processors are selected for higher-diameter networks: tree, linear array.
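To make "diameter" concrete, the sketch below counts channels and computes the hop diameter of a few candidate topologies with breadth-first search. The processor count of 8 is an assumption, inferred from the 28 channels listed for the complete network (8 × 7 / 2), and the heap-shaped binary tree is only one possible tree layout. The function and variable names are hypothetical.

```python
from collections import deque
from itertools import combinations


def diameter(n, links):
    """Largest hop count between any two of n processors (BFS from each node)."""
    adj = {i: [] for i in range(n)}
    for a, b in links:
        adj[a].append(b)
        adj[b].append(a)
    worst = 0
    for start in range(n):
        dist = {start: 0}
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        worst = max(worst, max(dist.values()))
    return worst


N = 8  # assumed processor count
topologies = {
    "complete": list(combinations(range(N), 2)),
    # 3-cube: processors whose ids differ in exactly one bit are linked.
    "cube": [(i, i ^ (1 << b)) for i in range(N) for b in range(3)
             if i < i ^ (1 << b)],
    # Heap-shaped binary tree: processor i links to parent (i - 1) // 2.
    "tree": [((i - 1) // 2, i) for i in range(1, N)],
    "linear array": [(i, i + 1) for i in range(N - 1)],
}
summary = {name: (len(links), diameter(N, links))
           for name, links in topologies.items()}
for name, (channels, diam) in summary.items():
    print(f"{name}: {channels} channels, diameter {diam}")
```

With these assumptions the tree and linear array have the fewest channels and the largest diameters, so distant processors cost more hops to reach.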

21
**Example 3 – Select the Best Processor Topology Match for an Irregular Task Graph**

Deducing network switch hops:

- Low: multi-hop data exchanges, < 10%
- Moderate: 0-hop, 30% to 50%
- High: near-neighbor direct 1-hop, 50% to 70%

22
**Future Speed/Memory/Power Optimization**

- Latency-adaptive: topology, traffic, bandwidth, workload, system hierarchy
- Thread partition: coarse, mid, fine
- Latency/routing tables: neighborhood, network hierarchy, worm-hole, dynamic mobile network routing
- Bandwidth
- Heterogeneous system
- Algorithm-specific network topology

23
References

24
**Professor in Computer Engineering University of Houston – Clear Lake**

Q & A?

Liwen Shih, Ph.D., Professor in Computer Engineering, University of Houston – Clear Lake

25
xScale13 paper

27
Thank You!
