 # Parallel Scheduling of Complex DAGs under Uncertainty Grzegorz Malewicz.


Terminology. Graph: G(V,E). DAG: directed acyclic graph. Chain: a path. Antichain: a subset A of V such that there is no path between any two nodes of A.

Terminology, cont. Width of a DAG: the largest cardinality of an antichain in the DAG. A set of chains is said to cover a DAG if every node of the DAG lies on at least one of the chains. Dilworth’s theorem states that the width of a DAG equals the minimum number of chains needed to cover it. Source: a node with no arcs pointing to it. Sink: a node with no arcs leaving it.
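As a small illustration of these definitions, the sketch below computes the width of a tiny DAG by brute force: it tries candidate node sets from largest to smallest and returns the size of the first one that is an antichain. The function names `reachability` and `width` are illustrative and do not come from the presentation, and the search is exponential, so it is only meant for toy inputs.

```python
from itertools import combinations

def reachability(nodes, edges):
    """Return reach[u] = set of nodes reachable from u by a nonempty path."""
    children = {u: set() for u in nodes}
    for u, v in edges:
        children[u].add(v)
    reach = {}

    def visit(u):
        if u not in reach:
            reach[u] = set()
            for v in children[u]:
                reach[u] |= {v} | visit(v)
        return reach[u]

    for u in nodes:
        visit(u)
    return reach

def width(nodes, edges):
    """Largest antichain size, found by exhaustive search (toy inputs only)."""
    reach = reachability(nodes, edges)
    for size in range(len(nodes), 0, -1):
        for cand in combinations(nodes, size):
            # An antichain has no path between any two of its nodes.
            if all(v not in reach[u] and u not in reach[v]
                   for u, v in combinations(cand, 2)):
                return size
    return 0

# Diamond a -> b, a -> c, b -> d, c -> d: the antichain {b, c} gives width 2.
diamond = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")]
print(width(["a", "b", "c", "d"], diamond))  # 2
```

By Dilworth’s theorem, the diamond can therefore be covered by two chains, for example a, b, d and a, c, d.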

The model. Set of workers: {1,2,…,n}, denoted by [n]. Set of tasks: {1,2,…,t}, denoted by [t]. The tasks have mutual dependencies, represented by a DAG in which every node is a task and every directed edge u→v means that task v depends on the completion of task u. A task is eligible if all the tasks it depends on have been executed successfully. We denote by E(X) the set of all tasks that are eligible once all the tasks in the set X have been executed.

The model, cont. A set of tasks satisfies precedence constraints if, for every task in the set, all parents of the task are also in the set. The execution consists of synchronous steps. In each step, a central daemon assigns to every worker an element of E(X) ∪ {∅}, that is, either an eligible task or no task at all. A worker i completes task j successfully in a given step with probability $P_{i,j}$. The execution completes when every task has been executed successfully by at least one worker.

Regimen definition. Formally, a regimen is a function $\Sigma : 2^{V} \to ([n] \to (V \cup \{\emptyset\}))$. For any subset Z of tasks that satisfies precedence constraints, the value Σ(Z) is a function from [n] to the set E(Z) ∪ {∅}.

ROPAS: Recomputation and Overbooking allowed Probabilistic dAg Scheduling problem. Given such a DAG G, where for every task j there exists at least one worker i with $P_{i,j} > 0$, find a regimen Σ* that minimizes the expected completion time.

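Lemma 2.1. Denote by $B_X$ the minimum expected completion time across all regimens that start with the tasks X already executed. Lemma 2.1: for any set X of tasks that satisfies precedence constraints, $B_X$ is finite. Proof: we define a regimen that has finite expectation; thus the minimum expectation must be finite, too. We take a topological sort $t_1, \dots, t_m$ of the sub-DAG of G induced by the tasks other than X, and assign all workers to $t_1$ until it is executed, then to $t_2$, and so on. The probability that every worker fails to execute task $t_j$ in a given step is

$$q_j = \prod_{i=1}^{n} \bigl(1 - P_{i,t_j}\bigr) < 1,$$

because at least one worker has $P_{i,t_j} > 0$. The number of steps spent on $t_j$ is geometrically distributed, so the expected time to complete the execution is

$$\sum_{j=1}^{m} \frac{1}{1 - q_j} < \infty.$$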
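The bound used in this proof is easy to evaluate numerically. The sketch below assumes the simple regimen in which all workers attempt the tasks one at a time in topological order, and sums the expected waiting times $1/(1-q_j)$ with exact fractions. The function name and the layout of `p` (one row of success probabilities per worker) are assumptions made for illustration.

```python
from fractions import Fraction
from math import prod

def sequential_regimen_expectation(p, order):
    """Expected completion time when all workers attempt one task at a time.

    p[i][j] is the success probability of worker i on task j (assumed layout);
    order is a topological order of the tasks that remain to be executed.
    """
    total = Fraction(0)
    for j in order:
        # q = probability that every worker fails task j in a single step.
        q = prod((1 - Fraction(row[j]) for row in p), start=Fraction(1))
        if q == 1:
            raise ValueError(f"no worker can execute task {j}")
        total += 1 / (1 - q)  # mean of a geometric distribution
    return total

# Two workers, two tasks: task 0 fails with probability 1/4, task 1 with 2/3.
p = [[Fraction(1, 2), Fraction(0)],
     [Fraction(1, 2), Fraction(1, 3)]]
print(sequential_regimen_expectation(p, [0, 1]))  # 13/3
```

Any optimal regimen can only do better than this value, which is why $B_X$ is finite.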

Optimized algorithm for restricted ROPAS. We show a dynamic programming algorithm that solves the ROPAS problem in polynomial time under the following restrictions: 1. The number of workers is at most a constant. 2. The width of the DAG is at most a constant. Later, we show that if either of these restrictions is dropped, the problem becomes NP-hard.

Theorem 3.1. Definitions:
- X: a set of tasks that satisfies precedence constraints and does not contain all sinks of G.
- $D_0, D_1, \dots, D_k$: all the distinct subsets of E(X), where $D_0 = \emptyset$.
- $a_i$: the probability that $D_i$ is exactly the set of tasks executed by the workers in the assignment Σ(X).
- $X_i = X \cup D_i$.
- $T_{X_i}$: the expected time to completion for regimen Σ starting with the tasks $X_i$ already executed.

Then, provided $a_1 + \dots + a_k > 0$, the following relation holds:

$$T_X = \frac{1 + \sum_{i=1}^{k} a_i\, T_{X_i}}{\sum_{i=1}^{k} a_i}.$$

Theorem 3.1, proof. Let T be the time to completion for the regimen that starts with the tasks X already executed. Write $T = T_L + T_R$, where $T_L$ is the time until at least one task of E(X) has been executed and $T_R$ is the rest of the time. By linearity of expectation, $\mathrm{Exp}[T] = \mathrm{Exp}[T_L] + \mathrm{Exp}[T_R]$. $T_L$ is geometrically distributed with success probability $P = 1 - a_0 = a_1 + a_2 + \dots + a_k$, so $\mathrm{Exp}[T_L] = 1/P = 1/(a_1 + a_2 + \dots + a_k)$.

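Calculating $\mathrm{Exp}[T_R]$, the remaining time to completion. Let $\mathcal{D}_i$ be the event “$D_i$ is exactly the set of tasks first executed by the workers”. Conditioned on at least one task of E(X) being executed, this event has probability $a_i / (a_1 + \dots + a_k)$, and the execution then continues from $X \cup D_i$, so

$$\mathrm{Exp}[T_R] = \sum_{i=1}^{k} \Pr[\mathcal{D}_i]\, T_{X \cup D_i} = \sum_{i=1}^{k} \frac{a_i}{a_1 + \dots + a_k}\, T_{X \cup D_i}.$$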

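Theorem 3.1, conclusions. Adding the two expectations gives

$$\mathrm{Exp}[T] = \frac{1}{a_1 + \dots + a_k} + \sum_{i=1}^{k} \frac{a_i}{a_1 + \dots + a_k}\, T_{X \cup D_i} = \frac{1 + \sum_{i=1}^{k} a_i\, T_{X \cup D_i}}{\sum_{i=1}^{k} a_i}.$$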

Admissible evolution of execution graph. For every DAG G we can construct a graph G’(V’,E’), called the “admissible evolution of execution” graph, which has a single source and a single sink. Every node of G’ represents a set of nodes (tasks) of G. This graph is also a DAG. Every execution of every ROPAS regimen on G can be represented by a path from the source to the sink of this graph.

Admissible evolution of execution graph, construction. The construction of this graph is inductive:
- We begin with the set V’ = {∅}.
- For any node X ∈ V’ that does not contain all sinks of G, calculate the set of eligible tasks E(X) in G.
- Take every non-empty subset D ⊆ E(X) and add the node X ∪ D to V’ if it is not there already.
- Add the arc (X, X ∪ D) to E’.

Since G is finite, the inductive process clearly defines a unique directed graph G’.
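The inductive construction can be sketched in a few lines of Python. The code below assumes that E(X) consists of the tasks not yet in X whose parents all lie in X, which is the reading under which X ∪ D is always strictly larger than X. The DAG is given as a hypothetical `parents` dictionary, and all function names are illustrative.

```python
from itertools import combinations

def eligible(parents, done):
    """Assumed reading of E(X): tasks outside `done` with all parents in `done`."""
    return frozenset(v for v, ps in parents.items()
                     if v not in done and ps <= done)

def nonempty_subsets(items):
    items = sorted(items)
    for r in range(1, len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)

def build_evolution_graph(parents):
    """Return (nodes, arcs) of G', starting from the empty set of tasks."""
    start = frozenset()
    nodes, arcs, frontier = {start}, set(), [start]
    while frontier:
        x = frontier.pop()
        # When x contains every task, eligible() is empty and x is the sink.
        for d in nonempty_subsets(eligible(parents, x)):
            y = x | d
            arcs.add((x, y))
            if y not in nodes:
                nodes.add(y)
                frontier.append(y)
    return nodes, arcs

# Diamond DAG: a -> b, a -> c, b -> d, c -> d.
parents = {"a": set(), "b": {"a"}, "c": {"a"}, "d": {"b", "c"}}
nodes, arcs = build_evolution_graph(parents)
print(len(nodes), len(arcs))  # 6 7
```

For the diamond, the six nodes are ∅, {a}, {a,b}, {a,c}, {a,b,c}, and {a,b,c,d}; the number of non-empty subsets grows exponentially with |E(X)|, so this sketch is only practical for narrow DAGs.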

Admissible evolution of execution graph, properties. G’ is a DAG: it cannot have a cycle, because every arc has the form (X, X ∪ D) and X ∪ D has strictly more elements than X. The nodes of G’ are exactly the sets of tasks that satisfy precedence constraints:
- (⇒) By induction: if X satisfies precedence constraints, so does X ∪ D.
- (⇐) Pick any subset Y of tasks that satisfies precedence constraints. Let $t_1, \dots, t_{|Y|}$ be a topological sort of the sub-DAG of G induced by Y. Clearly, $t_j$ belongs to the set of tasks eligible when tasks $t_1, \dots, t_{j-1}$ have been executed, for any j. So if $\{t_1, \dots, t_{j-1}\}$ is a node of G’, so is $\{t_1, \dots, t_j\}$. Since ∅ is a node of G’, Y must also be.

Admissible evolution of execution graph, properties, cont’d. G’ has a single source: we add a node Y to G’ only if there is an arc leading to Y from some other node, so Y cannot be a source. However, ∅ is a source, because no arc leads to a set with the same number of elements or fewer. G’ has a single sink: for any X ∈ V’, E(X) ≠ ∅ unless X = V, so V is the only node without outgoing arcs.

The algorithm OPT. When processing each X ∈ V’, define two values: a number $S_X$, the expected time to completion for regimen $\Sigma_{OPT}$ starting with tasks X already executed, and an assignment $\Sigma_{OPT}(X)$. Pick a topological sort $Y_1, \dots, Y_m$ of G’ and process it in reverse order. Start by setting $S_{Y_m}$ to 0 and $\Sigma_{OPT}(Y_m)$ to ∅. For any $1 \le h < m$, consider all $(|E(Y_h)|+1)^n$ possible assignments to the n workers:
- For each assignment, calculate the probability $a_i$ that $D_i$ is exactly the set of tasks executed by the workers in the assignment.
- Among the assignments with $a_1 + \dots + a_k > 0$, find the one that minimizes the expression $\dfrac{1 + \sum_{i=1}^{k} a_i\, S_{Y_h \cup D_i}}{\sum_{i=1}^{k} a_i}$.
- Set $S_{Y_h}$ to the value of this expression and $\Sigma_{OPT}(Y_h)$ to this assignment.
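A minimal sketch of OPT follows, under several stated assumptions: the quantity minimized at each node is taken to be the Theorem 3.1 ratio (1 + Σ aᵢ·S of the successor) / Σ aᵢ; E(X) is read as the tasks outside X whose parents are all in X; workers succeed independently; and a memoized recursion stands in for the explicit reverse topological sweep (it visits the same nodes and fills in the same values). The names `opt`, `outcome_distribution`, `parents`, and `p` are illustrative, not taken from the presentation.

```python
from fractions import Fraction
from itertools import combinations, product

def eligible(parents, done):
    """Assumed reading of E(X): tasks outside `done` with all parents in `done`."""
    return frozenset(v for v, ps in parents.items()
                     if v not in done and ps <= done)

def outcome_distribution(assignment, p):
    """Map each set D of tasks executed in one step to its probability.

    assignment[i] is the task given to worker i, or None for "no task".
    """
    tasks = sorted({j for j in assignment if j is not None})
    succ = {}
    for j in tasks:
        fail = Fraction(1)
        for i, given in enumerate(assignment):
            if given == j:
                fail *= 1 - p[i][j]
        succ[j] = 1 - fail  # at least one assigned worker succeeds
    dist = {}
    for r in range(len(tasks) + 1):
        for d in combinations(tasks, r):
            prob = Fraction(1)
            for j in tasks:
                prob *= succ[j] if j in d else 1 - succ[j]
            dist[frozenset(d)] = prob
    return dist

def opt(parents, p):
    """Return (S, sigma): optimal expected time and assignment per task set.

    Assumes every task can be executed by some worker with positive probability.
    """
    n = len(p)
    everything = frozenset(parents)
    S, sigma = {everything: Fraction(0)}, {everything: None}

    def solve(x):
        if x in S:
            return S[x]
        options = [None] + sorted(eligible(parents, x))
        best, best_assignment = None, None
        for assignment in product(options, repeat=n):
            dist = outcome_distribution(assignment, p)
            progress = 1 - dist[frozenset()]  # a_1 + ... + a_k
            if progress == 0:
                continue
            value = (1 + sum(a * solve(x | d)
                             for d, a in dist.items() if d and a)) / progress
            if best is None or value < best:
                best, best_assignment = value, assignment
        S[x], sigma[x] = best, best_assignment
        return best

    solve(frozenset())
    return S, sigma

# Two independent tasks, two workers that succeed with probability 1/2 on either.
half = Fraction(1, 2)
S, sigma = opt({"L": set(), "R": set()},
               [{"L": half, "R": half}, {"L": half, "R": half}])
print(S[frozenset()], sigma[frozenset()])  # 20/9 ('L', 'R')
```

In this example, splitting the workers (expected time 20/9) beats putting both on the same task first (8/3). The sketch enumerates all assignments at every node, so it is only practical when the width and the number of workers are small.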

Theorem 3.3. The expected time to completion of the regimen $\Sigma_{OPT}$, when starting with tasks X already executed, is the minimum expected time to completion that any regimen can achieve (denoted by $B_X$).

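Theorem 3.3, proof. The following invariant holds during the reverse-order processing of V’: for all i such that h ≤ i ≤ m, $S_{Y_i} = B_{Y_i}$, and $S_{Y_i}$ is equal to the expected time to completion of $\Sigma_{OPT}$ when it starts with tasks $Y_i$ already executed. The invariant is clearly true for h = m, because G’ has exactly one sink. By induction, assume the invariant holds for all i with h + 1 ≤ i ≤ m (in particular, $S_{Y_i} = B_{Y_i}$ for these i). Every set $Y_h \cup D_j$ with $D_j \ne \emptyset$ comes after $Y_h$ in the topological sort, so according to Theorem 3.1

$$B_{Y_h} = \min_{\text{assignments}} \frac{1 + \sum_{j=1}^{k} a_j\, B_{Y_h \cup D_j}}{\sum_{j=1}^{k} a_j} = \min_{\text{assignments}} \frac{1 + \sum_{j=1}^{k} a_j\, S_{Y_h \cup D_j}}{\sum_{j=1}^{k} a_j},$$

which is exactly the value that the algorithm assigns to $S_{Y_h}$.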

Theorem 3.4. The OPT algorithm runs in polynomial time when the width of the DAG and the number of workers are bounded by constants. Proof: let the width of the DAG be w and the number of workers be n.
- By Dilworth’s theorem, the DAG can be covered by w chains. Each chain has at most t nodes, and a set that satisfies precedence constraints is determined by how far it extends along each chain, so there are at most $(t+1)^w$ distinct subsets of tasks that satisfy precedence constraints.
- In each step, the algorithm considers at most t + 1 choices (a task or no task) for each worker, so there are at most $(t+1)^n$ assignments to evaluate in each step.
- We only need to evaluate the probabilities $a_i$ for at most $2^n$ of the sets $D_i$, because the workers can be assigned to at most n different tasks.

The number of computation steps of OPT is therefore at most $C_1 (t+1)^w (t+1)^n 2^n + C_2$, which is polynomial in t.

Part II: NP-hardness

X3C: Exact 3-Set Cover. Instance: a family F = {S_1, …, S_m} of subsets of a finite set U, with |S_i| = 3 for every i and |U| = 3n for some n. Objective: find n sets in F that are pairwise disjoint and have U as their union. X3C is a special case of exact set cover and is itself known to be NP-hard.

MP: Multiplicative Partition. Instance: a number n and integer sizes $s_1, \dots, s_n$ with $s_i \ge 2$. Question: is there a set P ⊆ {1, …, n} such that

$$\prod_{i \in P} s_i = \prod_{i \notin P} s_i\,?$$

The MP problem is NP-hard. We show a polynomial-time reduction from X3C.

MP is NP-hard. Given an instance of X3C, we associate a distinct prime number with each element of the universe U, using consecutive primes. Each 3-subset is converted into a size that is the product of the primes associated with its elements. We add two more sizes:
- $C = p_1^{a_1 - 2} \cdot p_2^{a_2 - 2} \cdots p_k^{a_k - 2}$, where $p_1, \dots, p_k$ are the primes that occur more than once among the subsets and $a_i$ is the number of occurrences of $p_i$.
- $D = p_{i_1} \cdot p_{i_2} \cdot p_{i_3} \cdots$, the product of the primes that occur exactly once.

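MP ≥ X3C, example. [Figure: an X3C instance with five 3-subsets $S_1, \dots, S_5$ over a six-element universe, whose elements are associated with the primes 2, 3, 5, 7, 11, and 13.]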

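MP ≥ X3C, example.
- $S_1 = 2 \cdot 3 \cdot 13 = 78$
- $S_2 = 2 \cdot 5 \cdot 11 = 110$
- $S_3 = 5 \cdot 7 \cdot 11 = 385$
- $S_4 = 3 \cdot 5 \cdot 11 = 165$
- $S_5 = 2 \cdot 7 \cdot 11 = 154$
- $C = 2^1 \cdot 3^0 \cdot 5^1 \cdot 7^0 \cdot 11^2 = 1210$
- $D = 13$

The sizes split into two groups with equal products, with $S_1$, $S_3$, and C on one side:

$$(2 \cdot 3 \cdot 13) \cdot (5 \cdot 7 \cdot 11) \cdot (2^1 \cdot 3^0 \cdot 5^1 \cdot 7^0 \cdot 11^2) = (2 \cdot 5 \cdot 11) \cdot (3 \cdot 5 \cdot 11) \cdot (2 \cdot 7 \cdot 11) \cdot 13.$$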
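The arithmetic of this example can be reproduced with a short sketch of the reduction. It assumes the primes have already been assigned to the universe elements, so each 3-subset is given directly as a triple of primes; the function name `x3c_to_mp` is illustrative.

```python
from collections import Counter
from math import prod

def x3c_to_mp(subsets):
    """Turn 3-subsets (given as triples of primes) into MP sizes plus C and D."""
    counts = Counter(p for s in subsets for p in s)
    sizes = [prod(s) for s in subsets]
    # C: primes occurring a >= 2 times, each raised to the power a - 2.
    c = prod(p ** (a - 2) for p, a in counts.items() if a >= 2)
    # D: product of the primes that occur exactly once.
    d = prod(p for p, a in counts.items() if a == 1)
    return sizes, c, d

subsets = [(2, 3, 13), (2, 5, 11), (5, 7, 11), (3, 5, 11), (2, 7, 11)]
sizes, c, d = x3c_to_mp(subsets)
print(sizes, c, d)  # [78, 110, 385, 165, 154] 1210 13

# The cover {S1, S3} together with C balances the remaining sizes together with D.
print(sizes[0] * sizes[2] * c == sizes[1] * sizes[3] * sizes[4] * d)  # True
```

Both sides of the final comparison equal 36,336,300.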

Partition with Minimum Sum of Products (PMSP). Instance: a number n and rational numbers $0 < r_1, \dots, r_n < 1$. Objective: find a set P ⊆ {1, …, n} that minimizes

$$\prod_{i \in P} r_i + \prod_{i \notin P} r_i.$$

Claim: PMSP is NP-hard. Proof: pick any instance of MP with integers $s_1, \dots, s_n$, and look at the instance of PMSP where $r_i = 1/s_i^2$.

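PMSP is NP-hard. The function $e^{x} + e^{d-x}$, for x between 0 and d, is minimized exactly when x = d/2, where its value is $2e^{d/2}$. With $r_i = 1/s_i^2$, the product over P and the product over its complement always multiply to $1/(s_1 \cdots s_n)^2$, so $s_1, \dots, s_n$ can be partitioned into two sets with equal products iff

$$\min_{P} \Bigl( \prod_{i \in P} r_i + \prod_{i \notin P} r_i \Bigr) = \frac{2}{s_1 \cdots s_n}.$$

Thus, if we could solve PMSP in polynomial time, we could also decide MP in polynomial time.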
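The equivalence can be checked by brute force on small instances. The sketch below assumes the PMSP objective is the product over P plus the product over the complement, uses $r_i = 1/s_i^2$, and compares the minimum against $2/(s_1 \cdots s_n)$ with exact fractions. The function names are illustrative, and the enumeration is exponential in n.

```python
from fractions import Fraction
from itertools import combinations
from math import prod

def pmsp_min(r):
    """Minimum over all subsets P of prod(r_i, i in P) + prod(r_i, i not in P)."""
    n = len(r)
    best = None
    for size in range(n + 1):
        for chosen in combinations(range(n), size):
            inside = prod((r[i] for i in chosen), start=Fraction(1))
            outside = prod((r[i] for i in range(n) if i not in chosen),
                           start=Fraction(1))
            if best is None or inside + outside < best:
                best = inside + outside
    return best

def mp_has_solution_via_pmsp(sizes):
    """Decide MP by solving the PMSP instance with r_i = 1 / s_i**2."""
    r = [Fraction(1, s * s) for s in sizes]
    return pmsp_min(r) == Fraction(2, prod(sizes))

print(mp_has_solution_via_pmsp([78, 110, 385, 165, 154, 1210, 13]))  # True
print(mp_has_solution_via_pmsp([2, 3, 5]))                            # False
```

The first instance is the one from the X3C example (the five subset sizes plus C = 1210 and D = 13); the second has no equal-product partition because its total product, 30, is not a perfect square.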

Theorem 4.3. The ROPAS problem restricted to the DAG with two independent tasks (where the number of workers may grow) is NP-hard. Proof: polynomial-time reduction from PMSP.
- Consider a DAG with only two independent tasks: a left task and a right task.
- Worker i executes a task successfully with probability $p_i$.
- Assume we split the n workers between the two tasks, where l is the probability that at least one of the workers assigned to the left task succeeds and r is the same probability for the right task.

Theorem 4.3, cont. The regimen initially assigns every worker to a task and, when one task remains to be executed, assigns all workers to that task. So $T_{\{left\}} = T_{\{right\}} = T$, and by Theorem 3.1 the expected time to completion is

$$\frac{1 + \bigl(l(1-r) + (1-l)r\bigr)\,T}{1 - (1-l)(1-r)}.$$

Let L and R be the sets of workers assigned to the left and the right task. Since L and R partition {1, …, n}, the quantity 1 − (1−l)(1−r) is just the probability that at least one of the n workers succeeds, and it is independent of how the workers are assigned to the left and the right task. Because l(1−r) + (1−l)r = 1 − (1−l)(1−r) − lr, the expectation is minimized if, and only if, lr is maximized across partitions L and R of {1, …, n}.

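Theorem 4.3, cont. Expanding the product,

$$lr = \Bigl(1 - \prod_{i \in L}(1-p_i)\Bigr)\Bigl(1 - \prod_{i \in R}(1-p_i)\Bigr) = 1 - \Bigl(\prod_{i \in L}(1-p_i) + \prod_{i \in R}(1-p_i)\Bigr) + \prod_{i=1}^{n}(1-p_i).$$

Since L and R form a partition, the value of the third term is independent of L and R. So lr is maximized iff $\prod_{i \in L}(1-p_i) + \prod_{i \in R}(1-p_i)$ is minimized, which is the PMSP objective with $r_i = 1 - p_i$.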
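The correspondence between the two objectives can be checked exhaustively for a handful of workers. The sketch below assumes the expected time for a split is (1 + (l(1−r) + (1−l)r)·T) / (1 − (1−l)(1−r)), with T the expected time when all workers attempt the one remaining task, and compares it with the sum of failure products. All names are illustrative.

```python
from fractions import Fraction
from itertools import product
from math import prod

def all_fail(ps):
    """Probability that every worker with success probability in ps fails."""
    return prod((1 - q for q in ps), start=Fraction(1))

def split(p, side):
    left = [q for q, s in zip(p, side) if s == "L"]
    right = [q for q, s in zip(p, side) if s == "R"]
    return left, right

def expected_time(p, side):
    """Expected completion time when worker i starts on task side[i]."""
    left, right = split(p, side)
    l, r = 1 - all_fail(left), 1 - all_fail(right)
    t_rest = 1 / (1 - all_fail(p))  # all workers on the one remaining task
    return (1 + (l * (1 - r) + (1 - l) * r) * t_rest) / (1 - (1 - l) * (1 - r))

def sum_of_products(p, side):
    """PMSP objective with r_i = 1 - p_i."""
    left, right = split(p, side)
    return all_fail(left) + all_fail(right)

def best_split(p, key):
    return min(product("LR", repeat=len(p)), key=lambda side: key(p, side))

p = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 4), Fraction(1, 5)]
by_time = best_split(p, expected_time)
by_pmsp = best_split(p, sum_of_products)
print(expected_time(p, by_time) == expected_time(p, by_pmsp))  # True
```

Because the expected time is a strictly increasing function of the sum of failure products, the two criteria rank all splits in the same order, not just the best one.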

FIRMSSU problem: Fixed Ratio Many Subsets with Small Union. Instance: a number k and nonempty subsets $S_1, \dots, S_{3k}$ of the set {1, …, 3k} whose union is {1, …, 3k}. Question: are there 2k of these subsets whose union has cardinality at most 2k?
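To make the question concrete, here is a brute-force decision procedure for tiny instances. It simply tries every choice of 2k subsets, so it takes exponential time; the function name `firmssu` is illustrative.

```python
from itertools import combinations

def firmssu(k, subsets):
    """True iff some 2k of the 3k subsets have a union of size at most 2k."""
    if len(subsets) != 3 * k:
        raise ValueError("expected exactly 3k subsets")
    return any(len(set().union(*chosen)) <= 2 * k
               for chosen in combinations(subsets, 2 * k))

# k = 1: three subsets of {1, 2, 3}; we need two of them with a union of size <= 2.
print(firmssu(1, [{1}, {1, 2}, {3}]))        # True  ({1} and {1, 2})
print(firmssu(1, [{1, 2}, {2, 3}, {1, 3}]))  # False (every pair covers {1, 2, 3})
```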

ROPAS ≥ FIRMSSU. Tasks $A_1$, $B_1$ can only be executed by the first worker. Tasks $A_2$, $B_2$, $C_2$ can only be executed by the second worker.

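[Figures: state of the schedule at rounds $\tfrac{2}{3}n^2$ (24), $\tfrac{2}{3}n^2 + \tfrac{2}{3}n$ (28), $n^2 + \tfrac{2}{3}n$ (40), and $n^2 + n$ (42); the numbers in parentheses are the round numbers in the pictured example, which correspond to n = 6.]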

Suppose the instance of FIRMSSU is negative. Pick any regimen. Because the instance is negative, the union of any 2k subsets has cardinality at least 2k + 1. But the cardinality of any group is n, so at least $\tfrac{2}{3}n^2 + n$ sources of $A_1$ must be executed in order to make $\tfrac{2}{3}n$ sinks of $B_2$ eligible. Therefore, at most $\tfrac{2}{3}n - 1$ sinks of $B_2$ can get executed by the end of round $\tfrac{2}{3}n^2 + \tfrac{2}{3}n$. By the end of this round, no sinks of $C_2$ can be executed, because each depends on the execution of $\tfrac{2}{3}n^2 + \tfrac{2}{3}n$ other tasks. Hence, by the end of that round, there are at least $\tfrac{1}{3}n + 1 + \tfrac{1}{3}n^2$ unexecuted tasks that can only be executed by the second worker. As a result, the regimen takes at least $n^2 + n + 1$ rounds to complete the computation.

ROPAS is inapproximable. When there is no restriction on either the DAG width or the number of workers, ROPAS cannot be approximated within a factor better than 5/4 unless P = NP. The proof is similar to the previous one. Assume there are $\tfrac{2}{3}n^2$ workers of type 1 and $\tfrac{2}{3}n^2$ workers of type 2. The following DAG can be scheduled in 4 rounds iff the instance of FIRMSSU is positive.

[Figures: the schedule of this DAG in rounds 1, 3, and 4.]