Solving the Greatest Common Divisor Problem in Parallel Derrick Coetzee University of California, Berkeley CS 273, Fall 2010, Prof. Satish Rao.

Solving the Greatest Common Divisor Problem in Parallel Derrick Coetzee University of California, Berkeley CS 273, Fall 2010, Prof. Satish Rao

The problem and serial solutions Given two positive integers (represented in binary), find the large positive integer that divides both Let n will be the total number of bits in the inputs Solvable in O(n) divisions and O(n 2 ) bit operations with Euclidean algorithm: – while b ≠ 0: (a, b) ← (b, a mod b) Matrix formulation: Schönhage uses this to solve in O(n log 2 n log log n)

Brent & Kung (1982): reduce time of each step to O(1) (O(n) bit complexity) – α ← n; β ← n (where a,b ≤ 2 n and a odd) while b ≠ 0: while b ≡ 0 (mod 2) { b ← b/2; β ← β − 1 } if α ≥ β { swap(a,b); swap(α, β) } if a+b ≡ 0 (mod 4): b ← ⌊ (a+b)/2 ⌋ else: b ← ⌊ (a−b)/2 ⌋ – Test “a+b ≡ 0 (mod 4)” uses only 2 lowest bits of sum, enables effective pipelining of the steps – Number of steps is still linear, requires n processors Adding parallelism: Algorithm PM

Algorithm PM: Example GCD(105, 30) (a = 1101001 2, b = 11110 2 ) α ← 7; β ← 7 b ← 1111 2 ; β ← 6 swap (a ← 1111 2, b ← 1101001 2, α ← 6, β ← 7) a + b ≡ 11 2 + 01 2 ≡ 00 2 (mod 4) Compute trailing zeroes of new b and 2 more bits: – b ← ⌊ (a+b)/2 ⌋ = ( 01111 2 + …01001 2 ) /2 = … 1100 b ← …11 2 ; β ← 5 swap (a ← …11 2, b ← 1111 2, α ← 5, β ← 6)

Algorithm PM: Example a = …11 2, b = 1111 2, α = 5, β = 6 a + b ≡ 11 2 + 11 2 ≡ 10 2 (mod 4) Compute LSB of new b while previous iteration computes next bit of a in parallel: – b' = ⌊ (a−b)/2 ⌋ = …0 – a = …111 2 With another bit of a, can compute next bit of b’ Final result: a= 1111 2, b’ = 0

Sublinear time GCD Kannan et al (1987): Break the problem into n/log n steps, each of which takes log log n time and eliminates log n bits of each input. – CRCW, O(n log log n/log n) time, O(n 2 log 2 n) processors – Idea: instead of a mod b, compute pa mod b for all 0≤p≤n If gcd(a,b) = d and 0 ≤ p ≤ n, then gcd(pa,b)/d ≤ n Pigeonhole principle gives first log n bits same for at least two p – gcd(a,b) = gcd(b,(p 1 a mod b)-(p 2 a mod b)) (ignoring factors ≤ n) – Only need to know first log n bits to identify p 1, p 2 – can be computed in log log n time

Sublinear time GCD: Example GCD(143, 221), n = 8 – (0×143) mod 221 = 000… 2 – (1×143) mod 221 = 100… 2 – (2×143) mod 221 = 010… 2 – (3×143) mod 221 = 110… 2 – (4×143) mod 221 = 100… 2 – (5×143) mod 221 = 001… 2 – (6×143) mod 221 = 110… 2 – (7×143) mod 221 = 011… 2 – (8×143) mod 221 = 001… 2 (4×143) mod 221 − (1×143) mod 221 = 13

Sublinear time GCD: Using matrix form Problem: (p 1 a mod b) − (p 2 a mod b) takes log n time – Carry lookahead adder needs log n time to add n-bit numbers – Idea: Delay updates to (a, b) for log n steps, collecting updates in a 2 × 2 matrix T, then compute T(a,b) in log n time – Results in n/log 2 n phases each taking log n time – Each step performs constant number of log 2 n-bit operations Ensures that each step still takes log log n time – Time dominated by cost of the O(n/log n) steps

Getting rid of the log log n factor Chor and Goldreich (1990): Parallelize Brent & Kung’s PM algorithm instead of Euclidean algorithm – α ← n; β ← n while b ≠ 0: while b = 0 (mod 2) { b ← b/2; β ← β − 1 } if α ≥ β { swap(a,b); swap(α, β) } if a+b ≡ 0 (mod 4): b ← ⌊ (a+b)/2 ⌋ else: b ← ⌊ (a−b)/2 ⌋ – Idea: pack log n iterations of PM algorithm into each phase Each phase can be done in constant time – Requires O(n/log n) time on O(n 1+ε ) processors – Best-known algorithm

log n iterations in constant time – α ← n; β ← n δ ← 0 while b ≠ 0: while b = 0 (mod 2) { b ← b/2; β ← β − 1 δ ← δ + 1} if α ≥ β δ ≥ 0 { swap(a,b); swap(α, β) δ ← −δ } if a+b ≡ 0 (mod 4): b ← ⌊ (a+b)/2 ⌋ else: b ← ⌊ (a−b)/2 ⌋ – δ together with the last log n+1 bits of a and b determine the transformation matrix of the next log n iterations – Precompute a table – lookup, do the matrix multiplication Technicality: Need to treat | δ| ≤ log n and |δ| > log n differently

Multiplication in constant time How to multiply an n-bit number by a log n bit number in constant time? – Represent both numbers in base 2 log n – Precompute a multiplication table for this base – Separate the larger number into a sum of two numbers, each having zero for every other digit, e.g. 1234 = 1030 + 204 – Multiply both by the smaller number; no carries are required Example: 1234 × 7 = (1030+204)×7 = 7210+1428 = 8638 – Finally, Chandra et al (1983) shows we can add two n-bit numbers in O(1) time with O(n 2 ) processors But this requires concurrent writes

Complexity perspective: an analogy P = NP? – Open problem, but most practical problems have either been shown to be in P or else NP-hard. – Exceptions: integer factorization, graph isomorphism If integer factorization is NP-complete, NP = coNP If graph isomorphism is NP-complete, PH collapses to 2 nd level NC = P? – Open problem, but most practical problems have either been shown to be in NC or else P-hard. – Exceptions: GCD What would GCD being P-complete imply?

Questions?

Solving the Greatest Common Divisor Problem in Parallel Derrick Coetzee University of California, Berkeley CS 273, Fall 2010, Prof. Satish Rao.

Similar presentations

Presentation on theme: "Solving the Greatest Common Divisor Problem in Parallel Derrick Coetzee University of California, Berkeley CS 273, Fall 2010, Prof. Satish Rao."— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

Solving the Greatest Common Divisor Problem in Parallel Derrick Coetzee University of California, Berkeley CS 273, Fall 2010, Prof. Satish Rao.

Similar presentations

Presentation on theme: "Solving the Greatest Common Divisor Problem in Parallel Derrick Coetzee University of California, Berkeley CS 273, Fall 2010, Prof. Satish Rao."— Presentation transcript:

Similar presentations

About project

Feedback