Download presentation
Presentation is loading. Please wait.
1
Solving the Greatest Common Divisor Problem in Parallel Derrick Coetzee University of California, Berkeley CS 273, Fall 2010, Prof. Satish Rao
2
The problem and serial solutions Given two positive integers (represented in binary), find the large positive integer that divides both Let n will be the total number of bits in the inputs Solvable in O(n) divisions and O(n 2 ) bit operations with Euclidean algorithm: – while b ≠ 0: (a, b) ← (b, a mod b) Matrix formulation: Schönhage uses this to solve in O(n log 2 n log log n)
3
Brent & Kung (1982): reduce time of each step to O(1) (O(n) bit complexity) – α ← n; β ← n (where a,b ≤ 2 n and a odd) while b ≠ 0: while b ≡ 0 (mod 2) { b ← b/2; β ← β − 1 } if α ≥ β { swap(a,b); swap(α, β) } if a+b ≡ 0 (mod 4): b ← ⌊ (a+b)/2 ⌋ else: b ← ⌊ (a−b)/2 ⌋ – Test “a+b ≡ 0 (mod 4)” uses only 2 lowest bits of sum, enables effective pipelining of the steps – Number of steps is still linear, requires n processors Adding parallelism: Algorithm PM
4
Algorithm PM: Example GCD(105, 30) (a = 1101001 2, b = 11110 2 ) α ← 7; β ← 7 b ← 1111 2 ; β ← 6 swap (a ← 1111 2, b ← 1101001 2, α ← 6, β ← 7) a + b ≡ 11 2 + 01 2 ≡ 00 2 (mod 4) Compute trailing zeroes of new b and 2 more bits: – b ← ⌊ (a+b)/2 ⌋ = ( 01111 2 + …01001 2 ) /2 = … 1100 b ← …11 2 ; β ← 5 swap (a ← …11 2, b ← 1111 2, α ← 5, β ← 6)
5
Algorithm PM: Example a = …11 2, b = 1111 2, α = 5, β = 6 a + b ≡ 11 2 + 11 2 ≡ 10 2 (mod 4) Compute LSB of new b while previous iteration computes next bit of a in parallel: – b' = ⌊ (a−b)/2 ⌋ = …0 – a = …111 2 With another bit of a, can compute next bit of b’ Final result: a= 1111 2, b’ = 0
6
Sublinear time GCD Kannan et al (1987): Break the problem into n/log n steps, each of which takes log log n time and eliminates log n bits of each input. – CRCW, O(n log log n/log n) time, O(n 2 log 2 n) processors – Idea: instead of a mod b, compute pa mod b for all 0≤p≤n If gcd(a,b) = d and 0 ≤ p ≤ n, then gcd(pa,b)/d ≤ n Pigeonhole principle gives first log n bits same for at least two p – gcd(a,b) = gcd(b,(p 1 a mod b)-(p 2 a mod b)) (ignoring factors ≤ n) – Only need to know first log n bits to identify p 1, p 2 – can be computed in log log n time
7
Sublinear time GCD: Example GCD(143, 221), n = 8 – (0×143) mod 221 = 000… 2 – (1×143) mod 221 = 100… 2 – (2×143) mod 221 = 010… 2 – (3×143) mod 221 = 110… 2 – (4×143) mod 221 = 100… 2 – (5×143) mod 221 = 001… 2 – (6×143) mod 221 = 110… 2 – (7×143) mod 221 = 011… 2 – (8×143) mod 221 = 001… 2 (4×143) mod 221 − (1×143) mod 221 = 13
8
Sublinear time GCD: Using matrix form Problem: (p 1 a mod b) − (p 2 a mod b) takes log n time – Carry lookahead adder needs log n time to add n-bit numbers – Idea: Delay updates to (a, b) for log n steps, collecting updates in a 2 × 2 matrix T, then compute T(a,b) in log n time – Results in n/log 2 n phases each taking log n time – Each step performs constant number of log 2 n-bit operations Ensures that each step still takes log log n time – Time dominated by cost of the O(n/log n) steps
9
Getting rid of the log log n factor Chor and Goldreich (1990): Parallelize Brent & Kung’s PM algorithm instead of Euclidean algorithm – α ← n; β ← n while b ≠ 0: while b = 0 (mod 2) { b ← b/2; β ← β − 1 } if α ≥ β { swap(a,b); swap(α, β) } if a+b ≡ 0 (mod 4): b ← ⌊ (a+b)/2 ⌋ else: b ← ⌊ (a−b)/2 ⌋ – Idea: pack log n iterations of PM algorithm into each phase Each phase can be done in constant time – Requires O(n/log n) time on O(n 1+ε ) processors – Best-known algorithm
10
log n iterations in constant time – α ← n; β ← n δ ← 0 while b ≠ 0: while b = 0 (mod 2) { b ← b/2; β ← β − 1 δ ← δ + 1} if α ≥ β δ ≥ 0 { swap(a,b); swap(α, β) δ ← −δ } if a+b ≡ 0 (mod 4): b ← ⌊ (a+b)/2 ⌋ else: b ← ⌊ (a−b)/2 ⌋ – δ together with the last log n+1 bits of a and b determine the transformation matrix of the next log n iterations – Precompute a table – lookup, do the matrix multiplication Technicality: Need to treat | δ| ≤ log n and |δ| > log n differently
11
Multiplication in constant time How to multiply an n-bit number by a log n bit number in constant time? – Represent both numbers in base 2 log n – Precompute a multiplication table for this base – Separate the larger number into a sum of two numbers, each having zero for every other digit, e.g. 1234 = 1030 + 204 – Multiply both by the smaller number; no carries are required Example: 1234 × 7 = (1030+204)×7 = 7210+1428 = 8638 – Finally, Chandra et al (1983) shows we can add two n-bit numbers in O(1) time with O(n 2 ) processors But this requires concurrent writes
12
Complexity perspective: an analogy P = NP? – Open problem, but most practical problems have either been shown to be in P or else NP-hard. – Exceptions: integer factorization, graph isomorphism If integer factorization is NP-complete, NP = coNP If graph isomorphism is NP-complete, PH collapses to 2 nd level NC = P? – Open problem, but most practical problems have either been shown to be in NC or else P-hard. – Exceptions: GCD What would GCD being P-complete imply?
13
Questions?
Similar presentations
© 2025 SlidePlayer.com Inc.
All rights reserved.