I/O-Efficient Batched Union-Find and Its Applications to Terrain Analysis
Pankaj K. Agarwal, Lars Arge, and Ke Yi
Duke University, University of Aarhus
The Union-Find Problem
A universe of N elements: x_1, x_2, …, x_N
Initially N singleton sets: {x_1}, {x_2}, …, {x_N}
Each set has a representative
Maintain the partition under
–Union(x_i, x_j): joins the sets containing x_i and x_j
–Find(x_i): returns the representative of the set containing x_i
The Solution
[Figure: forest of trees over the elements, with the representatives marked]
Union(d, h): link-by-rank
Find(n): path compression
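The two heuristics named on this slide can be sketched in memory as follows. This is a minimal illustration of the classical structure, not the I/O-efficient algorithm of the talk; the class name `UnionFind` and its method names are chosen here for illustration.

```python
class UnionFind:
    """In-memory union-find with link-by-rank and path compression."""

    def __init__(self, elements):
        # Every element starts as the root (representative) of its own tree.
        self.parent = {x: x for x in elements}
        self.rank = {x: 0 for x in elements}

    def find(self, x):
        # First pass: walk up to the root.
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Second pass (path compression): point every node on the path at the root.
        while self.parent[x] != root:
            self.parent[x], x = root, self.parent[x]
        return root

    def union(self, x, y):
        rx, ry = self.find(x), self.find(y)
        if rx == ry:
            return rx                      # already in the same set
        # Link-by-rank: hang the root of smaller rank under the other root.
        if self.rank[rx] < self.rank[ry]:
            rx, ry = ry, rx
        self.parent[ry] = rx
        if self.rank[rx] == self.rank[ry]:
            self.rank[rx] += 1
        return rx


uf = UnionFind("abcdefghir")
uf.union("d", "g")
uf.union("d", "h")
same = uf.find("g") == uf.find("h")        # g and h now share a representative
```

Every `find` chases pointers through the forest, which is the access pattern that becomes expensive once the structure no longer fits in memory.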
Complexity
O(N α(N)) for a sequence of N union and find operations [Tarjan 75]
–α(): inverse Ackermann function (grows very slowly)
–Optimal in the worst case [Tarjan 79, Fredman and Saks 89]
Batched (off-line) version
–Entire sequence known in advance
–Can be improved to linear on a RAM [Gabow and Tarjan 85]
–Not possible on a pointer machine [Tarjan 79]
Simple and Good, as long as … The entire data structure fits in memory
The I/O Model
Main memory of size M
Disk of infinite size
One I/O transfers B items between memory and disk
Our Results
An I/O-efficient algorithm for the batched union-find problem using O(sort(N)) = O((N/B) log_{M/B}(N/B)) I/Os expected
–Same as sorting
–Optimal in the worst case
A practical algorithm using O(sort(N) log(N/M)) I/Os
Applications to terrain analysis
–Topological persistence: O(sort(N)) I/Os
–Contour trees: O(sort(N)) I/Os
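To get a feel for the sort(N) bound, the expression (N/B) log_{M/B}(N/B) can be evaluated directly. The sketch below ignores constant factors, and the function name `sort_ios` and the sample values of N, M and B are assumptions made for illustration, not figures from the talk.

```python
import math


def sort_ios(n, m, b):
    """Evaluate (N/B) * log_{M/B}(N/B), the sort(N) bound without constants.

    n: number of items, m: items that fit in memory, b: items per block.
    The logarithm is clamped at 1 so that the result is never below one scan.
    """
    blocks = n / b                  # N/B: blocks occupied by the input
    fan_out = m / b                 # M/B: blocks that fit in memory
    return blocks * max(1.0, math.log(blocks, fan_out))


# Hypothetical parameters: 2^30 items, memory for 2^24 items, blocks of 2^10 items.
n, m, b = 2**30, 2**24, 2**10
bound = sort_ios(n, m, b)           # roughly 1.5 million block transfers
naive = n                           # one I/O per operation, for comparison
```

With these assumed parameters the bound is several hundred times smaller than N, which is the gap between an I/O-efficient algorithm and one that pays an I/O per pointer access.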
I/O-Efficient Batched Union-Find
Assumption: no redundant unions
–Each union must join two different sets
–This assumption will be removed later
Two-stage algorithm
–Convert to interval union-find: compute an order on the elements such that each union joins two adjacent sets
–Solve batched interval union-find
Union Graph
1: Union(d, g)
2: Union(a, c)
3: Union(r, b)
4: Union(a, e)
5: Union(e, i)
6: Union(r, a)
7: Union(a, d)
8: Union(d, h)
9: Union(b, f)
[Figure: union graph on the nodes r, a, b, c, d, e, f, g, h, i]
Equivalent union trees (the graph is a tree if there are no redundant unions)
Transforming the Union Tree
[Figure: the union tree transformed step by step]
Weights along every root-to-leaf path decrease
Formulating as a Batched Problem
[Figure: the union tree before and after the transformation]
For each edge, find the lowest ancestor edge with a higher weight
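One reading of this step, sketched naively in memory: root the union tree, weight each edge by the time stamp of its union, and re-attach each child just below the lowest ancestor edge of higher weight (or at the root if there is none). All edges skipped on the way up have smaller weight, so those nodes are already joined by the time the union happens and the re-attached edge describes an equivalent union. The function name `transform_union_tree` and the choice of root are assumptions of this sketch; the talk solves the same query in batch with O(sort(N)) I/Os instead of walking up the tree.

```python
def transform_union_tree(unions, root):
    """unions: list of (x, y); the i-th union (1-based) has weight i.

    Returns {child: (new_parent, weight)} for the transformed tree, in which
    weights decrease along every root-to-leaf path. Assumes no redundant unions.
    """
    adjacency = {}
    for weight, (x, y) in enumerate(unions, start=1):
        adjacency.setdefault(x, []).append((y, weight))
        adjacency.setdefault(y, []).append((x, weight))

    # Root the tree: parent[c] = (p, weight of edge p-c).
    parent, seen, stack = {}, {root}, [root]
    while stack:
        u = stack.pop()
        for v, weight in adjacency.get(u, []):
            if v not in seen:
                seen.add(v)
                parent[v] = (u, weight)
                stack.append(v)

    # Walk up past every ancestor edge of smaller weight (naive, pointer chasing).
    new_parent = {}
    for child, (p, weight) in parent.items():
        anchor = p
        while anchor != root and parent[anchor][1] < weight:
            anchor = parent[anchor][0]
        new_parent[child] = (anchor, weight)
    return new_parent


unions = [("d", "g"), ("a", "c"), ("r", "b"), ("a", "e"), ("e", "i"),
          ("r", "a"), ("a", "d"), ("d", "h"), ("b", "f")]
tree = transform_union_tree(unions, "r")
# For example, h (edge weight 8) moves from d up to r, and i (weight 5) from e to a.
```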
Cast in a Geometry Setting
[Figure: Euler tour of the union tree]
Euler tour: in O(sort(N)) I/Os [Chiang et al. 95]
x: positions in the tour
y: weight
Cast in a Geometry Setting
[Figure: each tree edge drawn as a horizontal segment]
For each edge, find the lowest ancestor edge with a higher weight
For each segment, find the shortest segment above and containing it
Distribution Sweeping
M/B vertical slabs
[Figure labels: checked here; checked recursively]
Total cost: O(sort(N))
In-Order Traversal
Weights along every root-to-leaf path decrease
At u, with children u_1, …, u_k (in increasing order of weight)
1. Recursively visit the subtree at u_1
2. Return u
3. For i = 2, …, k: recursively visit the subtree at u_i
[Figure: the transformed union tree and the order produced by the traversal]
Claim: this traversal produces the right order
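The traversal and the claim can be checked on the running example with a small in-memory sketch. The `children` dictionary below is one reading of the transformed union tree for the nine unions of the Union Graph slide (an assumption of this sketch, since the figure itself did not survive), and `joins_adjacent_sets` is a hypothetical helper that tests the claim by brute force.

```python
def in_order(children, u):
    """children[u]: list of (edge weight, child). Returns the visiting order."""
    kids = [c for _, c in sorted(children.get(u, []))]   # increasing weight
    order = []
    if kids:
        order += in_order(children, kids[0])   # 1. subtree at u_1
    order.append(u)                             # 2. u itself
    for c in kids[1:]:                          # 3. subtrees at u_2, ..., u_k
        order += in_order(children, c)
    return order


def joins_adjacent_sets(order, unions):
    """True if every union joins two sets that are adjacent intervals of order."""
    pos = {x: i for i, x in enumerate(order)}
    span = {x: (pos[x], pos[x]) for x in order}   # span[x]: interval of x's set
    for x, y in unions:
        (a, b), (c, d) = sorted([span[x], span[y]])
        if b + 1 != c:
            return False
        for z in order[a:d + 1]:
            span[z] = (a, d)
    return True


# Assumed transformed tree for the example: (weight, child) pairs per node.
children = {
    "r": [(3, "b"), (6, "a"), (7, "d"), (8, "h"), (9, "f")],
    "a": [(2, "c"), (4, "e"), (5, "i")],
    "d": [(1, "g")],
}
unions = [("d", "g"), ("a", "c"), ("r", "b"), ("a", "e"), ("e", "i"),
          ("r", "a"), ("a", "d"), ("d", "h"), ("b", "f")]
order = in_order(children, "r")            # b r c a e i g d h f
ok = joins_adjacent_sets(order, unions)    # True for this order
```

On this example every union does join two neighbouring intervals of the computed order, whereas an arbitrary order such as the alphabetical one already fails at the first union.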
Solving Interval Union-Find
Union: x: two operands; y: time stamp
Find: x: operand; y: time stamp
[Figure label: representative]
Solving Interval Union-Find
Union: x: two operands; y: time stamp
Find: x: operand; y: time stamp
Four instances of batched ray shooting: O(sort(N))
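The geometric view can be illustrated with a naive in-memory sketch: once elements are positions 0..n-1 and every union joins two adjacent intervals, each union removes exactly one boundary at a known time, and a find at time t only has to look left and right for the nearest boundary still present at time t. The function name `batched_interval_union_find`, the operation encoding, and returning the interval rather than a specific representative are all assumptions of this sketch; it scans linearly where the talk uses batched ray shooting.

```python
import math


def batched_interval_union_find(n, ops):
    """ops: list of ("union", x, y) or ("find", x) over positions 0..n-1.

    The index of an operation in ops is its time stamp. Returns, for each find
    in order, the interval (lo, hi) of positions forming its set at that time.
    """
    # removed_at[g]: time at which the boundary between g and g+1 disappears.
    removed_at = [math.inf] * (n - 1)
    for t, op in enumerate(ops):
        if op[0] == "union":
            lo, hi = sorted(op[1:])
            alive = [g for g in range(lo, hi) if removed_at[g] == math.inf]
            assert len(alive) == 1, "union must join two adjacent intervals"
            removed_at[alive[0]] = t

    answers = []
    for t, op in enumerate(ops):
        if op[0] == "find":
            lo = hi = op[1]
            # Look left and right from (x, t) until a boundary alive at time t.
            while lo > 0 and removed_at[lo - 1] < t:
                lo -= 1
            while hi < n - 1 and removed_at[hi] < t:
                hi += 1
            answers.append((lo, hi))
    return answers


ops = [("find", 1), ("union", 0, 1), ("find", 1),
       ("union", 2, 3), ("union", 1, 2), ("find", 3)]
answers = batched_interval_union_find(4, ops)   # [(1, 1), (0, 1), (0, 3)]
```

Any fixed rule, for instance the leftmost position of the interval, would turn the returned interval into a representative.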
Handling Redundant Unions
Union tree becomes a general graph
Compute the minimum spanning tree
–O(sort(N)) I/Os (randomized) [Chiang et al. 95]
–O(sort(N) loglog B) I/Os (deterministic) [Arge et al. 04]
–Deterministic O(sort(N)) I/Os if the graph is planar
–Only MST edges are non-redundant
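Why only MST edges matter can be seen with an in-memory sketch: with time stamps as edge weights, the unions are already sorted by weight, so Kruskal's algorithm keeps exactly those unions that join two different sets. The function name `split_redundant` is hypothetical, and this sketch uses a plain union-find rather than the external-memory MST algorithms cited on the slide.

```python
def split_redundant(unions):
    """unions: list of (x, y) in time order.

    Returns (kept, redundant), each a list of (time stamp, x, y). The kept
    unions are the edges Kruskal's algorithm selects when weights are time stamps.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    kept, redundant = [], []
    for t, (x, y) in enumerate(unions, start=1):
        rx, ry = find(x), find(y)
        if rx == ry:
            redundant.append((t, x, y))     # closes a cycle: not an MST edge
        else:
            parent[rx] = ry
            kept.append((t, x, y))
    return kept, redundant


kept, redundant = split_redundant([("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")])
# The third union joins a and c, which the first two already connected.
```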
Applications
1. Topological Persistence
2. Contour Trees
Application: Topological Persistence
Introduced by Edelsbrunner et al.
Measures the importance of features on a surface
–Feature extraction
–Topological de-noising
Many applications
–Surface modeling
–Shape analysis
–Terrain analysis
–Computational biology
Topological Persistence Illustrated
Formulated as Batched Union-Find
Terrain represented as a triangulated mesh
Consider minimum-saddle pairs
When we reach
–A minimum or maximum: do nothing
–A regular point u: issue union(u, v) for a lower neighbor v
–A saddle u: let v and w be nodes from the two connected pieces of u's lower link; issue find(v), find(w), union(u, v), union(u, w)
[Figure: the lower link of u]
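A simplified in-memory sketch of the same idea follows. It departs from the slide in two labeled ways: it works on a plain graph with distinct vertex heights instead of a triangulated mesh, and it treats any vertex whose lower neighbors lie in two or more components as a merging saddle, rather than inspecting the lower link. When components merge, the one whose minimum is higher is paired with the merging vertex (the usual elder rule). The function name `min_saddle_pairs` and the sample data are made up for illustration.

```python
def min_saddle_pairs(height, neighbors):
    """Pair each non-global minimum with the vertex at which its component merges.

    height: {vertex: height}, all distinct; neighbors: {vertex: [vertices]}.
    Returns a list of (minimum, merging vertex) pairs in sweep order.
    """
    parent = {}   # union-find forest over the vertices swept so far
    lowest = {}   # lowest[root]: the minimum of that component

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    pairs = []
    for u in sorted(height, key=height.get):            # sweep from low to high
        roots = {find(v) for v in neighbors[u] if height[v] < height[u]}
        if not roots:                                    # no lower neighbor: a minimum
            parent[u], lowest[u] = u, u
            continue
        ordered = sorted(roots, key=lambda r: height[lowest[r]])
        survivor = ordered[0]                            # component with the lowest minimum
        parent[u] = survivor                             # union(u, v)
        for r in ordered[1:]:                            # u merges further components
            pairs.append((lowest[r], u))
            parent[r] = survivor
    return pairs


# A made-up height profile along a path 0 - 1 - 2 - 3 - 4.
height = {0: 1, 1: 5, 2: 2, 3: 6, 4: 0}
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
pairs = min_saddle_pairs(height, neighbors)   # [(2, 1), (0, 3)]
```

The persistence of a pair is the height difference of its two vertices, here 3 and 5; the global minimum (vertex 4) stays unpaired.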
Experiment 1: Random Union-Find 128MB memory
Experiment 2: Topological Persistence on Terrain Data Neuse River Basin of North Carolina: ~ 0.5 billion points
Experiment 2: Topological Persistence on Terrain Data
Entire data set (0.5 billion points): IM fails and EM takes 10 hours
128MB memory
Contour Trees
Summary
An I/O-efficient algorithm for the batched union-find problem using O(sort(N)) = O((N/B) log_{M/B}(N/B)) I/Os
–Optimal in the worst case
A practical algorithm using O(sort(N) log(N/M)) I/Os
Applications to terrain analysis
–Topological persistence: O(sort(N)) I/Os
–Contour trees: O(sort(N)) I/Os
Open question
–On-line case: can we get below O(N α(N)) I/Os?
Thank you!
Previous Results
Directly maintain contours
–O(N log N) time [van Kreveld et al. 97]
–Needs union-split-find for circular lists
–Does not extend to higher dimensions
Two sweeps maintaining components, then merge
–O(N log N) time [Carr et al. 03]
–Extends to arbitrary dimensions
Join Tree and Split Tree
[Figure: join tree and split tree, with the qualified nodes marked]
Final Contour Tree
[Figure: join tree, split tree, and the resulting contour tree]
Hard to BATCH!
Another Characterization
[Figure: nodes u, v, w in the join tree, the split tree, and the contour tree]
Let w be the highest node that is a descendant of v in the join tree and an ancestor of u in the split tree; then (u, w) is a contour tree edge
Now can BATCH!
Map to Rectangles
[Figure: nodes u, v, w in the join tree and the split tree mapped to rectangles]
Can be solved in O(sort(N)) I/Os (practical, too)
Topological Persistence
Label Nodes with Intervals
Using an Euler tour (O(sort(N)) I/Os)
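The point of interval labels is that ancestor and descendant tests turn into interval containment. A minimal in-memory sketch, assuming the labels are the first and last preorder numbers seen in each subtree (one common way to derive intervals from a tour; the slide does not spell out its exact labeling), with hypothetical names `label_intervals` and `is_ancestor`:

```python
def label_intervals(children, root):
    """Label each node with (first, last): the range of preorder numbers in its subtree."""
    first, last, clock = {}, {}, 0
    stack = [(root, False)]
    while stack:
        node, done = stack.pop()
        if done:
            last[node] = clock - 1          # all descendants have been numbered
            continue
        first[node] = clock
        clock += 1
        stack.append((node, True))          # revisit after the whole subtree
        for c in reversed(children.get(node, [])):
            stack.append((c, False))
    return {v: (first[v], last[v]) for v in first}


def is_ancestor(label, u, v):
    """True if u is an ancestor of v (or u == v): v's interval lies inside u's."""
    return label[u][0] <= label[v][0] and label[v][1] <= label[u][1]


children = {"r": ["a", "b"], "a": ["c"]}
label = label_intervals(children, "r")
# label == {"r": (0, 3), "a": (1, 2), "c": (2, 2), "b": (3, 3)}
```

With one interval per node from the join tree and one from the split tree, a condition such as "descendant of v in one tree and ancestor of u in the other" becomes a pair of interval constraints.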