Embedding and Similarity Search for Point Sets under Translation. Minkyoung Cho and David M. Mount, University of Maryland. SoCG 2008.


1 Embedding and Similarity Search for Point Sets under Translation. Minkyoung Cho and David M. Mount, University of Maryland. SoCG 2008

2 Point Pattern Matching. Given two point sets P and Q, find Q' ⊆ Q minimizing Dist(P, Q') = min_t dist(tP, Q'), where t ranges over a group of geometric transformations (e.g., translation, rotation, …). [Figure: a pattern P matched against a larger scene Q.]

3 Point Pattern Similarity Search. A collection of point sets S = {P1, P2, …, PN} has been preprocessed. Given a query set Q, find the (approximately) nearest Pi with respect to a distance function and a transformation group.

4 Results.
- Geometric Hashing [Wolfson & Rigoutsos 97]: transformations: translation, rotation, affine, …; space: O(N n^(k+1)) (k: frame size); index: yes; note: high space complexity.
- Embedding EMD into Euclidean space [Indyk & Thaper 03]: transformations: none; space: O(Nn); index: yes; note: embeds EMD into L1.
- EMD under transformation sets [Cohen & Guibas 99]: transformations: scaling, translation; space: O(Nn); index: no; note: brute force, heuristic.
- Ours: transformation: translation; space: O(Nn log^2 n); index: yes; note: embeds SD into L1.
(EMD: Earth Mover's Distance; SD: Symmetric Difference Distance)

5 Problem Definition: Point Pattern Similarity Searching.
- Distance measure: symmetric difference distance
- Error model: outliers (but no noise)
- Transformation: translation
- Restriction: coordinates are integers
Example: for P = {p1, p2, p3, p4} and Q = {p1, p2, p5, p6}, the symmetric difference is {p3, p4, p5, p6}. For P = {0, 12, 14, 23, 35, 54, 59, 64} and Q = {15, 17, 20, 26, 38, 57, 65, 67}, the translation t = 3 aligns six of the eight points: Q shifted by −3 is {12, 14, 17, 23, 35, 54, 62, 64}, which shares {12, 14, 23, 35, 54, 64} with P.
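As a baseline (not the paper's embedding), the symmetric difference distance under translation can be computed by brute force; a minimal sketch for 1-d integer point sets, using the example sets above:

```python
def sd_distance(P, Q):
    """Brute-force min over t of |(P + t) symmetric-difference Q|.

    Only translations that align some p in P with some q in Q can
    reduce the distance, so it suffices to try t = q - p.
    """
    P, Q = set(P), set(Q)
    candidates = {q - p for p in P for q in Q}
    candidates.add(0)  # also try the identity translation
    return min(len({p + t for p in P} ^ Q) for t in candidates)

P = {0, 12, 14, 23, 35, 54, 59, 64}
Q = {15, 17, 20, 26, 38, 57, 65, 67}
print(sd_distance(P, Q))  # t = 3 aligns 6 of the 8 points, leaving 4 mismatches
```

This takes O(n^2) candidate translations with O(n) work each; the point of the talk is to avoid anything like this at query time.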

6 Motivation: Sources of Complexity. The difficulty comes from combining translation with outliers. Translation only: translate each point set so its leftmost point lies at the origin; matching becomes trivial. Outliers only: reduces to nearest-neighbor search in the Hamming cube (by hashing or random sampling).

7 Intuition. [Figure: each stored point set P1, …, PN and the query Q are mapped by an embedding f into a common metric space, where nearest-neighbor search is performed.]

8 Embedding: Basic Definitions. Given metric spaces (X, d) and (X', d'), a map f: X → X' is called an embedding. The contraction of f is the maximum factor by which distances are shrunk, i.e., max_{x,y} d(x, y) / d'(f(x), f(y)). The expansion (or stretch) of f is the maximum factor by which distances are stretched, i.e., max_{x,y} d'(f(x), f(y)) / d(x, y). The distortion of f is the product of its contraction and expansion.

9 Main Result: Preliminaries. Main result: there exists a randomized embedding that maps point sets, under the symmetric difference distance with respect to translation, into the metric space L1 with distortion O(log^2 n). Assumptions: each point set has at most n elements and lies in dimension d; coordinates are integers of magnitude polynomial in n. Distance function: symmetric difference with respect to translation, dist(P, Q) = min_t |(P + t) Δ Q|. Target metric: L1.

10 Outline of Algorithm.
1. Transform d-dimensional points into 1-dimensional points (distortion: 1).
2. Reduce the domain size using a linear hash function (distortion: O(1)).
3. Make the representation invariant under translation (distortion: O(log^2 n)).
4. Reduce the target domain size using a universal hash function (distortion: O(1)).
[Figure: a point set such as {3, 6, 10, 14, 22} becomes a bit vector of length O(n log n), then a multiset of probe patterns, then a small count vector.]

11 Translation Invariant. [Figure: from the bit vector hP of length s, a probe of size ρ = 4 is read at each of the s cyclic shifts, producing a multiset of ρ-bit patterns, e.g., {1101, 0001, 0000, 0010, 1100, 1010, …}.]

12 Intuition. [Figure: bit vectors hP and hQ of length s, differing in a few positions.]
Φ2Q = {10, 00, 01, 00, 11, 00, 10, 01, 00, 11, 00}
Φ2P = {10, 01, 00, 10, 01, 00, 10, 00, 00, 01, 00}
Φ4P = {1101, 0000, 0010, 1100, 0000, 0001, 1000, 0010, 0101, 0000, 0010}
Φ4Q = {1011, 0100, 0010, 0101, 1000, 0011, 1100, 0010, 0100, 1001, 0000}
If a probe hits a mismatched position, the bit patterns it generates may differ. The probability that a probe hits a mismatched position increases with the probe size.

13 Relationship between ρ (probe size) and δ*. [Figure: expected distance between the invariants as a function of the probe size ρ, where δ is the estimated distance and δ* is the original distance; the distance grows as ρ increases, up to an upper bound of 2s − 2.]

14 Embedding. [Figure: invariants are computed for geometrically increasing probe sizes ρ = 2^0, 2^1, …, 2^L, …, 2^H, up to 2^(log 2n) = 2n, and the distances between invariants are used to estimate δ*, where δ is the estimated distance and δ* is the original distance.]

15 Build Time. The expensive operations are building the invariant and hashing over a large domain.
Building the invariant: (# of probes) × (# of translations). Trivially O(s) × s = O(n log n) × O(n log n) = O(n^2 log^2 n).
Universal hash function: (# of elements) × (matrix operation) = (# of elements) × (input size) × (output size). Trivially O(s) × O(s) × O(log s) = O(s^2 log s) = O(n^2 log^3 n).
Surprisingly, this improves to O(n log^3 n) if the two operations are merged.

16 Merge Two Operations. [Figure: the bit vector hP of length s is correlated against a pattern derived from the probe and a row r0 of H, producing all outputs y0, y1, …, y_{s−1} in one pass.] Convolution can be computed in O(n log n) time, where n is the size of the array.
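The fact underlying this slide can be checked directly: evaluating a fixed pattern against every cyclic shift of a bit vector is a single circular correlation, computable with the FFT in O(s log s). A small sketch (numpy is assumed; the pattern stands in for the masked probe of the backup slides):

```python
import numpy as np

def all_shift_dots(bits, pattern):
    """dots[t] = sum_j bits[(t + j) % s] * pattern[j] for every shift t,
    via one circular cross-correlation computed with the FFT."""
    s = len(bits)
    pat = np.zeros(s)
    pat[:len(pattern)] = pattern
    # circular correlation: IFFT(FFT(bits) * conj(FFT(pat)))
    dots = np.fft.ifft(np.fft.fft(bits) * np.conj(np.fft.fft(pat))).real
    return np.rint(dots).astype(int)

bits = np.array([1, 1, 0, 0, 0, 0, 1, 0, 1, 1, 0])
pattern = [1, 0, 1, 0]          # a probe of size rho = 4
naive = [sum(bits[(t + j) % len(bits)] * p for j, p in enumerate(pattern))
         for t in range(len(bits))]
assert list(all_shift_dots(bits, pattern)) == naive
```

One FFT pass replaces the s separate probe evaluations, which is exactly the saving the merged construction exploits.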

17 Main Result: Formal Statement. Given a failure probability β, there exists a randomized embedding from a point set P into a vector ΨP of dimension O(n (log^2 n) log(1/β)) such that, for any P and Q, with probability at least 1 − β the L1 distance between ΨP and ΨQ approximates the symmetric difference distance of P and Q under translation to within a factor of O(log^2 n). This embedding can be computed in time O(n (log^4 n) log(1/β)).

18 Open Problems. Q1. Can we improve the distortion bound (currently O(log^2 n))? Cormode & Muthukrishnan show how to embed a string under edit distance with moves into L1 with O(log n log* n) distortion. Q2. Can we derandomize the algorithm? Cormode & Muthukrishnan's algorithm is deterministic. Q3. Can we improve the space and time complexities?

19 Other Extensions. Q1. Can we support a distance measure that is robust to noisy data (e.g., the Hausdorff distance)? Q2. Can we handle other transformation groups? Integer scaling? Integer scaling + translation? Affine transformations over finite vector spaces?

20 Thank You!

21 Translation Invariant (details). P = {3, 6, 10, 14, 22}; h(x) = x mod s with, e.g., s = 11 gives a bit vector hP of length s, with a 1 in position h(p) for each p ∈ P. With probe size ρ = 4, reading a ρ-bit pattern at each of the s cyclic shifts yields patterns such as {1101, 0001, 0000, 0010, 1100, 1010, …}, i.e., the integers {13, 0, 2, 12, 1, …, 10}. A second hash h'(x) (for simplicity, x mod 10) maps each pattern to a bucket, and the bucket counts form the vector ΦρP.
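A toy version of this construction (helper names are illustrative; the real probe positions are random, fixed here for reproducibility): hash the points mod a prime s into a bit vector, read a ρ-bit probe at every cyclic shift, and histogram the resulting patterns. The histogram is the same for P and any translate P + t, because translation only rotates the bit vector.

```python
def invariant(P, s, probe, num_buckets):
    """Translation-invariant histogram of probe patterns.

    P: set of integer points; s: table size (a prime);
    probe: tuple of probe offsets (size rho);
    num_buckets: size of the count vector (stands in for the
    second hash h' on the slide).
    """
    bits = [0] * s
    for p in P:
        bits[p % s] = 1
    counts = [0] * num_buckets
    for t in range(s):  # one rho-bit pattern per cyclic shift
        pattern = 0
        for j in probe:
            pattern = (pattern << 1) | bits[(t + j) % s]
        counts[pattern % num_buckets] += 1
    return counts

P = {3, 6, 10, 14, 22}
probe = (0, 2, 5, 7)  # an arbitrary probe of size rho = 4
inv_P = invariant(P, 11, probe, 10)
inv_Pt = invariant({p + 7 for p in P}, 11, probe, 10)
assert inv_P == inv_Pt  # invariant under translation
```

Comparing two such count vectors in L1 is what the embedding ultimately does, one block per probe size.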

22 Trial 1: Geometric Hashing for Translation. Naïve version: the space complexity is O(N n^2), since the frame size is 1; with outliers in the query, the number of queries increases. Adaptive version: to reduce space, only c transformed sets are stored, but then the number of queries increases. Outliers may lead to false matchings, and thus increase the probability of false positives.

23 Geometric Hashing with Outliers. Given r outliers and frame size k, the number of queries must increase to guarantee a correct result.
Method 1: Pr[choose a valid frame set] = (1 − r/n)^k.
Method 2: r + 1 different trials (deterministic).
Method 3: pigeonhole argument; Pr[choose a valid frame set] = 1 − r/(n/k).
[Grimson & Huttenlocher 90]: outliers lead to false matchings and increase the probability of false positives.
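The three bounds above are easy to compare numerically; a quick check with, say, n = 100 points, r = 10 outliers, and frame size k = 3 (illustrative values, not from the slides):

```python
n, r, k = 100, 10, 3  # points, outliers, frame size (illustrative)

p1 = (1 - r / n) ** k        # method 1: all k frame points are inliers
trials = r + 1               # method 2: deterministic number of trials
p3 = 1 - r / (n / k)         # method 3: pigeonhole-style bound

print(f"method 1: {p1:.3f}, method 2: {trials} trials, method 3: {p3:.3f}")
```

Even at 10% outliers, method 1 succeeds with probability about 0.73 per random frame, so several independent frames are needed for high confidence.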

24 d-Dimension → 1-Dimension. Let u be the maximum coordinate value of each point. Then we can map a d-dimensional point set to a 1-dimensional point set, with coordinates of size at most (3u)^d, without changing the symmetric difference distance under translation. [Figure: points (1, 1) and (5, 3) on a 5 × 5 grid are encoded as a single bit string, with each coordinate occupying its own block of positions.]
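One way to realize this step is a base-(3u) positional encoding: spacing coordinates by a factor of 3 leaves room so that digits cannot carry between coordinate blocks under the translations of interest. A sketch under that assumption (the constant 3u is from the slide; the function name and exact encoding are illustrative, not the paper's):

```python
def flatten(points, u):
    """Map d-dimensional integer points (coordinates at most u)
    to 1-d integers in base 3u, one digit block per coordinate."""
    base = 3 * u
    return {sum(c * base ** i for i, c in enumerate(p)) for p in points}

u = 5
P = {(1, 1), (5, 3)}
print(sorted(flatten(P, u)))  # (1,1) -> 1 + 1*15 = 16; (5,3) -> 5 + 3*15 = 50
```

A d-dimensional translation (t1, …, td) then corresponds to the single 1-d translation Σ t_i (3u)^(i−1), which is what preserves the symmetric difference distance.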

25 # of Primes & Collision Probability. Prime Number Theorem: there are Θ(m / log m) primes between 1 and m. Collision probability: let h(x) = x mod s, where s is a prime chosen uniformly at random from Θ(n log n). For x ≠ y, Pr[h(x) = h(y)] = Pr[(x mod s) = (y mod s)] = Pr[(x − y) mod s = 0]. Since x, y ∈ Z_{n^c}, we have |x − y| < n^c, so x − y is divisible by at most c primes of this magnitude, and Pr[h(x) = h(y)] ≤ c / (# of primes) = 1/O(n).
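The collision bound can be checked empirically with a small sieve: pick a random prime modulus from a range and count how often two fixed distinct keys collide (the range [1000, 2000] and the keys are arbitrary illustrative choices):

```python
import random

def primes_up_to(m):
    """Simple sieve of Eratosthenes."""
    sieve = bytearray([1]) * (m + 1)
    sieve[0:2] = b"\x00\x00"
    for i in range(2, int(m ** 0.5) + 1):
        if sieve[i]:
            sieve[i * i :: i] = bytearray(len(sieve[i * i :: i]))
    return [i for i, is_p in enumerate(sieve) if is_p]

random.seed(0)
cands = [p for p in primes_up_to(2000) if p > 1000]
x, y = 123_456, 789_012

# h(x) = x mod s collides for x != y only when s divides x - y,
# and x - y has few prime factors of this magnitude.
collisions = 0
for _ in range(10_000):
    s = random.choice(cands)
    if x % s == y % s:
        collisions += 1
print(collisions / 10_000)  # small: roughly c / (number of primes in range)
```

Here |x − y| has exactly one prime factor in the sampled range, so the empirical rate is about 1 over the number of candidate primes, matching the slide's bound.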

26 Distance Distortion by Hashing. With a hash function whose collision probability is 1/O(n), the distortion of this step is O(1) (only an o(1) excess over 1). Note that the distance can only be contracted, due to collisions.

27 Linear Hash Function (X). h(x) = x mod s, where s is a prime in Θ(n log n). Linearity: h(x + t) = h(x) + h(t) (mod s), so translating P merely rotates the bit vector, and the invariant is preserved: ΦρP = Φρ(P + t). Example: P = {3, 6, 10, 14, 22} hashes to a bit vector of length s.
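The linearity is what makes the invariant well defined: hashing a translated set rotates the hash table rather than scrambling it. A quick check of the identity on the slide's example set (t = 7 is an arbitrary choice):

```python
s = 11  # a prime; in the construction s is Theta(n log n), small here
P = {3, 6, 10, 14, 22}
t = 7

hP_shifted = {(x + t) % s for x in P}            # hash after translating
hP_then_t = {((x % s) + (t % s)) % s for x in P}  # translate the hash values
assert hP_shifted == hP_then_t  # h(x + t) = h(x) + h(t)  (mod s)
```

A non-linear hash (e.g., a cryptographic one) would destroy this property, which is why a simple mod-prime hash is used here.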


29 Universal Hash Function for a Large Domain. Since the maximum probe size is O(n log n), the input domain of the hash function has size 2^O(n log n); however, it contains only Θ(n log n) distinct elements. H: 2^s → 2^k, H(x) = Rx + b (mod 2), where R is a random k × s binary matrix and b is a random k-bit row vector. Time complexity: one evaluation takes O(ks) = O((log n) · n log n) = O(n log^2 n); over all s = O(n log n) shifts, the total is O(n^2 log^3 n).
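The hash family on this slide can be sketched with numpy 0/1 vectors (a sketch under the stated parameters; R and b are sampled once and shared across all shifts, and the sizes here are toy values):

```python
import numpy as np

rng = np.random.default_rng(0)

def make_universal_hash(s, k):
    """H(x) = R x + b (mod 2), for x a length-s 0/1 vector."""
    R = rng.integers(0, 2, size=(k, s))  # random k x s binary matrix
    b = rng.integers(0, 2, size=k)       # random k-bit vector
    return lambda x: (R @ x + b) % 2

s, k = 16, 4
H = make_universal_hash(s, k)
x = rng.integers(0, 2, size=s)
y = x.copy()
assert np.array_equal(H(x), H(y))  # deterministic on equal inputs
print(H(x))                        # a k-bit digest of the s-bit input
```

Each evaluation is a k × s matrix-vector product, i.e., the O(ks) cost quoted above; the FFT merge on the later slides removes exactly this per-shift cost.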

30 Relationship between ρ and δ*. [Figure (backup version of slide 13): the expected distance between invariants as a function of ρ, where δ is a guessed distance and δ* is the optimal distance, bounded above by 2s − 2.]

31 Effect of Hash Functions. [Figure: the effect of the hash functions h and h' on the distance between invariants.]

32 Merge Two Operations using FFT & Convolution.
Π = random_probe(ρ, s)
for t = 1, …, s: x(t) = (hP + t)[Π]   // make an invariant
for t = 1, …, s: x'(t) = H x(t) + b (mod 2); ΦρP[x'(t)]++   // H: an O(log s) × ρ matrix
Time complexity: O(s) × O(matrix multiplication) = O(s) × O(s log s).
Instead, write H = [r1, r2, …, r_{O(log s)}]ᵀ, where each r_i is a binary row vector. Then H x(t) = [r1 x(t), r2 x(t), …, r_{O(log s)} x(t)]ᵀ, and r_i x(t) = r_i · (hP + t)[Π] = Σ (hP + t)[Π ∧ r_i], so the whole sequence [r_i x(1), r_i x(2), …, r_i x(s)] = fliplr(hP) ⊛ [Π ∧ r_i], a single convolution.
Time complexity: O(log s) × O(convolution) = O(log s) × O(s log s).

33 Build Time.
- d-dimension → 1-dimension: O(dn)
- Linear hashing: O(n)
- Invariant under translation: trivial O(n^2 log^2 n); ours O(n log^3 n)
- Universal hashing (due to the domain size, matrix multiplication is needed): trivial O(n^2 log^4 n); merged into the invariant step in ours

