cse 521: design and analysis of algorithms

cse 521: design and analysis of algorithms

Time & place: T, Th 12:00-1:20 pm in CSE 203
People: Prof: James Lee (jrl@cs); TA: Thach Nguyen (ncthach@cs)
Book: Algorithm Design by Kleinberg and Tardos
Grading: 50% homework (approx. bi-weekly problem sets); 20% take-home midterm; 30% in-class final exam
Website: http://www.cs.washington.edu/521/

something a little bit different

Assume you know: asymptotic analysis, basic probability, basic linear algebra, dynamic programming, recursion / divide-and-conquer, graph traversal (BFS, DFS, shortest paths)

...so that we can cover: nearest-neighbor search, spectral algorithms (e.g. PageRank), online algorithms (multiplicative update), geometric hashing + graph algorithms, data structures, network flow, hashing, NP-completeness, linear programming, approximation algorithms

Due: April 9th

case study: nearest-neighbor search

A subset of objects from a universe of objects forms the database. Goal: quickly respond with the database object most similar to the query. (Example objects on the slide: DNA strings like AAACAACTTC, CGTAAGTATA; words like "ridiculousse".)

formal model

U = universe (set of objects); d(x,y) = distance between two objects.

Assumptions:
d(x,x) = 0 for all x ∈ U
d(x,y) = d(y,x) for all x,y ∈ U (symmetry)
d(x,y) ≤ d(x,z) + d(z,y) for all x,y,z ∈ U (triangle inequality)

Problem: Given an input database D ⊆ U, preprocess D (fast, space-efficiently) so that queries q ∈ U can be answered very quickly, i.e. return a* ∈ D such that d(q,a*) = min { d(q,x) : x ∈ D }.
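As one concrete instantiation of this black-box model (an illustration, not from the slides): the Hamming distance on equal-length DNA strings satisfies all three assumptions, and the sketch below checks them exhaustively on a small universe.

```python
from itertools import product

def hamming(x, y):
    """Hamming distance: number of positions where two equal-length strings differ."""
    assert len(x) == len(y)
    return sum(a != b for a, b in zip(x, y))

# Check the three metric assumptions on the universe of all DNA 3-mers.
U = [''.join(s) for s in product('ACGT', repeat=3)]
assert all(hamming(x, x) == 0 for x in U)                       # identity
assert all(hamming(x, y) == hamming(y, x) for x in U for y in U)  # symmetry
assert all(hamming(x, y) <= hamming(x, z) + hamming(z, y)
           for x in U for y in U for z in U)                    # triangle inequality
```

Edit distance (used for strings of different lengths, as in the DNA example) satisfies the same axioms but is more expensive to compute, which motivates the "is d(x,y) expensive?" question on the next slide.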

other considerations

Is it expensive to compute d(x,y)? Should we treat the distance function and objects as a black box?

Problem: Given an input database D ⊆ U, preprocess D (fast, space-efficiently) so that queries q ∈ U can be answered very quickly, i.e. return a* ∈ D such that d(q,a*) = min { d(q,x) : x ∈ D }.

primitive methods

[Brute force: time] Compute d(query, x) for every object x ∈ D, and return the closest. Takes time ≈ |D| · (distance comp. time).

[Brute force: space] Pre-compute the best response to every possible query q ∈ U. Takes space ≈ |U| · (object size).

Dream performance: query time ..., space ...
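The time-oriented brute-force method can be written directly against the black-box interface; a minimal sketch (function name is illustrative):

```python
def nearest_brute_force(query, database, d):
    """Linear scan: |D| distance computations per query, no preprocessing."""
    return min(database, key=lambda x: d(query, x))
```

This is the baseline every data structure in the lecture is trying to beat.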

something hard, something easy

(figure: a point set in which all pairwise distances are ≈ 1)

All pairwise distances are equal: d(x,y) = 1 for all x,y ∈ D. "curse of dimensionality"

Problem: ... so that queries q ∈ U can be answered quickly, i.e. return a* ∈ D such that d(q,a*) = min { d(q,x) : x ∈ D }

something hard, something easy

All pairwise distances are equal: d(x,y) = 1 for all x,y ∈ D. "curse of dimensionality"

Can sometimes solve exact NNS by first finding a good approximation.

ε-Problem: ... so that queries q ∈ U can be answered quickly, i.e. return a ∈ D such that d(q,a) ≤ (1+ε) d(q,D)

something easier

Let's suppose that U = [0,1] (real numbers between 0 and 1).

Answer: Sort the points D ⊆ U in the preprocessing stage. To answer a query q ∈ U, we can just do binary search. To support insertions/deletions, we can use a balanced search tree.

How much power did we need? Can we do this using only distance computations d(x,y) (for x,y ∈ D)?

Basic idea: Make progress by throwing "a lot" of stuff away.
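The sort-then-binary-search idea for the one-dimensional case is only a few lines; this sketch uses the standard library bisect module (function names are illustrative):

```python
import bisect

def preprocess(points):
    """Preprocessing stage: sort once, O(n log n)."""
    return sorted(points)

def nearest_1d(sorted_points, q):
    """Nearest neighbor on a line in O(log n): binary search for the
    insertion position, then compare the two points straddling it."""
    i = bisect.bisect_left(sorted_points, q)
    candidates = sorted_points[max(0, i - 1): i + 1]  # at most two points
    return min(candidates, key=lambda x: abs(x - q))
```

Note how much this relies on the total order of [0,1]; the rest of the lecture asks what can be salvaged when only the distances d(x,y) are available.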

extending the basic idea

Definition: The ball of radius α around x ∈ D is B(x,α) = { y ∈ U : d(x,y) ≤ α }.

(figure: balls of radius α and 2α around a point)

extending the basic idea

Definition: An α-net in D is a subset N ⊆ D such that
1) Separation: for all x,y ∈ N, d(x,y) ≥ α
2) Covering: for all x ∈ D, d(x,N) ≤ α

Greedy construction algorithm: Start with N = ∅. As long as there exists an x ∈ D with d(x,N) > α, add x to N.

So for every α > 0, we can construct an α-net N(α) in O(n^2) time, where n = |D|.
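The greedy construction translates directly into code; a sketch with the distance function passed in as a black box (names are illustrative):

```python
def greedy_net(D, d, alpha):
    """Greedily build an alpha-net of D in O(n^2) distance computations.

    On return both net properties hold: separation (net points are
    pairwise more than alpha apart, since we only add a point when it is
    far from everything already chosen) and covering (every skipped
    point was within alpha of some net point)."""
    net = []
    for x in D:
        if all(d(x, y) > alpha for y in net):  # i.e. d(x, net) > alpha
            net.append(x)
    return net
```

A single pass suffices because once a point is within α of the net it stays covered: the net only grows.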

basic data structure: hierarchical nets

Nets at every scale: ..., N(2^(i+1)), N(2^i), ..., N(2), N(1), with pointers between the levels.

Search algorithm: use the α-net to find a radius-2α ball; use the α/2-net to find a radius-α ball; use the α/4-net to find a radius-α/2 ball; ...
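A self-contained sketch of the whole pipeline: build a greedy 2^i-net at every scale, then descend level by level keeping a candidate set at each scale. The radius rules (children within 2^(i+1), survivors within best + 2^i) follow the spirit of the lecture's analysis; treat this as an illustration under those assumptions, not the exact lecture pseudocode.

```python
def build_nets(D, d, hi, lo):
    """nets[i] is a greedy 2^i-net of D. Choose lo so that 2^lo is below
    the minimum pairwise distance; then nets[lo] = D (every point survives)."""
    nets = {}
    for i in range(hi, lo - 1, -1):
        net = []
        for x in D:
            if all(d(x, y) > 2.0 ** i for y in net):
                net.append(x)
        nets[i] = net
    return nets

def traverse(q, nets, d, hi, lo):
    """Descend the hierarchy, keeping only net points that could still
    be (or cover) the nearest neighbor of q."""
    L = list(nets[hi])
    for i in range(hi - 1, lo - 1, -1):
        # Children: level-i net points near some surviving candidate.
        kids = [x for x in nets[i]
                if any(d(x, y) <= 2.0 ** (i + 1) for y in L)]
        best = min(d(q, x) for x in kids)
        # Prune candidates too far from q to cover the true answer.
        L = [x for x in kids if d(q, x) <= best + 2.0 ** i]
    return min(L, key=lambda x: d(q, x))
```

Covering guarantees the candidate set never goes empty (every survivor is within 2^(i-1) of some level-(i-1) net point), and separation is what bounds its size in the analysis below.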

algorithm: traverse the nets

Algorithm: Given input query q ∈ U, ...

running time analysis

Query time = ...

Nearly uniform point set: for u,v ∈ L_{x,i}, d(u,v) ∈ [2^(i-1), 2^(i+2)]

Algorithm: Given input query q ∈ U, ...

"curse of dimensionality" (curs'ed hamsters)

All pairwise distances are equal: d(x,y) = 1 for all x,y ∈ D. "curse of dimensionality"

intrinsic dimensionality

Given a metric space (X,d), let λ(X,d) be the smallest constant λ such that every ball in X can be covered by λ balls of half the radius. (figure: a ball of radius r covered by λ = 5 half-radius balls)

The intrinsic dimension of (X,d) is the value log_2 λ(X,d).
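Written out, with two standard sanity checks (known facts, not shown on the slide):

```latex
\dim(X,d) \;=\; \log_2 \lambda(X,d)
```

For Euclidean space $(\mathbb{R}^k, \ell_2)$ one has $\lambda = 2^{\Theta(k)}$, so $\dim = \Theta(k)$, agreeing with the usual notion of dimension. For the uniform metric on $n$ points (all pairwise distances 1, the "curse of dimensionality" example), the unit ball is everything while half-radius balls are singletons, so $\lambda = n$ and the intrinsic dimension is $\log_2 n$.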

intrinsic dimensionality

We can bound the query time of our algorithm in terms of the intrinsic dimension of the data...

Query time = ...

Claim: |L_{x,i}| ≤ λ(X,d)^3

Proof: Suppose that k = |L_{x,i}|. Then we need at least k balls of radius 2^(i-2) to cover B(x, 2^(i+1)), because a ball of radius 2^(i-2) can cover at most one point of N_{i-1} (by separation). But every ball of radius r in X can be covered by at most λ(X,d)^3 balls of radius r/8, hence k ≤ λ(X,d)^3.

intrinsic dimensionality

Claim: every ball of radius r in X can be covered by at most λ(X,d)^3 balls of radius r/8.

A ball of radius r can be covered by λ balls of radius r/2, hence by λ^2 balls of radius r/4, hence by λ^3 balls of radius r/8.

intrinsic dimensionality

We can bound the query time of our algorithm in terms of the intrinsic dimension of the data...

Query time = ...

Generalization of binary search (where dimension = 1). Works in arbitrary metric spaces with small intrinsic dimension. Didn't have to think in order to "index" our database. Shows that the hardest part of nearest-neighbor search is the nearly-uniform case, where all pairwise distances are essentially equal.

wrap up

Only gives an approximation to the nearest neighbor. Next time: fix this; fix time, fix space + data structure prowess. In the future: opening the black box; NNS in high-dimensional spaces. Bonus: the algorithm is completely intrinsic (e.g. Isomap).