Efficient classification for metric data
Lee-Ad Gottlieb (Weizmann Institute)
Aryeh Kontorovich (Ben Gurion U.)
Robert Krauthgamer (Weizmann Institute)

Classification problem
Probabilistic concept learning: S is a set of n examples (x, y) drawn from X × {-1, 1} according to some unknown probability distribution P. The learner produces a hypothesis h: X → {-1, 1}. A good hypothesis (classifier) minimizes the generalization error P{(x, y): h(x) ≠ y}.
A popular solution uses kernels: data are represented as vectors, and kernels take the dot product of vectors.

Finite metric space
(X, d) is a metric space if X is a set of points and d is a distance function that is nonnegative, symmetric, and satisfies the triangle inequality. (Example from the slide: Haifa, Jerusalem and Tel Aviv with pairwise distances of 151 km, 95 km and 62 km.)
Classification for metric data? Problem: there is no vector representation and no notion of dot product, so we can't use kernels. What can be done in this setting?
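To make the definition concrete, here is a minimal Python sketch that checks the three axioms on the slide's three-city example; the assignment of the 151/95/62 km figures to specific city pairs is an assumption, not stated on the slide.

```python
import itertools

# Assumed pairing of the slide's distances to city pairs (illustrative only).
cities = ["Haifa", "Jerusalem", "Tel Aviv"]
dist = {
    ("Haifa", "Jerusalem"): 151.0,
    ("Haifa", "Tel Aviv"): 95.0,
    ("Jerusalem", "Tel Aviv"): 62.0,
}

def d(a, b):
    """Symmetric lookup with d(x, x) = 0."""
    if a == b:
        return 0.0
    return dist.get((a, b), dist.get((b, a)))

# Check the metric axioms on every pair/triple of points.
for x, y in itertools.combinations(cities, 2):
    assert d(x, y) > 0 and d(x, y) == d(y, x)        # nonnegative, symmetric
for x, y, z in itertools.permutations(cities, 3):
    assert d(x, z) <= d(x, y) + d(y, z)              # triangle inequality
print("(X, d) is a metric space")
```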

Preliminary definition
The Lipschitz constant L of a function f: X → R is the smallest value satisfying L ≥ |f(x_i) - f(x_j)| / d(x_i, x_j) for all points x_i, x_j in X.
Consider a hypothesis consistent with all of S. Its Lipschitz constant is determined by the closest pair of differently labeled points: L ≥ 2 / d(x_i, x_j) for all x_i in S-, x_j in S+.
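The observation above pins down the smallest Lipschitz constant of any consistent ±1-valued classifier: it is 2 divided by the distance between the two label classes. A minimal sketch (function and argument names are illustrative, not from the paper):

```python
def min_consistent_lipschitz(sample, dist):
    """Smallest Lipschitz constant of any classifier taking the value +1 on S+
    and -1 on S-; sample is a list of (point, label) with label in {-1, +1},
    dist is the metric."""
    positives = [p for p, y in sample if y == +1]
    negatives = [p for p, y in sample if y == -1]
    d_plus_minus = min(dist(p, q) for p in positives for q in negatives)
    return 2.0 / d_plus_minus
```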

Classification for metric data
A powerful framework for this problem was introduced by von Luxburg & Bousquet (vLB, JMLR '04). The natural hypotheses (classifiers) to consider are maximally smooth Lipschitz functions.
Given the classifier h, evaluating h on new points of X reduces to finding a Lipschitz function consistent with h. This is the Lipschitz extension problem, a classic problem in analysis. For example, f(x) = min_i [y_i + 2 d(x, x_i) / d(S+, S-)] over all (x_i, y_i) in S.
Function evaluation reduces to exact nearest neighbor search (assuming zero training error): strong theoretical motivation for the NNS classification heuristic.
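A sketch of this extension-based classifier, assuming zero training error (names are illustrative): evaluating f(x) only needs the distance from x to the nearest positive and the nearest negative sample point, i.e. two exact nearest neighbor searches.

```python
def lipschitz_extension_classifier(sample, dist):
    """Return f with f(x) = min_i [ y_i + 2 d(x, x_i) / d(S+, S-) ]; predict sgn(f(x))."""
    positives = [p for p, y in sample if y == +1]
    negatives = [p for p, y in sample if y == -1]
    margin = min(dist(p, q) for p in positives for q in negatives)  # d(S+, S-)

    def f(x):
        d_pos = min(dist(x, p) for p in positives)   # exact NN search in S+
        d_neg = min(dist(x, q) for q in negatives)   # exact NN search in S-
        return min(+1 + 2 * d_pos / margin,
                   -1 + 2 * d_neg / margin)

    return f
```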

Two new directions
The framework of vLB leaves open two further questions:
Efficient evaluation of the classifier h on X: in an arbitrary metric space, exact NNS requires Θ(n) time. Can we do better?
Bias-variance tradeoff: which sample points in S should h ignore?
(Figure: a query point q near examples labeled -1 and +1.)

Doubling dimension
Definition: the ball B(x, r) is the set of all points within distance r from x. The doubling constant λ of a metric M is the minimum value such that every ball can be covered by λ balls of half the radius. First used by [Ass-83], algorithmically by [Cla-97]. The doubling dimension is dim(M) = log λ(M) [GKL-03]. A metric is doubling if its doubling dimension is constant.
Packing property of doubling spaces: a set with diameter D and minimum inter-point distance a contains at most (D/a)^{O(log λ)} points. (In the slide's figure, λ = 7.)
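The doubling constant of a finite sample can be estimated empirically; the following is a rough sketch (not the paper's method): for each ball B(x, r), greedily build an r/2-net of the points it contains, which covers the ball with half-radius balls, so the largest net size found serves as a proxy for λ.

```python
import itertools

def estimate_doubling_constant(points, dist):
    """Brute-force proxy for the doubling constant of a finite metric (points, dist)."""
    estimate = 1
    radii = sorted({dist(p, q) for p, q in itertools.combinations(points, 2)})
    for x in points:
        for r in radii:
            ball = [p for p in points if dist(x, p) <= r]
            net = []                                   # greedy r/2-net of the ball
            for p in ball:
                if all(dist(p, c) > r / 2 for c in net):
                    net.append(p)
            estimate = max(estimate, len(net))
    return estimate   # doubling dimension proxy: log2(estimate)
```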

Application I
We provide generalization bounds for Lipschitz functions on spaces with low doubling dimension. (vLB provided similar bounds using covering numbers and Rademacher averages.)
Fat-shattering analysis: if an L-Lipschitz function shatters a set, then the inter-point distance within the set is at least 2/L. By the packing property, such a set has at most (DL)^{O(log λ)} points. So the fat-shattering dimension is low.
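Spelling out the chain of inequalities behind this slide (a sketch of the standard packing argument at margin 1, following the slide's reasoning):

```latex
% If f realizes opposite labels (+1 and -1) on x_i and x_j at margin 1, then
% |f(x_i) - f(x_j)| >= 2, while Lipschitzness gives |f(x_i) - f(x_j)| <= L d(x_i, x_j):
\[
  2 \;\le\; |f(x_i) - f(x_j)| \;\le\; L\, d(x_i, x_j)
  \quad\Longrightarrow\quad
  d(x_i, x_j) \;\ge\; \frac{2}{L}.
\]
% The packing property (diameter D, minimum inter-point distance 2/L) then bounds
% the size k of any shattered set, and hence the fat-shattering dimension:
\[
  k \;\le\; \left(\frac{DL}{2}\right)^{O(\log \lambda)}.
\]
```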

Application I
Theorem: For any f that classifies a sample of size n correctly, we have with probability at least 1 - δ:
P{(x, y) : sgn(f(x)) ≠ y} ≤ (2/n) (d log(34en/d) log(578n) + log(4/δ)).
Likewise, if f is correct on all but k examples, we have with probability at least 1 - δ:
P{(x, y) : sgn(f(x)) ≠ y} ≤ k/n + [(2/n) (d ln(34en/d) log₂(578n) + ln(4/δ))]^{1/2}.
In both cases, d ≤ ⌈8LD⌉^{log λ + 1}.

Application II
Evaluation of h for new points in X: the Lipschitz extension function f(x) = min_i [y_i + 2 d(x, x_i) / d(S+, S-)] requires exact nearest neighbor search, which can be expensive!
New tool: (1+ε)-approximate nearest neighbor search, in λ^{O(1)} log n + λ^{O(log(1/ε))} time [KL-04, HM-05, BKL-06, CG-06].
If we evaluate f(x) using approximate NNS, we can show that the result agrees with the sign of at least one of g(x) = (1+ε) f(x) + ε and h(x) = (1+ε) f(x) - ε. Note that h(x) ≤ f(x) ≤ g(x). Both g(x) and h(x) have Lipschitz constant (1+ε)L, so they, and hence the approximate classifier, generalize well.
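A sketch (illustrative names, not the paper's implementation) of plugging an approximate nearest neighbor oracle into the extension classifier; any routine returning a distance d' with d ≤ d' ≤ (1+ε)·d will do, and exact brute-force search is used below as a stand-in oracle.

```python
def make_approx_classifier(sample, dist, approx_nn_dist=None):
    """Classifier sgn(f~(x)) where f~ uses (1+eps)-approximate NN distances."""
    positives = [p for p, y in sample if y == +1]
    negatives = [p for p, y in sample if y == -1]
    margin = min(dist(p, q) for p in positives for q in negatives)  # d(S+, S-)

    if approx_nn_dist is None:
        # Stand-in oracle: exact search (trivially a (1+eps)-approximation).
        approx_nn_dist = lambda x, pts: min(dist(x, p) for p in pts)

    def predict(x):
        f_tilde = min(+1 + 2 * approx_nn_dist(x, positives) / margin,
                      -1 + 2 * approx_nn_dist(x, negatives) / margin)
        # f(x) <= f_tilde(x) <= (1+eps) f(x) + 2 eps, so sgn(f_tilde) agrees with
        # the sign of one of the (1+eps)L-Lipschitz functions sandwiching f,
        # which is what the slide's generalization argument needs.
        return 1 if f_tilde >= 0 else -1

    return predict
```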

Bias-variance tradeoff
Which sample points in S should h ignore?
If f is correct on all but k examples, we have with probability at least 1 - δ:
P{(x, y): sgn(f(x)) ≠ y} ≤ k/n + [(2/n) (d ln(34en/d) log₂(578n) + ln(4/δ))]^{1/2}, where d ≤ ⌈8LD⌉^{log λ + 1}.

Bias-variance tradeoff
Algorithm: fix a target Lipschitz constant L (one of O(n²) possibilities). Locate all pairs of points from S+ and S- whose distance is less than 2/L. At least one point of each such pair has to be counted as an error. Goal: remove as few points as possible.

Bias-variance tradeoff
Algorithm: fix a target Lipschitz constant L (out of O(n²) possibilities). Locate all pairs of points from S+ and S- whose distance is less than 2/L. At least one point of each such pair has to be counted as an error. Goal: remove as few points as possible.
This is minimum vertex cover: NP-complete in general, but it admits a 2-approximation in O(E) time.

Bias-variance tradeoff
Algorithm: fix a target Lipschitz constant L (out of O(n²) possibilities). Locate all pairs of points from S+ and S- whose distance is less than 2/L. At least one point of each such pair has to be counted as an error. Goal: remove as few points as possible.
This is minimum vertex cover: NP-complete in general, but it admits a 2-approximation in O(E) time.
Here it is minimum vertex cover on a bipartite graph, which is equivalent to maximum matching (König's theorem) and so admits an exact solution by a randomized matching algorithm.
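A sketch of this step under stated assumptions (networkx as the matching library; function and variable names are mine, not the paper's): build the bipartite conflict graph on S+ and S-, with an edge for every opposite-label pair at distance less than 2/L, then remove a minimum vertex cover of it, obtained exactly from a maximum matching via König's theorem.

```python
import networkx as nx
from networkx.algorithms import bipartite

def min_points_to_discard(sample, dist, L):
    """Indices of a minimum set of sample points to treat as errors for target L."""
    positives = [("+", i) for i, (p, y) in enumerate(sample) if y == +1]
    negatives = [("-", i) for i, (p, y) in enumerate(sample) if y == -1]

    G = nx.Graph()
    G.add_nodes_from(positives, bipartite=0)
    G.add_nodes_from(negatives, bipartite=1)
    for (_, i) in positives:
        for (_, j) in negatives:
            if dist(sample[i][0], sample[j][0]) < 2.0 / L:   # conflicting pair
                G.add_edge(("+", i), ("-", j))

    matching = bipartite.maximum_matching(G, top_nodes=positives)
    cover = bipartite.to_vertex_cover(G, matching, top_nodes=positives)
    # |cover| = minimum number of sample points that must be counted as errors.
    return {i for (_, i) in cover}
```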

Bias-variance tradeoff
Algorithm: for each of the O(n²) candidate values of L, run the matching algorithm to find the minimum error and evaluate the generalization bound for this value of L; randomized polynomial time overall.
Better algorithm: binary search over the O(n²) values of L. For each value considered, either run the matching algorithm to find the minimum error exactly (randomized) and evaluate the generalization bound, or run the greedy 2-approximation to approximate the minimum error, in O(n² log n) total time, and evaluate the approximate generalization bound for this value of L.
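The greedy 2-approximation referred to above is the usual maximal-matching heuristic for vertex cover; a minimal sketch (illustrative names), where `conflicts` is the list of opposite-label pairs at distance less than 2/L:

```python
def greedy_two_approx_cover(conflicts):
    """2-approximate vertex cover: take both endpoints of any uncovered conflict pair."""
    cover = set()
    for u, v in conflicts:            # O(E) time over the conflict edges
        if u not in cover and v not in cover:
            cover.add(u)
            cover.add(v)
    return cover                       # at most twice the minimum number of errors
```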

Conclusion
Results: generalization bounds for Lipschitz classifiers in doubling spaces; efficient evaluation of the Lipschitz extension hypothesis using approximate NNS; efficient calculation of the bias-variance tradeoff.
Continuing research: similar results for continuous labels.