
On Complexity, Sampling, and ε-Nets and ε-Samples
Presented by: Shay Houri

Range Spaces
A range space is a pair $S = (X, R)$, where $X$ is a ground set whose elements are called points, and $R$ is a family of subsets of $X$ whose elements are called ranges. For example: $X = \mathbb{R}^2$ and $R$ is the set of all half-planes.

Measure
Let $S = (X, R)$ be a range space and let $x$ be a finite subset of $X$. Then for $r \in R$ the measure of $r$ is $\overline{m}(r) = \frac{|r \cap x|}{|x|}$.
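To make the definition concrete, here is a minimal Python sketch (my own naming, not from the slides), representing ranges as predicates over points:

```python
# A minimal sketch: ranges are represented as predicates over points, and the
# measure of a range r over a finite point set x is |r ∩ x| / |x|.

def measure(r, x):
    """m(r) = |r ∩ x| / |x| for a finite set x and a range predicate r."""
    return sum(1 for p in x if r(p)) / len(x)

# Example: x ⊂ R, range is the half-line (-∞, 2].
x = [0.5, 1.0, 2.0, 3.5, 4.0]
r = lambda p: p <= 2.0
print(measure(r, x))    # 3/5 = 0.6
```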

Motivation
For $N \subseteq x$, the estimate of $\overline{m}(r)$ is defined by $\overline{s}(r) = \frac{|r \cap N|}{|N|}$. We want to find a sample $N$, as small as possible, such that for every $r \in R$, $\left|\overline{m}(r) - \overline{s}(r)\right| \le \varepsilon$.

More Definitions
Let $S = (X, R)$ be a range space. For $Y \subseteq X$, let $R_{|Y} = \{\, r \cap Y : r \in R \,\}$ be the projection of $S$ on $Y$. If $R_{|Y}$ contains all subsets of $Y$, then $Y$ is shattered by $R$. The VC dimension of $S$, denoted $\dim_{VC}(S)$, is the maximum cardinality of a shattered subset of $X$.
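On small instances the VC dimension can be found by brute force. A hedged sketch (my code; the interval example is an illustration, not from the slides):

```python
from itertools import combinations

def projection(ranges, Y):
    """R|Y = { r ∩ Y : r ∈ R }, with ranges given as point sets."""
    return {frozenset(r & Y) for r in ranges}

def is_shattered(ranges, Y):
    return len(projection(ranges, set(Y))) == 2 ** len(Y)

def vc_dimension(points, ranges):
    """Maximum cardinality of a subset of `points` shattered by `ranges`."""
    dim = 0
    for k in range(1, len(points) + 1):
        if any(is_shattered(ranges, Y) for Y in combinations(points, k)):
            dim = k
    return dim

# Example: X = {1..6} on the line, R = all intervals [a, b].
X = set(range(1, 7))
R = [frozenset(p for p in X if a <= p <= b) for a in X for b in X]
print(vc_dimension(sorted(X), R))   # 2: intervals shatter pairs but no triple
```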

Half-Spaces
Claim: Let $x_1, \dots, x_{d+2}$ be a set of $d+2$ points in $\mathbb{R}^d$. There are real numbers $\beta_1, \dots, \beta_{d+2}$, not all of them zero, such that $\sum_i \beta_i x_i = 0$ and $\sum_i \beta_i = 0$.
Theorem (Radon): Let $X = \{x_1, \dots, x_{d+2}\}$ be a set of $d+2$ points in $\mathbb{R}^d$. Then, there exist 2 disjoint subsets $C, D \subseteq X$ such that $\mathrm{conv}(C) \cap \mathrm{conv}(D) \neq \emptyset$.

Half-Spaces
Lemma: Let $x \subseteq \mathbb{R}^d$ be a finite set. Let $y \in \mathrm{conv}(x)$, and let $h^+$ be a half-space of $\mathbb{R}^d$ containing $y$. Then there is a point of $x$ contained in $h^+$.
Corollary: Let $S = (\mathbb{R}^d, \mathcal{H})$, where $\mathcal{H}$ is the family of half-spaces. Then $\dim_{VC}(S) = d+1$.
Proof: The $d+1$ vertices of a regular simplex are shattered by half-spaces, so $\dim_{VC}(S) \ge d+1$. For any $d+2$ points, Radon's theorem gives disjoint subsets $C, D$ with intersecting convex hulls; by the lemma, no half-space can contain all of $C$ while avoiding all of $D$, so no set of $d+2$ points is shattered.

The Growth Function
Let $G_\delta(n) = \sum_{i=0}^{\delta} \binom{n}{i}$ be the growth function. Notice: $G_\delta(n) = G_\delta(n-1) + G_{\delta-1}(n-1)$, and $G_\delta(n) = O(n^\delta)$.
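A quick numerical check of both observations (my sketch):

```python
from math import comb

def G(delta, n):
    """Growth function: G_δ(n) = Σ_{i=0}^{δ} C(n, i)."""
    return sum(comb(n, i) for i in range(delta + 1))

assert G(3, 10) == G(3, 9) + G(2, 9)    # Pascal-style recurrence
print(G(3, 10), 10 ** 3)                # 176 <= 1000: G_δ(n) = O(n^δ)
```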

Sauer’s Lemma
If $S = (X, R)$ is a range space with $\dim_{VC}(S) = \delta$ and $|X| = n$, then $|R| \le G_\delta(n)$.
Proof: By induction. The claim trivially holds for $\delta = 0$ or $n = 0$. Let $x \in X$, and consider the sets:
$R_x = \{\, r \setminus \{x\} : x \in r \in R \text{ and } r \setminus \{x\} \in R \,\}$ and $R \setminus x = \{\, r \setminus \{x\} : r \in R \,\}$.

Sauer’s Lemma
Notice: $|R| = |R_x| + |R \setminus x|$, the range space $(X \setminus \{x\}, R_x)$ has VC dimension at most $\delta - 1$, and $(X \setminus \{x\}, R \setminus x)$ has VC dimension at most $\delta$. By induction, $|R| \le G_{\delta-1}(n-1) + G_\delta(n-1) = G_\delta(n)$.

Even More Definitions
Let $S = (X, R)$ be a range space. We define its shatter function to be $\pi_S(m) = \max_{B \subseteq X,\ |B| = m} \left|R_{|B}\right|$. We define the shattering dimension of $S$ as the smallest $d$ such that $\pi_S(m) = O(m^d)$.
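The shatter function can likewise be evaluated by brute force on tiny instances. A hedged sketch (my code): intervals on the line realize at most $\binom{m}{2} + m + 1$ subsets of any $m$ points, so their shattering dimension is 2.

```python
from itertools import combinations

def shatter_function(points, ranges, m):
    """pi_S(m) = max over m-subsets B of the number of projections |R|B|."""
    return max(
        len({frozenset(r & set(B)) for r in ranges})
        for B in combinations(points, m)
    )

X = list(range(1, 9))
R = [frozenset(p for p in X if a <= p <= b) for a in X for b in X]
for m in range(1, 6):
    # realized subsets vs. the C(m,2) + m + 1 upper bound
    print(m, shatter_function(X, R, m), m * (m - 1) // 2 + m + 1)
```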

Let $\delta = \dim_{VC}(S)$; we have by Sauer's lemma that $\pi_S(m) \le G_\delta(m) = O(m^\delta)$. As such, the shattering dimension of $S$ is at most $\delta$, by its definition.

Conversely, let $B$ be the largest set shattered by $R$, with $|B| = \delta = \dim_{VC}(S)$, so $2^\delta = \left|R_{|B}\right| \le \pi_S(\delta) \le c\,\delta^d$, where $d$ is the shattering dimension. Assuming $\delta \ge 2$, taking logarithms gives $\delta \le d \lg \delta + O(1)$, and hence $\delta = O(d \log d)$.

Why Do We Need Shattering Dimension?
Let $S = (\mathbb{R}^2, \mathcal{D})$ be the range space where $\mathcal{D}$ is the family of all disks in the plane. The shattering dimension of $S$ is 3.
Proof: Consider any set $P$ of $m$ points in the plane. Any subset of $P$ realized by a disk is also realized by a canonical disk determined by at most 3 points of $P$ (shrink and translate the disk until its boundary touches up to three points of $P$, without changing the realized subset). There are $O(m^3)$ such canonical disks, so $\pi_S(m) = O(m^3)$.

Mixing Range Spaces
Let $S_1 = (X, R_1), \dots, S_k = (X, R_k)$ be range spaces with finite VC dimensions $\delta_1, \dots, \delta_k$. Let $f$ be a function that maps $k$-tuples of sets $(r_1, \dots, r_k) \in R_1 \times \cdots \times R_k$ into a subset of $X$, e.g. $f(r_1, \dots, r_k) = r_1 \cap \cdots \cap r_k$. Consider the range set $R' = \{\, f(r_1, \dots, r_k) : r_i \in R_i \,\}$.
Theorem: The associated range space $S' = (X, R')$ has a VC dimension bounded by $O\!\left(k\delta \log(k\delta)\right)$, where $\delta = \max_i \delta_i$.

Proof
Assume a set $B \subseteq X$ of size $t$ is being shattered by $R'$. Every range of $R'_{|B}$ is determined by the $k$ projections of its constituent ranges, so $2^t = \left|R'_{|B}\right| \le \prod_{i=1}^k \left|(R_i)_{|B}\right| \le \prod_{i=1}^k G_{\delta_i}(t)$. Assume $t$ is large enough that $G_{\delta_i}(t) \le t^{\delta}$, hence $2^t \le t^{k\delta}$.

Proof
Setting $t = 4k\delta \lg(k\delta)$ gives us $2^t > t^{k\delta}$, a contradiction; hence $\dim_{VC}(S') = O\!\left(k\delta \log(k\delta)\right)$.
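Spelled out, the calculation behind this step is (a sketch; the constant 4 is one convenient choice, not optimized):

```latex
% If a set B of size t is shattered by R', each pattern on B is determined by
% the k patterns of the constituent ranges, hence (for t large enough):
\[
  2^{t} \;=\; \bigl|R'_{|B}\bigr|
        \;\le\; \prod_{i=1}^{k} \bigl|(R_i)_{|B}\bigr|
        \;\le\; \prod_{i=1}^{k} G_{\delta_i}(t)
        \;\le\; t^{k\delta}
  \quad\Longrightarrow\quad t \;\le\; k\delta \lg t .
\]
% This fails once t = 4k\delta\lg(k\delta) (for k\delta \ge 2), since then
% \lg t \le 3\lg(k\delta), so k\delta \lg t \le 3k\delta\lg(k\delta) < t.
% Hence t = O(k\delta \log(k\delta)).
```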

Dual Range Spaces
Let $S = (X, R)$ be a range space. Let $p \in X$. We define $R_p = \{\, r \in R : p \in r \,\}$, the set of ranges containing $p$. The dual range space is $S^\star = (R, X^\star)$, where $X^\star = \{\, R_p : p \in X \,\}$.

Claim: $\dim_{VC}(S^\star) < 2^{\delta+1}$, where $\delta = \dim_{VC}(S)$.
Assume a set of $k = 2^{\delta+1}$ ranges $r_1, \dots, r_k$ is shattered by $X^\star$. Then there are points of $X$ realizing every subset of these ranges, creating the following matrix $M$: a row for each range and a column for each such point, with $M_{ij} = 1$ iff the $j$-th point lies in $r_i$. Every binary vector of length $k$ appears as a column of $M$; in particular, we can choose $\delta+1$ columns forming a submatrix $M'$ whose $k = 2^{\delta+1}$ rows run over all binary strings of length $\delta+1$.

The set of points represented by the columns of $M'$ is shattered by $R$: the rows of $M'$ realize all $2^{\delta+1}$ subsets of these $\delta+1$ points. This contradicts $\dim_{VC}(S) = \delta$, proving the claim.

$\varepsilon$-Samples
Let $S = (X, R)$ be a range space, and let $x$ be a finite subset of points of $X$. A subset $N \subseteq x$ is an $\varepsilon$-sample for $x$ if for all $r \in R$:
$\left| \dfrac{|r \cap x|}{|x|} - \dfrac{|r \cap N|}{|N|} \right| \le \varepsilon.$

The $\varepsilon$-Sample Theorem
There is a positive constant $c$ such that if $(X, R)$ is any range space with VC dimension at most $\delta$, $x \subseteq X$ is a finite subset and $\varepsilon, \varphi > 0$, then a random subset $N \subseteq x$ of cardinality $\dfrac{c}{\varepsilon^2}\left( \delta \log \dfrac{\delta}{\varepsilon} + \log \dfrac{1}{\varphi} \right)$ is an $\varepsilon$-sample for $x$ with probability at least $1 - \varphi$.
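A small empirical illustration of the theorem (my sketch, using half-lines on $\mathbb{R}$, a family of VC dimension 1):

```python
import bisect
import random

# A hedged experiment: a random subset N of x of size m should estimate the
# measure of every range within eps. Ranges are half-lines (-inf, t]; since
# N is a subset of x, the sup over all t is attained at the points of x.
random.seed(1)
n, m = 100_000, 2_000
x = [random.random() for _ in range(n)]
N = random.sample(x, m)

xs, Ns = sorted(x), sorted(N)
worst = max(
    abs(bisect.bisect_right(xs, t) / n - bisect.bisect_right(Ns, t) / m)
    for t in xs
)
print(f"max |m(r) - s(r)| over half-line ranges ~ {worst:.4f}")
```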

$\varepsilon$-Nets
Let $S = (X, R)$ be a range space, and let $x$ be a finite subset of points of $X$. A subset $N \subseteq x$ is an $\varepsilon$-net for $x$ if for every range $r \in R$ with $|r \cap x| \ge \varepsilon |x|$, we have $r \cap N \neq \emptyset$.

The $\varepsilon$-Net Theorem
Let $(X, R)$ be a range space of VC dimension $\delta$, and let $x$ be a finite subset of points of $X$. Suppose $0 < \varepsilon \le 1$ and $0 < \varphi < 1$. Let $N$ be a set obtained by $m$ random independent draws from $x$, where: $m \ge \max\left( \dfrac{4}{\varepsilon}\lg\dfrac{4}{\varphi},\ \dfrac{8\delta}{\varepsilon}\lg\dfrac{16}{\varepsilon} \right)$. Then $N$ is an $\varepsilon$-net for $x$ with probability at least $1 - \varphi$.
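The bound is easy to evaluate. A small helper (my code, with the constants exactly as stated above); note that the sample size depends only on $\delta$, $\varepsilon$ and $\varphi$, never on $|x|$:

```python
from math import ceil, log2

def epsnet_sample_size(delta, eps, phi):
    """Draws sufficing for an eps-net with probability >= 1 - phi,
    per the bound above: m >= max((4/eps) lg(4/phi), (8 delta/eps) lg(16/eps))."""
    return ceil(max(4 / eps * log2(4 / phi),
                    8 * delta / eps * log2(16 / eps)))

# Half-spaces in R^2 have VC dimension 3:
print(epsnet_sample_size(delta=3, eps=0.05, phi=0.01))   # a few thousand draws
```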

Some Applications

Range Searching
Consider a very large set of points in the plane. We would like to be able to quickly decide how many points are included inside a query rectangle. The $\varepsilon$-Sample Theorem tells us that there is a subset of constant size (depending only on $\varepsilon$, since axis-parallel rectangles have finite VC dimension) whose counts, suitably rescaled, work for all query rectangles simultaneously, up to an additive error of $\varepsilon n$.
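A hedged sketch of this application (my code; the uniform point cloud and the specific query are assumptions for illustration):

```python
import random

# Approximate range counting from a fixed random sample: rectangles have
# VC dimension 4, so a sample of size depending only on eps (not on n)
# estimates all rectangle counts to within an additive eps*n.
random.seed(2)
n, m = 200_000, 4_000
pts = [(random.random(), random.random()) for _ in range(n)]
sample = random.sample(pts, m)

def exact_count(rect, P):
    x0, y0, x1, y1 = rect
    return sum(1 for (x, y) in P if x0 <= x <= x1 and y0 <= y <= y1)

def approx_count(rect):
    # rescale the sample count back to the full point set
    return exact_count(rect, sample) * n / m

q = (0.2, 0.3, 0.7, 0.8)
print(exact_count(q, pts), round(approx_count(q)))  # close, up to +-eps*n
```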

Learning a Concept
A function $f$ is defined over the plane and returns ‘1’ inside an unknown disk $D$ and ‘0’ outside of it. There is some distribution over the plane, and we can pick (roughly) $\frac{1}{\varepsilon}\log\frac{1}{\varepsilon}$ random points in a sample $x$ and compute $f$ just for them.

Learning a Concept [figure: the unknown disk $D$ with sampled points labeled ‘1’ inside and ‘0’ outside]

We compute the smallest disk $D'$ that contains only the points of $x$ labeled ‘1’, and define the classifier $g$ that returns ‘1’ inside $D'$ and ‘0’ outside it. We claim that $g$ classifies correctly all but an $\varepsilon$-fraction of the points: the region of error is contained in the symmetric difference $D \oplus D'$.

Learning a Concept
Let $S = (\mathbb{R}^2, \mathcal{R})$ be a range space where $\mathcal{R}$ is the family of all symmetric differences between 2 disks. $x$ is our finite sample set, and the measure of the range $D \oplus D'$ is the probability of a mistake in classification.

Learning a Concept
By the Mixing Theorem $S$ has finite VC dimension, so by the $\varepsilon$-Net Theorem the sample $x$ is an $\varepsilon$-net (with probability at least $1-\varphi$), and so: if the error region $D \oplus D'$ had measure at least $\varepsilon$, it would contain a point of $x$; but every sample point is classified correctly by construction, a contradiction. Hence the probability of a mistake is less than $\varepsilon$.
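An end-to-end sketch of this learning procedure (my code, not the presenter's; it uses a randomized incremental smallest-enclosing-disk routine and assumes the sample contains at least one positive point, with points in general position):

```python
import math
import random

def disk_2pts(p, q):
    c = ((p[0] + q[0]) / 2, (p[1] + q[1]) / 2)
    return (c, math.dist(p, q) / 2)

def disk_3pts(a, b, c):
    # circumscribed disk of three non-collinear points
    d = 2 * (a[0]*(b[1]-c[1]) + b[0]*(c[1]-a[1]) + c[0]*(a[1]-b[1]))
    ux = ((a[0]**2+a[1]**2)*(b[1]-c[1]) + (b[0]**2+b[1]**2)*(c[1]-a[1])
          + (c[0]**2+c[1]**2)*(a[1]-b[1])) / d
    uy = ((a[0]**2+a[1]**2)*(c[0]-b[0]) + (b[0]**2+b[1]**2)*(a[0]-c[0])
          + (c[0]**2+c[1]**2)*(b[0]-a[0])) / d
    return ((ux, uy), math.dist((ux, uy), a))

def inside(disk, p, eps=1e-9):
    c, r = disk
    return math.dist(c, p) <= r + eps

def smallest_enclosing_disk(points):
    """Randomized incremental minimum enclosing disk (Welzl-style)."""
    pts = points[:]
    random.shuffle(pts)
    d = (pts[0], 0.0)
    for i, p in enumerate(pts):
        if inside(d, p):
            continue
        d = (p, 0.0)
        for j in range(i):
            if inside(d, pts[j]):
                continue
            d = disk_2pts(p, pts[j])
            for k in range(j):
                if not inside(d, pts[k]):
                    d = disk_3pts(p, pts[j], pts[k])
    return d

random.seed(3)
D = ((0.5, 0.5), 0.3)                       # the unknown disk
sample = [(random.random(), random.random()) for _ in range(400)]
positives = [p for p in sample if inside(D, p)]
Dp = smallest_enclosing_disk(positives)     # the learned disk D'

# Estimate the error mass of the symmetric difference D (+) D' on fresh points:
test = [(random.random(), random.random()) for _ in range(100_000)]
err = sum(inside(D, p) != inside(Dp, p) for p in test) / len(test)
print(f"estimated misclassification probability ~ {err:.4f}")
```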

Discrepancy
Let $\chi : x \to \{-1, +1\}$ be a coloring. Write $\chi(r) = \sum_{p \in r \cap x} \chi(p)$; we define the discrepancy of $\chi$ over a range $r$ as $|\chi(r)|$, the amount of imbalance in the coloring. Let $P$ be a partition of $x$ into $|x|/2$ pairs. We will refer to $\chi$ as compatible with $P$ if the two points of each pair in $P$ are colored with different colors by $\chi$.

Discrepancy
A pair of $P$ crosses $r$ if $r$ contains exactly one of its two points. Let $\#_r$ denote the crossing number of $r$, the number of pairs of $P$ crossing $r$. Let $X_i \in \{-1, +1\}$ be the contribution of the $i$-th crossing pair to $\chi(r)$ (non-crossing pairs contribute 0). For a random compatible coloring and a threshold $\Delta_r$, we have by Chernoff's inequality: $\Pr\!\left[\,|\chi(r)| \ge \Delta_r\,\right] \le 2 e^{-\Delta_r^2 / 2\#_r}$.
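A small experiment (my code): a random pairing of points on the line stands in for a cleverly chosen low-crossing partition, and a random compatible coloring is tested against the Chernoff threshold:

```python
import math
import random

# Points on a line, paired at random; range r = (-inf, 0.5]. Each pair is
# colored (+1, -1) or (-1, +1) at random (a compatible coloring); only pairs
# crossing r contribute +-1 to chi(r).
random.seed(4)
n = 10_000
pts = [random.random() for _ in range(n)]
random.shuffle(pts)
pairs = [(pts[i], pts[i + 1]) for i in range(0, n, 2)]

t = 0.5
crossing = sum(1 for (p, q) in pairs if (p <= t) != (q <= t))
delta = math.sqrt(2 * crossing * math.log(2 / 0.05))  # bound gives Pr <= 5%

trials, exceed = 1_000, 0
for _ in range(trials):
    chi_r = sum(random.choice((1, -1)) for _ in range(crossing))
    exceed += abs(chi_r) >= delta
print(f"crossing number = {crossing}, threshold = {delta:.1f}, "
      f"empirical Pr[|chi(r)| >= threshold] = {exceed / trials:.3f} (bound: 0.05)")
```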

Building $\varepsilon$-Samples via Discrepancy
Lemma: Let $N$ be a $\rho$-sample for $x$, and let $M$ be a $\rho'$-sample for $N$. Then $M$ is a $(\rho + \rho')$-sample for $x$.
Proof: For every range $r$, by the triangle inequality, $\left|\overline{m}_x(r) - \overline{m}_M(r)\right| \le \left|\overline{m}_x(r) - \overline{m}_N(r)\right| + \left|\overline{m}_N(r) - \overline{m}_M(r)\right| \le \rho + \rho'$.

Building $\varepsilon$-Samples via Discrepancy
Let $S = (X, R)$ be a range space with shattering dimension $d$. Let $x \subseteq X$ be a set of $n$ points, and consider the induced range space $(x, R_{|x})$. Consider a compatible coloring $\chi$ of $x$ with the discrepancy bounded as we have seen, $|\chi(r)| \le c'\sqrt{n \ln \pi_S(n)} = O\!\left(\sqrt{dn \log n}\right)$ for every range $r$, and let $P_1 = \{\, p \in x : \chi(p) = +1 \,\}$.

Building $\varepsilon$-Samples via Discrepancy
Now, for every range $r$: $\left|\dfrac{|r\cap x|}{|x|} - \dfrac{|r\cap P_1|}{|P_1|}\right| = \dfrac{\bigl|\,|r\cap x| - 2|r\cap P_1|\,\bigr|}{n} = \dfrac{|\chi(r)|}{n} \le c\sqrt{\dfrac{d\ln n}{n}}$, for some absolute constant $c$. That is, $P_1$ is a $\delta(n)$-sample for $x$ of size $n/2$, where $\delta(n) = c\sqrt{(d\ln n)/n}$.

Building $\varepsilon$-Samples via Discrepancy
Let $x_0 = x$. We will repeatedly compute a coloring $\chi_i$ of $x_i$ with low discrepancy. Let $x_{i+1}$ be the points of $x_i$ colored $-1$ by $\chi_i$; then $|x_{i+1}| = n/2^{i+1}$. By the lemma we have that $x_k$ is a $\left(\sum_{i=0}^{k-1} \delta\!\left(n/2^i\right)\right)$-sample for $x$.

Building $\varepsilon$-Samples via Discrepancy
We look for the maximal $k$ such that: $\sum_{i=0}^{k-1} \delta\!\left(n/2^i\right) \le \varepsilon$. The sum is dominated by its last term $\delta\!\left(n/2^{k-1}\right)$. So, taking the largest such $k$ results in a set of size $O\!\left(\dfrac{d}{\varepsilon^2}\log\dfrac{d}{\varepsilon}\right)$, which is an $\varepsilon$-sample for $x$. See the toy sketch below.
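A toy version of the halving construction (my code): a random compatible coloring stands in for a true low-discrepancy coloring, and the error is measured over half-line ranges:

```python
import bisect
import random

random.seed(5)

def halve(pts):
    """Pair points at random and keep one point per pair
    (i.e., apply a random compatible coloring and keep one color class)."""
    pts = pts[:]
    random.shuffle(pts)
    return [random.choice(pair) for pair in zip(pts[::2], pts[1::2])]

def sample_error(x, N):
    """max over t of |m((-inf,t]) over x - same over N|, attained at points of x."""
    xs, Ns = sorted(x), sorted(N)
    return max(abs(bisect.bisect_right(xs, t) / len(xs)
                   - bisect.bisect_right(Ns, t) / len(Ns)) for t in xs)

x = [random.random() for _ in range(2 ** 16)]
N = x[:]
for _ in range(6):          # halve 6 times: 65536 -> 1024 points
    N = halve(N)
print(len(N), f"error ~ {sample_error(x, N):.4f}")
```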

Proof of the $\varepsilon$-Net Theorem
Let $(X, R)$ be a range space of VC dimension $\delta$, and let $x \subseteq X$ be a set of $n$ points. Let $N = (y_1, \dots, y_m)$ be a set of $m$ points obtained by independent random samples from $x$. Let $E_1$ be the event that $N$ fails to be an $\varepsilon$-net: $E_1 \equiv \exists r \in R$ with $|r \cap x| \ge \varepsilon n$ and $r \cap N = \emptyset$. We wish to bound $\Pr[E_1]$. Let $T = (z_1, \dots, z_m)$ be a second sample generated like $N$.

Proof of the $\varepsilon$-Net Theorem
Let $E_2$ be the event that $\exists r \in R$ with $|r \cap x| \ge \varepsilon n$, $r \cap N = \emptyset$, and $|r \cap T| \ge \frac{\varepsilon m}{2}$. Clearly $\Pr[E_2] \le \Pr[E_1]$. Assume $E_1$ occurs, and fix a witness range $r$. Then $\Pr[E_2 \mid E_1] \ge \Pr\!\left[|r \cap T| \ge \tfrac{\varepsilon m}{2}\right]$, which is at least $\tfrac12$ as shown next; hence $\Pr[E_1] \le 2\Pr[E_2]$.

Proof of the $\varepsilon$-Net Theorem
Let $Z = |r \cap T|$. Then $Z$ is a binomial variable with parameters $m$ and $p = \frac{|r \cap x|}{n} \ge \varepsilon$: $\mathbb{E}[Z] = pm$ and $\mathrm{Var}[Z] = p(1-p)m \le pm$. Thus, by Chebyshev's inequality: $\Pr\!\left[Z < \tfrac{\varepsilon m}{2}\right] \le \Pr\!\left[|Z - pm| > \tfrac{pm}{2}\right] \le \dfrac{4}{pm} \le \dfrac{4}{\varepsilon m} \le \dfrac12$, since $m \ge 8/\varepsilon$.

Proof of the $\varepsilon$-Net Theorem
Let $W = N \cup T$ be the combined sample of $2m$ draws. Instead of drawing $N$ and then $T$, we may equivalently draw $W$ first and then split it into $N$ and $T$ at random; the distribution of $(N, T)$ is the same.

Proof of the $\varepsilon$-Net Theorem
Now we fix the set $W$, and it's enough to consider the range space projected on it, $\left(W, R_{|W}\right)$; by Sauer's lemma we know $\left|R_{|W}\right| \le G_\delta(2m)$. Let us fix any $r \in R_{|W}$ and consider the event $E_r$: $r \cap N = \emptyset$ and $|r \cap T| \ge \frac{\varepsilon m}{2}$.

Proof of the $\varepsilon$-Net Theorem
If $|r \cap W| < \frac{\varepsilon m}{2}$, then $\Pr[E_r] = 0$ and the bound is trivial. Otherwise, all of the $k = |r \cap W| \ge \frac{\varepsilon m}{2}$ points of $r \cap W$ must fall into $T$ under the random split, which happens with probability at most $2^{-k} \le 2^{-\varepsilon m / 2}$.

And to sum up: $\Pr[E_1] \le 2\Pr[E_2] \le 2\sum_{r \in R_{|W}} \Pr[E_r] \le 2\, G_\delta(2m)\, 2^{-\varepsilon m/2} \le \varphi$, for $m \ge \max\!\left(\frac{4}{\varepsilon}\lg\frac{4}{\varphi},\ \frac{8\delta}{\varepsilon}\lg\frac{16}{\varepsilon}\right)$. This completes the proof of the $\varepsilon$-Net Theorem.