Lecture 18: Uniformity Testing Monotonicity Testing

Slides:

Advertisements

Similar presentations

1+eps-Approximate Sparse Recovery Eric Price MIT David Woodruff IBM Almaden.

Advertisements

Routing Complexity of Faulty Networks Omer Angel Itai Benjamini Eran Ofek Udi Wieder The Weizmann Institute of Science.

Lower Bounds for Additive Spanners, Emulators, and More David P. Woodruff MIT and Tsinghua University To appear in FOCS, 2006.

Great Theoretical Ideas in Computer Science

Two Segments Intersect?

Inapproximability of MAX-CUT Khot,Kindler,Mossel and O ’ Donnell Moshe Ben Nehemia June 05.

WSPD Applications.

Size-estimation framework with applications to transitive closure and reachability Presented by Maxim Kalaev Edith Cohen AT&T Bell Labs 1996.

Walks, Paths and Circuits Walks, Paths and Circuits Sanjay Jain, Lecturer, School of Computing.

QuickSort Average Case Analysis An Incompressibility Approach Brendan Lucier August 2, 2005.

Approximation, Chance and Networks Lecture Notes BISS 2005, Bertinoro March Alessandro Panconesi University La Sapienza of Rome.

Theory of Computing Lecture 3 MAS 714 Hartmut Klauck.

CSL758 Instructors: Naveen Garg Kavitha Telikepalli Scribe: Manish Singh Vaibhav Rastogi February 7 & 11, 2008.

Tirgul 10 Rehearsal about Universal Hashing Solving two problems from theoretical exercises: –T2 q. 1 –T3 q. 2.

1 Algorithms for Large Data Sets Ziv Bar-Yossef Lecture 8 May 4, 2005

Karger’s Min-Cut Algorithm Amihood Amir Bar-Ilan University, 2009.

Sublinear time algorithms Ronitt Rubinfeld Blavatnik School of Computer Science Tel Aviv University TexPoint fonts used in EMF. Read the TexPoint manual.

Job Scheduling Lecture 19: March 19. Job Scheduling: Unrelated Multiple Machines There are n jobs, each job has: a processing time p(i,j) (the time to.

CS420 lecture eight Greedy Algorithms. Going from A to G Starting with a full tank, we can drive 350 miles before we need to gas up, minimize the number.

Correlation testing for affine invariant properties on Shachar Lovett Institute for Advanced Study Joint with Hamed Hatami (McGill)

Approximating the MST Weight in Sublinear Time Bernard Chazelle (Princeton) Ronitt Rubinfeld (NEC) Luca Trevisan (U.C. Berkeley)

Minimal Spanning Trees What is a minimal spanning tree (MST) and how to find one.

Edge-disjoint induced subgraphs with given minimum degree Raphael Yuster 2012.

1 Sublinear Algorithms Lecture 1 Sofya Raskhodnikova Penn State University TexPoint fonts used in EMF. Read the TexPoint manual before you delete this.

Computational Geometry Piyush Kumar (Lecture 10: Point Location) Welcome to CIS5930.

Preview  Graph  Tree Binary Tree Binary Search Tree Binary Search Tree Property Binary Search Tree functions  In-order walk  Pre-order walk  Post-order.

Testing the independence number of hypergraphs

Topics in Algorithms 2007 Ramesh Hariharan. Tree Embeddings.

Seminar on Sub-linear algorithms Prof. Ronitt Rubinfeld.

Artur Czumaj DIMAP DIMAP (Centre for Discrete Maths and it Applications) Computer Science & Department of Computer Science University of Warwick Testing.

Sub-linear Time Algorithms Ronitt Rubinfeld Tel Aviv University Fall 2015.

1 Distributed Vertex Coloring. 2 Vertex Coloring: each vertex is assigned a color.

Section Recursion 2  Recursion – defining an object (or function, algorithm, etc.) in terms of itself.  Recursion can be used to define sequences.

Theory of Computational Complexity Probability and Computing Chapter Hikaru Inada Iwama and Ito lab M1.

Lower Bounds & Sorting in Linear Time

Algorithms for Big Data: Streaming and Sublinear Time Algorithms

Property Testing (a.k.a. Sublinear Algorithms )

Randomized Min-Cut Algorithm

Stochastic Streams: Sample Complexity vs. Space Complexity

Dana Ron Tel Aviv University

Chapter 5. Greedy Algorithms

On Testing Dynamic Environments

Approximating the MST Weight in Sublinear Time

Lecture 22: Linearity Testing Sparse Fourier Transform

From dense to sparse and back again: On testing graph properties (and some properties of Oded)

Great Theoretical Ideas in Computer Science

Orthogonal Range Searching and Kd-Trees

MST in Log-Star Rounds of Congested Clique

COMS E F15 Lecture 2: Median trick + Chernoff, Distinct Count, Impossibility Results Left to the title, a presenter can insert his/her own image.

Lecture 10: Sketching S3: Nearest Neighbor Search

Selection in heaps and row-sorted matrices

Lecture 4: CountSketch High Frequencies

Discrete Mathematics for Computer Science

CSE 421: Introduction to Algorithms

Lecture 7: Dynamic sampling Dimension Reduction

Enumerating Distances Using Spanners of Bounded Degree

Lecture 16: Earth-Mover Distance

CIS 700: “algorithms for Big Data”

Haim Kaplan and Uri Zwick

CSCI B609: “Foundations of Data Science”

On the effect of randomness on planted 3-coloring models

Lower Bounds & Sorting in Linear Time

CSE 421: Introduction to Algorithms

Graphs and Algorithms (2MMD30)

Lecture 6: Counting triangles Dynamic graphs & sampling

Compact routing schemes with improved stretch

Lecture 15: Least Square Regression Metric Embeddings

Switching Lemmas and Proof Complexity

Mathematical Induction II

Presentation transcript:

Lecture 18: Uniformity Testing Monotonicity Testing COMS E6998-9 F15 Lecture 18: Uniformity Testing Monotonicity Testing Left to the title, a presenter can insert his/her own image pertinent to the presentation.

Administrivia, Plan Admin: Plan: Scriber? PS3: pick up Project proposals: Nov 16th Plan: Uniformity Testing Monotonicity Testing Scriber?

Uniformity testing Enough to distinguish: Algorithm UNIFORM: Input: 𝑛, 𝑚, 𝑥 1 ,… 𝑥 𝑚 𝐶 = 0; for(i=0; i<m; i++) for(j=i+1; j<m; j++) if ( 𝑥 𝑖 = 𝑥 𝑗 ) 𝐶++; if (𝐶<𝑎⋅ 𝑚 2 /𝑛) return “Uniform”; else return “Not uniform”; // 𝑎: constant dependent on  Enough to distinguish: ||𝐷| | 2 2 =1/𝑛 (unif) ||𝐷| | 2 2 >1/𝑛+ 𝜖 2 /𝑛 (non-unif) Lemma: 1 𝑀 ⋅𝐶 is a good enough as long as 𝑚=Ω 𝑛 𝜖 4 where 𝑀=𝑚(𝑚−1)/2 Let 𝑑=||𝐷| | 2 2 Claim 1: 𝐸 𝐶 𝑀 =𝑑 Claim 2: 𝑉𝑎𝑟 𝐶 𝑀 ≤ 𝑑 𝑀 + 8 𝑑 2 𝑛 𝑚 Finish lemma proof…

Identity Testing Problem: Classic 𝜒 2 test [Pearson 1900]: We have known distribution 𝑝 Given samples from 𝑞, distinguish between: 𝑝=𝑞 vs ||𝑝−𝑞| | 1 ≥𝜖 Uniformity is an instance (𝑝= 𝑈 𝑛 ) Classic 𝜒 2 test [Pearson 1900]: Let 𝑋 𝑖 = # of occurrences of 𝑖 𝑖 𝑋 𝑖 −𝑘 𝑝 𝑖 2 −𝑘 𝑝 𝑖 𝑝 𝑖 ≥𝛼 Test of [Valiant, Valiant 2014]: 𝑖 𝑋 𝑖 −𝑘 𝑝 𝑖 2 − 𝑋 𝑖 𝑝 𝑖 2/3 ≥𝛼

Distribution Testing++ Other properties? Equality testing: Given samples from unknown 𝑝,𝑞, distinguish 𝑝=𝑞 vs ||𝑝−𝑞| | 1 ≥𝜖 Sample bound: Θ 𝜖 ( 𝑛 2/3 ) Independence testing: Given samples from (𝑝,𝑞)∈ 𝑛 ×[𝑛], distinguish: 𝑝 is independent of 𝑞 vs || 𝑝,𝑞 −𝐴×𝐵| | 1 ≥𝜖 for any distributions 𝐴,𝐵 on [𝑛] Sample bound: Θ 𝜖 (𝑛) Many more…

Testing Monotonicity Problem: given a sequence 𝑥 1 ,… 𝑥 𝑛 , distinguish: sequence is sorted, vs sequence is NOT sorted In 𝑜(𝑛) time? Hard exactly: can have just one inversion somewhere Approximation notion: 𝜖-far All sequences of length 𝑛 all sorted sequences 𝜖-far: if need to delete at least 𝜖 fraction of elements to make it sorted

Testing Monotonicity A testing algorithm: How many samples? Fix? 𝜖-far: if need to delete at least 𝜖 fraction of elements to make it sorted A testing algorithm: Sample random positions 𝑖 Check that 𝑥 𝑖 ≤ 𝑥 𝑗 iff 𝑖<𝑗 How many samples? Bad case: 2,1,4,3,…,i,i-1,i+2,i+1,…,n,n-1 At least Ω 𝑛 before we see an adjacent pair Fix? Can sample adjacent pairs! Works? Bad case too

Algorithm: Monotonicity Assumption: 𝑥 𝑖 ≠ 𝑥 𝑗 One iteration: Pick a random 𝑖 Do binary search on 𝑥= 𝑥 𝑖 in the sequence Start with interval [s,t]=[1,n] For interval [s,t], find middle 𝑚= 𝑠+𝑡 2 If 𝑥< 𝑥 𝑚 , recurse on the left If 𝑥> 𝑥 𝑚 , recurse on the right Fail if find inconsistency: 1) 𝑥 𝑖 not found where it should be 2) 𝑥 𝑚 ∉[ 𝑥 𝑠 , 𝑥 𝑡 ] Algorithm MONOTONICITY: Input: 𝑛, 𝑥 1 ,… 𝑥 𝑛 for(r=0; 𝑟<3/𝜖; r++) Let 𝑥= 𝑥 𝑖 perform binary search on 𝑥 if (𝑥 not found at position 𝑖 OR binary search inconsistent) return “not sorted”; If finished ok, return “sorted”.

Analysis: Monotonicity If sorted, will pass the test If 𝜖-far from sorted… How do we argue? Via contrapositive Lemma: suppose one iteration succeeds with probability ≥1−𝜖 Then, sequence ≤𝜖 far from a sorted sequence Hence, 3/𝜖 repetitions are enough to catch the case of >𝜖 far from sortedness with probability 90%: Prob to report “sorted” when it’s far: is at most 1−𝜖 3/𝜖 ≤ 𝑒 −3 ≤0.1 Algorithm MONOTONICITY: Input: 𝑛, 𝑥 1 ,… 𝑥 𝑛 for(r=0; 𝑟<3/𝜖; r++) Let 𝑥= 𝑥 𝑖 perform binary search on 𝑥 if (𝑥 not found at position 𝑖 OR binary search inconsistent) return “not sorted”; If finished ok, return “sorted”.

Analysis: Monotonicity Lemma: suppose one iteration succeeds with probability ≥1−𝜖 Then, sequence ≤𝜖 far from a sorted sequence Proof: Call 𝑖∈[𝑛] good if it passes the test Claim: if 𝑖<𝑗 are good, then 𝑥 𝑖 < 𝑥 𝑗 Consider the binary search tree, and their lowest common ancestor 𝑥 𝑚 It must be: 𝑥 𝑖 < 𝑥 𝑚 and 𝑥 𝑚 < 𝑥 𝑗 Hence: good 𝑖’s are sorted! “probability ≥1−𝜖” ⇒ number of good elements is at least 1−𝜖 𝑛 End of proof! Algorithm MONOTONICITY: Input: 𝑛, 𝑥 1 ,… 𝑥 𝑛 for(r=0; 𝑟<3/𝜖; r++) Let 𝑥= 𝑥 𝑖 perform binary search on 𝑥 if (𝑥 not found at position 𝑖 OR binary search inconsistent) return “not sorted”; If finished ok, return “sorted”.

Monotonicity: discussion Assumption? Replace all 𝑥 𝑖 by ( 𝑥 𝑖 ,𝑖) Then sequence must be strictly monotonic Test is adaptive: Where we query depends on what we learned from the previous queries Do we need adaptivity? No! A each iteration, we query for 𝑥= 𝑥 𝑖 We know precisely where binary search is supposed to look at! E.g., if 𝑖=1, then it’s positions: 𝑛 2 , 𝑛 4 , 𝑛 8 ,… Can generate all the positions to query at the beginning and query them all at the same time Unless, binary search is inconsistent, in which case we detect this from the queries positions Algorithm MONOTONICITY: Input: 𝑛, 𝑥 1 ,… 𝑥 𝑛 for(r=0; 𝑟<3/𝜖; r++) Let 𝑥= 𝑥 𝑖 perform binary search on 𝑥 if (𝑥 not found at position 𝑖 OR binary search inconsistent) return “not sorted”; If finished ok, return “sorted”.

Monotonicity++ 𝑂(log 𝑛) queries tight? Yes Can consider the more general case: Function 𝑓: 0,1 𝑑 →{0,1} Monotone: if 𝑓 𝑥 ≤𝑓(𝑦) whenever 𝑥≤𝑦 (coordinate-wise) Can test in 𝑂 𝜖 𝑑 queries! [GGLRS’98, KMS’15]

Testing graphs We have a graph 𝐺= 𝑉,𝐸 Dense case: Sparse case: 𝑛 vertices 𝑚 edges Dense case: 𝑚=Θ 𝑛 2 Sparse case: Degree 𝑑≤𝑂(1) Property testing: Eg, is graph 𝐺 connected? Approximation? 𝜖-far: if we need to delete/insert ≥𝜖𝑚 edges

Connectivity in sparse graph Approximation? 𝜖-far: if we need to delete/insert ≥𝜖𝑚 edges 𝑚=𝑑𝑛=𝑂(𝑛) When does it make sense? 𝜖𝑑≪1 (otherwise any sparse graph is close to being connected!) Assume: 𝜖𝑑≪1 Algorithm: For 𝑟=𝑂 1 𝜖𝑑 times repeat: Choose a random node 𝑠 Run a BSF from 𝑠 Until see more than 4/𝜖𝑑 node in the CC If the CC is smaller, then report “disconnected” Otherwise, report “connected”

Analysis Claim: if 𝜖-far, then graph has at most Ω(𝜖𝑑𝑛) connected components Proof: Suppose 𝐺 has 𝑐 connected component Will connect, using 𝑂(𝑐) modifications Idea: Just connect each connected component consecutively Issue: can get higher degree than 𝑑 in a CC Is really an issue when all nodes in a CC have full degree Just delete one edge (preserving connectivity) Hence, on average a CC has 𝑂 𝑛 𝜖𝑑𝑛 =𝑂 1 𝜖𝑑 nodes Will pick one of them with probability at least 𝜖𝑑