Presentation is loading. Please wait.

Presentation is loading. Please wait.

Lecture 18: Uniformity Testing Monotonicity Testing

Similar presentations


Presentation on theme: "Lecture 18: Uniformity Testing Monotonicity Testing"β€” Presentation transcript:

1 Lecture 18: Uniformity Testing Monotonicity Testing
COMS E F15 Lecture 18: Uniformity Testing Monotonicity Testing Left to the title, a presenter can insert his/her own image pertinent to the presentation.

2 Administrivia, Plan Admin: Plan: Scriber? PS3: pick up
Project proposals: Nov 16th Plan: Uniformity Testing Monotonicity Testing Scriber?

3 Uniformity testing Enough to distinguish:
Algorithm UNIFORM: Input: 𝑛, π‘š, π‘₯ 1 ,… π‘₯ π‘š 𝐢 = 0; for(i=0; i<m; i++) for(j=i+1; j<m; j++) if ( π‘₯ 𝑖 = π‘₯ 𝑗 ) 𝐢++; if (𝐢<π‘Žβ‹… π‘š 2 /𝑛) return β€œUniform”; else return β€œNot uniform”; // π‘Ž: constant dependent on ο₯ Enough to distinguish: ||𝐷| | 2 2 =1/𝑛 (unif) ||𝐷| | 2 2 >1/𝑛+ πœ– 2 /𝑛 (non-unif) Lemma: 1 𝑀 ⋅𝐢 is a good enough as long as π‘š=Ξ© 𝑛 πœ– 4 where 𝑀=π‘š(π‘šβˆ’1)/2 Let 𝑑=||𝐷| | 2 2 Claim 1: 𝐸 𝐢 𝑀 =𝑑 Claim 2: π‘‰π‘Žπ‘Ÿ 𝐢 𝑀 ≀ 𝑑 𝑀 + 8 𝑑 2 𝑛 π‘š Finish lemma proof…

4 Identity Testing Problem: Classic πœ’ 2 test [Pearson 1900]:
We have known distribution 𝑝 Given samples from π‘ž, distinguish between: 𝑝=π‘ž vs ||π‘βˆ’π‘ž| | 1 β‰₯πœ– Uniformity is an instance (𝑝= π‘ˆ 𝑛 ) Classic πœ’ 2 test [Pearson 1900]: Let 𝑋 𝑖 = # of occurrences of 𝑖 𝑖 𝑋 𝑖 βˆ’π‘˜ 𝑝 𝑖 2 βˆ’π‘˜ 𝑝 𝑖 𝑝 𝑖 β‰₯𝛼 Test of [Valiant, Valiant 2014]: 𝑖 𝑋 𝑖 βˆ’π‘˜ 𝑝 𝑖 2 βˆ’ 𝑋 𝑖 𝑝 𝑖 2/3 β‰₯𝛼

5 Distribution Testing++
Other properties? Equality testing: Given samples from unknown 𝑝,π‘ž, distinguish 𝑝=π‘ž vs ||π‘βˆ’π‘ž| | 1 β‰₯πœ– Sample bound: Θ πœ– ( 𝑛 2/3 ) Independence testing: Given samples from (𝑝,π‘ž)∈ 𝑛 Γ—[𝑛], distinguish: 𝑝 is independent of π‘ž vs || 𝑝,π‘ž βˆ’π΄Γ—π΅| | 1 β‰₯πœ– for any distributions 𝐴,𝐡 on [𝑛] Sample bound: Θ πœ– (𝑛) Many more…

6 Testing Monotonicity Problem: given a sequence π‘₯ 1 ,… π‘₯ 𝑛 , distinguish: sequence is sorted, vs sequence is NOT sorted In π‘œ(𝑛) time? Hard exactly: can have just one inversion somewhere Approximation notion: πœ–-far All sequences of length 𝑛 all sorted sequences πœ–-far: if need to delete at least πœ– fraction of elements to make it sorted

7 Testing Monotonicity A testing algorithm: How many samples? Fix?
πœ–-far: if need to delete at least πœ– fraction of elements to make it sorted A testing algorithm: Sample random positions 𝑖 Check that π‘₯ 𝑖 ≀ π‘₯ 𝑗 iff 𝑖<𝑗 How many samples? Bad case: 2,1,4,3,…,i,i-1,i+2,i+1,…,n,n-1 At least Ξ© 𝑛 before we see an adjacent pair Fix? Can sample adjacent pairs! Works? Bad case too

8 Algorithm: Monotonicity
Assumption: π‘₯ 𝑖 β‰  π‘₯ 𝑗 One iteration: Pick a random 𝑖 Do binary search on π‘₯= π‘₯ 𝑖 in the sequence Start with interval [s,t]=[1,n] For interval [s,t], find middle π‘š= 𝑠+𝑑 2 If π‘₯< π‘₯ π‘š , recurse on the left If π‘₯> π‘₯ π‘š , recurse on the right Fail if find inconsistency: 1) π‘₯ 𝑖 not found where it should be 2) π‘₯ π‘š βˆ‰[ π‘₯ 𝑠 , π‘₯ 𝑑 ] Algorithm MONOTONICITY: Input: 𝑛, π‘₯ 1 ,… π‘₯ 𝑛 for(r=0; π‘Ÿ<3/πœ–; r++) Let π‘₯= π‘₯ 𝑖 perform binary search on π‘₯ if (π‘₯ not found at position 𝑖 OR binary search inconsistent) return β€œnot sorted”; If finished ok, return β€œsorted”.

9 Analysis: Monotonicity
If sorted, will pass the test If πœ–-far from sorted… How do we argue? Via contrapositive Lemma: suppose one iteration succeeds with probability β‰₯1βˆ’πœ– Then, sequence β‰€πœ– far from a sorted sequence Hence, 3/πœ– repetitions are enough to catch the case of >πœ– far from sortedness with probability 90%: Prob to report β€œsorted” when it’s far: is at most 1βˆ’πœ– 3/πœ– ≀ 𝑒 βˆ’3 ≀0.1 Algorithm MONOTONICITY: Input: 𝑛, π‘₯ 1 ,… π‘₯ 𝑛 for(r=0; π‘Ÿ<3/πœ–; r++) Let π‘₯= π‘₯ 𝑖 perform binary search on π‘₯ if (π‘₯ not found at position 𝑖 OR binary search inconsistent) return β€œnot sorted”; If finished ok, return β€œsorted”.

10 Analysis: Monotonicity
Lemma: suppose one iteration succeeds with probability β‰₯1βˆ’πœ– Then, sequence β‰€πœ– far from a sorted sequence Proof: Call π‘–βˆˆ[𝑛] good if it passes the test Claim: if 𝑖<𝑗 are good, then π‘₯ 𝑖 < π‘₯ 𝑗 Consider the binary search tree, and their lowest common ancestor π‘₯ π‘š It must be: π‘₯ 𝑖 < π‘₯ π‘š and π‘₯ π‘š < π‘₯ 𝑗 Hence: good 𝑖’s are sorted! β€œprobability β‰₯1βˆ’πœ–β€ β‡’ number of good elements is at least 1βˆ’πœ– 𝑛 End of proof! Algorithm MONOTONICITY: Input: 𝑛, π‘₯ 1 ,… π‘₯ 𝑛 for(r=0; π‘Ÿ<3/πœ–; r++) Let π‘₯= π‘₯ 𝑖 perform binary search on π‘₯ if (π‘₯ not found at position 𝑖 OR binary search inconsistent) return β€œnot sorted”; If finished ok, return β€œsorted”.

11 Monotonicity: discussion
Assumption? Replace all π‘₯ 𝑖 by ( π‘₯ 𝑖 ,𝑖) Then sequence must be strictly monotonic Test is adaptive: Where we query depends on what we learned from the previous queries Do we need adaptivity? No! A each iteration, we query for π‘₯= π‘₯ 𝑖 We know precisely where binary search is supposed to look at! E.g., if 𝑖=1, then it’s positions: 𝑛 2 , 𝑛 4 , 𝑛 8 ,… Can generate all the positions to query at the beginning and query them all at the same time Unless, binary search is inconsistent, in which case we detect this from the queries positions Algorithm MONOTONICITY: Input: 𝑛, π‘₯ 1 ,… π‘₯ 𝑛 for(r=0; π‘Ÿ<3/πœ–; r++) Let π‘₯= π‘₯ 𝑖 perform binary search on π‘₯ if (π‘₯ not found at position 𝑖 OR binary search inconsistent) return β€œnot sorted”; If finished ok, return β€œsorted”.

12 Monotonicity++ 𝑂(log 𝑛) queries tight?
Yes Can consider the more general case: Function 𝑓: 0,1 𝑑 β†’{0,1} Monotone: if 𝑓 π‘₯ ≀𝑓(𝑦) whenever π‘₯≀𝑦 (coordinate-wise) Can test in 𝑂 πœ– 𝑑 queries! [GGLRS’98, KMS’15]

13 Testing graphs We have a graph 𝐺= 𝑉,𝐸 Dense case: Sparse case:
𝑛 vertices π‘š edges Dense case: π‘š=Θ 𝑛 2 Sparse case: Degree 𝑑≀𝑂(1) Property testing: Eg, is graph 𝐺 connected? Approximation? πœ–-far: if we need to delete/insert β‰₯πœ–π‘š edges

14 Connectivity in sparse graph
Approximation? πœ–-far: if we need to delete/insert β‰₯πœ–π‘š edges π‘š=𝑑𝑛=𝑂(𝑛) When does it make sense? πœ–π‘‘β‰ͺ1 (otherwise any sparse graph is close to being connected!) Assume: πœ–π‘‘β‰ͺ1 Algorithm: For π‘Ÿ=𝑂 1 πœ–π‘‘ times repeat: Choose a random node 𝑠 Run a BSF from 𝑠 Until see more than 4/πœ–π‘‘ node in the CC If the CC is smaller, then report β€œdisconnected” Otherwise, report β€œconnected”

15 Analysis Claim: if πœ–-far, then graph has at most Ξ©(πœ–π‘‘π‘›) connected components Proof: Suppose 𝐺 has 𝑐 connected component Will connect, using 𝑂(𝑐) modifications Idea: Just connect each connected component consecutively Issue: can get higher degree than 𝑑 in a CC Is really an issue when all nodes in a CC have full degree Just delete one edge (preserving connectivity) Hence, on average a CC has 𝑂 𝑛 πœ–π‘‘π‘› =𝑂 1 πœ–π‘‘ nodes Will pick one of them with probability at least πœ–π‘‘


Download ppt "Lecture 18: Uniformity Testing Monotonicity Testing"

Similar presentations


Ads by Google