On Sample Based Testers


1 On Sample Based Testers
Oded Goldreich, Weizmann Institute Dana Ron, Tel Aviv University

2 (Standard) Testing – A quick reminder
Let OBJ be an object (a function) of size N. A testing algorithm for a (prespecified) property P is given a proximity parameter ε ∈ (0,1]:
- If OBJ has P, it should accept with probability ≥ 2/3;
- If OBJ is ε-far from having P, it should reject with probability ≥ 2/3.
Distance is normalized Hamming distance. To this end the algorithm is given query access to OBJ = f: it sends a query x and receives the answer f(x). The query complexity q(N,ε) should be sublinear in N. This model is as defined by Rubinfeld and Sudan, and most results in property testing are in this model.
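The canonical instance of such a query-based tester is the 3-query linearity test mentioned later in the talk (querying x1, x2 and x1+x2). A minimal sketch in Python, with function names of my own choosing:

```python
import random

def blr_linearity_test(f, n, trials=100):
    """3-query linearity check (Blum-Luby-Rubinfeld), repeated:
    pick random x, y in {0,1}^n and verify f(x) XOR f(y) == f(x XOR y).
    A linear f always passes; a far-from-linear f fails some trial w.h.p."""
    for _ in range(trials):
        x = random.randrange(2 ** n)
        y = random.randrange(2 ** n)
        if f(x) ^ f(y) != f(x ^ y):
            return False  # witness of non-linearity found
    return True

# Example: parity of a fixed subset of bits is linear over GF(2).
def parity(x):
    return bin(x & 0b1011).count("1") % 2
```

Note the tester is adaptive-free and 1-sided here: `blr_linearity_test(parity, 4)` always returns True, whatever the random choices.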

3 Sample-Based Testing
What if we don't have query access to the object, but can only obtain (uniform) random samples? Namely, the tested object is a function f : [N] → R, and we can only obtain pairs (x, f(x)), where x is uniformly distributed in [N].

4 Sample-Based Testing Background
The setting of sample-based testing is similar to the setting in Learning Theory (learning under the uniform distribution), and was defined in "Property testing and its connection to learning and approximation" [Goldreich, Goldwasser, R]. However, in [GGR] and most works since, results on sample-based testing were mainly negative, essentially establishing the necessity of queries. Two exceptions: Decision Trees over [0,1]^d [Kearns, R]; Interval Functions (d=1) and Linear Threshold Functions under the Gaussian distribution [Balcan, Blais, Blum, Yang] (as part of their study of Active Testing).

5 Sample-Based Testing Background
Note: There are many works on testing and estimating properties of distributions (starting with [Batu, Fortnow, Rubinfeld, Smith, White]), where the object is a distribution D, the algorithm gets samples distributed according to D, and should test whether D has property P. This differs from sample-based testing, where: (1) the object is a function; (2) the underlying distribution is fixed (uniform); (3) the algorithm gets function labels; (4) the tested property is a property of the function.

6 This Work
We were interested in understanding the relation between sample-based testing and other models of testing, as well as variants of sample-based testing (as just defined).

7 Our Results: 1. Relation to POTs
Proximity-Oblivious Testers (POTs) that are "fair" imply sublinear sample-based testers.
A (q,ϱ)-POT performs a constant number q of queries;
- if f ∈ P, it accepts with probability ≥ c;
- if f ∉ P, it accepts with probability ≤ c − ϱ(δ_P(f)), where δ_P(f) denotes the distance of f to P.
Fair: each query is (almost) uniformly distributed. Example, linearity: query on x1, x2 and x1+x2.
A fair (q,ϱ)-POT ⇒ a sample-based tester with sample complexity O(N^(1−1/q)/(ϱ(ε))^(2+3/q)).
Comments: (1) A stronger notion of fairness (e.g., pairs of queries uniformly distributed) gives better sample complexity; (2) some notion of fairness is necessary in general; (3) fairness is not necessary for Boolean functions, q=2, and POTs with 1-sided error (c=1) [Fischer, Goldhirsh, Lachish].
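To illustrate the birthday-style argument behind this transformation (the code and parameter names below are mine, not from the talk): for linearity with q = 3, a uniform labeled sample of size m over N = 2^n points is expected to contain about m³/N complete triples (x, y, x⊕y), so m ≈ N^(2/3) = N^(1−1/3) samples suffice to find triples on which the POT check can be run, matching the N^(1−1/q) term.

```python
import random

def sample_based_linearity_test(f, n, m):
    """Sketch: run the 3-query linearity check using only a uniform
    labeled sample (no queries). A triple (x, y, x ^ y) is usable once
    all three points were sampled; ~m**3 / N such triples are expected."""
    N = 2 ** n
    labels = {}
    for _ in range(m):                      # the labeled sample (x, f(x))
        x = random.randrange(N)
        labels[x] = f(x)
    pts = list(labels)
    for i, x in enumerate(pts):
        for y in pts[i + 1:]:
            if x ^ y in labels:             # a complete triple in the sample
                if labels[x] ^ labels[y] != labels[x ^ y]:
                    return False            # violated triple: reject
    return True  # no violation found (with too few samples, no triple may exist)

def parity8(x):
    return bin(x & 0b10110001).count("1") % 2
```

For example, with n = 8 (N = 256) and m = 64 ≈ N^(2/3), triples are typically found; a linear function such as `parity8` is never rejected.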

8 Our Results: 2. Dense-Graphs Model
Quasi-canonical testers in the dense-graphs model imply sublinear sample-based testers.
Dense-graphs model: G=(V,E) is represented by its adjacency matrix; the tester can query whether (u,v) ∈ E, and N = n² (n=|V|).
Quasi-canonical: select v(N,ε) vertices uniformly at random; query all pairs; decide based on the induced subgraph, possibly randomly.
A quasi-canonical tester (using v(N,ε) vertices) ⇒ a sample-based tester with sample complexity O(N^(1−1/(v(N,ε)−1))).
Comment: Better results for graph partitioning problems: e.g., for k-colorability, O(N^(1/2)·log k/ε) (and N^(1/2) is necessary even for k=2). This is not time-efficient, but (for k-colorability) one can get time N^(1−1/(2k))·g(k,ε) (with N^(1−1/(2k))·(k/ε)² samples).
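A minimal sketch of a quasi-canonical tester in the dense-graphs model, instantiated (my choice of property and names) for 2-colorability: pick v vertices uniformly, "query" all of their pairs in the adjacency matrix, and accept iff the induced subgraph is bipartite.

```python
import random

def quasi_canonical_bipartite_test(adj, n, v):
    """Quasi-canonical tester sketch: sample v of the n vertices,
    query all ~v^2/2 adjacency-matrix entries among them, and accept
    iff the induced subgraph is 2-colorable (no odd cycle found)."""
    S = random.sample(range(n), v)
    color = {}
    for s in S:
        if s in color:
            continue
        color[s] = 0
        stack = [s]
        while stack:                      # 2-color the component by DFS
            u = stack.pop()
            for w in S:
                if adj[u][w]:             # a "query" to the adjacency matrix
                    if w not in color:
                        color[w] = 1 - color[u]
                        stack.append(w)
                    elif color[w] == color[u]:
                        return False      # odd cycle inside the sample: reject
    return True
```

The tester is 1-sided: an induced subgraph of a bipartite graph is bipartite, so bipartite graphs are always accepted.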

9 Our Results: 3. Distribution-Free Testing
Distribution-free sample-based testing is related to 1-sided error sample-based testing (under the uniform distribution) non-trivially.
Distribution-free (sample-based)* testing: there is an unknown distribution D, the algorithm gets samples (x,f(x)) where x ∼ D, and distance (to the property) is defined w.r.t. D.
For a property P, let DF(P) denote its distribution-free sample complexity and OSE(P) its 1-sided error sample complexity; consider constant ε.
- For every P: OSE(P) = Õ(DF(P)²).
- There exist P s.t. OSE(P) = ω(DF(P)).
- There exist P s.t. OSE(P) = Θ(DF(P)).
- There exist natural P s.t. OSE(P) = o(DF(P)) (e.g., OSE(P) = O(log(DF(P))), or even OSE(P) = O(1) when DF(P) = Ω(N)).
(*) Distribution-free testing was previously studied when queries are also allowed (e.g., by Halevy and Kushilevitz).

10 Our Results: 4. Testing Distributions
Distribution testing reduces to sample-based testing of symmetric properties (articulates [Sudan]).
Distribution testing: for an unknown distribution D, the algorithm gets samples x ∼ D and should decide whether D has property P or is ε-far from having P, where distance is L1 (statistical/variation distance).
Symmetric properties of functions f : X → R: the property is invariant under permutations of X (in other words, it is determined by {N_y}_{y∈R}, where N_y = |{x ∈ X : f(x)=y}|).
Let P = ∪_m P_m be a property of distributions s.t. P_m consists of distributions whose support is a subset of S_m, |S_m| = m, and denote the sample complexity of testing P by s(m,ε). There exists a symmetric property P′ = ∪ P′_{N,m} of functions with domain [N] and range S_m, with sample complexity s′(N,m,ε), such that for every N ≥ c·m²/ε⁴: s(m,ε) = O(s′(N,m,ε/2)) and s(m,ε) = Ω(s′(N,m,2ε)).
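The distribution-testing model just defined can be illustrated by the classic collision-based uniformity tester (a standard textbook example, not the reduction from the talk; the acceptance threshold below is illustrative):

```python
import random
from collections import Counter

def collision_uniformity_test(samples, m, eps):
    """Collision-based uniformity tester sketch: the collision probability
    of D over [m] equals 1/m iff D is uniform, and exceeds it by about
    ||D - U||_2^2 otherwise. Estimate it from pairwise collisions in the
    sample and accept iff the estimate is near 1/m."""
    s = len(samples)
    counts = Counter(samples)
    collisions = sum(c * (c - 1) // 2 for c in counts.values())
    estimate = collisions / (s * (s - 1) // 2)
    return estimate <= (1 + 2 * eps ** 2) / m   # illustrative threshold
```

A sample of size roughly √m · poly(1/ε) already makes this estimate concentrate, which is why distribution testing can beat the support size m.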

11 Proof Sketch of OSE(P) = Õ(DF(P)²)
Recall: DF(P) is the distribution-free sample complexity; OSE(P) is the 1-sided error (uniform-distribution) sample complexity. Let T′ be a distribution-free sample-based algorithm with sample complexity s = o(N^(1/2)) that errs with probability < 1/6. (The case s = Ω(N^(1/2)) is trivial.) Define T to be the algorithm that takes a uniform labeled sample of size r = O(s²), i.e. ((x1,f(x1)),…,(xr,f(xr))) with each xi ∼ U, and accepts iff ∃ g ∈ P s.t. g(xi) = f(xi) for every 1 ≤ i ≤ r.
- By definition of T, if f ∈ P, then T always accepts (as required of 1-sided error testers).
- It remains to show that if f is ε-far from P, then T rejects with probability at least 2/3.
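For concreteness, here is the consistency tester T instantiated (my choice, not from the talk) for P = the monotone functions over [N]: "does some g ∈ P agree with f on the sample?" amounts to checking that the sampled labels are nondecreasing in x.

```python
def one_sided_monotone_test(labeled_sample):
    """T for P = monotone functions on [N]: accept iff some monotone g
    agrees with f on the whole sample, i.e. iff sorting the (x, f(x))
    pairs by the domain point x leaves the labels nondecreasing.
    A monotone f is therefore never rejected (1-sided error)."""
    pts = sorted(labeled_sample)
    return all(pts[i][1] <= pts[i + 1][1] for i in range(len(pts) - 1))
```

For instance, the sample {(1,0), (3,1), (7,2)} is consistent with a monotone function and is accepted, while {(2,5), (6,1)} is rejected.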

12 Proof Sketch of OSE(P) = Õ(DF(P)²), cont'
Assume, towards a contradiction, that there exists f that is ε-far from P which T accepts with probability > 1/3 (recall: T uses a uniform sample). That is, for more than 1/3 of the sample sets x = {x1,…,xr} there exists g_x ∈ P s.t. g_x(xi) = f(xi) for every i. For each such "bad" sample set x, consider the distribution D_x that is uniform over x = {x1,…,xr}.
Key: given a sample of size s distributed according to D_x and labeled by f, T′ must accept w.p. ≥ 5/6 (recall: T′ is distribution-free, and w.r.t. D_x the function f agrees with g_x ∈ P on the entire support, so its distance to P under D_x is 0).
Implies: if we select x = {x1,…,xr} uniformly and give T′ a subsample of size s, it must accept w.p. ≥ (1/3)·(5/6) > 1/6 + 1/10. But this distribution over size-s samples is 0.1-close to uniform (since r is a sufficiently large multiple of s², a size-s subsample of r uniform points rarely collides), and on a uniform sample T′ must accept w.p. < 1/6, as f is ε-far from P. Contradiction.
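The constants in the contradiction can be checked with exact arithmetic (taking the quantities as in the sketch: a 1/3 fraction of bad sample sets, acceptance ≥ 5/6 on each, statistical distance 0.1, and error < 1/6):

```python
from fractions import Fraction

# Acceptance forced by the bad sample sets: at least (1/3)*(5/6) = 5/18.
forced = Fraction(1, 3) * Fraction(5, 6)
assert forced == Fraction(5, 18)
assert forced > Fraction(1, 6) + Fraction(1, 10)   # 5/18 > 4/15

# Shifting 0.1 of statistical distance cannot push acceptance below 1/6,
# yet T' errs with probability < 1/6 on an eps-far f: contradiction.
assert forced - Fraction(1, 10) > Fraction(1, 6)   # 8/45 > 1/6
```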

13 Wrapping up
We studied relations between sample-based testing and other models of testing, as well as variants of sample-based testing. In particular, we showed:
- Proximity-Oblivious Testers (POTs) that are "fair" imply sublinear sample-based testers.
- Quasi-canonical testers in the dense-graphs model imply sublinear sample-based testers.
- Distribution-free sample-based testing is related to 1-sided error sample-based testing (under the uniform distribution) non-trivially (there is always a quadratic upper bound, but other than that the relation varies).
- Distribution testing reduces to sample-based testing of symmetric properties (articulates [Sudan]).

14 Thanks

