Dana Ron Tel-Aviv University


1 Dana Ron Tel-Aviv University
Fast, cheap, but in control(*): Sublinear-time algorithms for approximate computations. Dana Ron, Tel-Aviv University. (*) "Fast, Cheap & Out of Control" is a documentary film by Errol Morris. The film doesn't really have anything to do with this talk, but I was tempted to make a twist on the title.

2 PART I: Introduction & Property testing
Your data is BIG but is it Blue? (plagiarized from Clément Canonne)

3 Efficient Algorithms Usually, when we say that an algorithm is efficient we mean that it runs in time polynomial in the input size n (e.g., the length of an input string s_1s_2…s_n, or the number of vertices in a graph). Naturally, we seek as small an exponent as possible, so that O(n^2) is good, O(n^{3/2} log^3 n) is better, and linear time O(n) is really great! But what if n is HUGE, so that even linear time is prohibitive? Are there tasks we can perform "super-efficiently", in sub-linear time? Seemingly, we need linear time just to read the input, without processing it at all. But what if we don't read the whole input, but rather sample from it? s_1s_2…s_i…s_j…s_n

4 Sublinear Algorithms Very simple problem:
Input: A string s in {0,1}^n (represented as an array s[])
Output: the fraction of 1s in s.
Can compute exactly in linear time O(n). Can approximate w.h.p. in sublinear time by taking a sample s[i_1],…,s[i_k] of size k independent of n: if we want the fraction of 1s in the sample to be within ε of the true fraction of 1s in s, with probability at least 1-δ, it suffices to take k = Θ(log(1/δ)/ε^2) (by an additive Chernoff (Hoeffding) bound).
I.e., if the exact fraction is ρ, and the fraction in the sample is ρ', then Pr[|ρ' - ρ| ≤ ε] ≥ 1-δ. s[1]s[2]…s[i_2]…s[i_1]…s[i_3]…s[n]
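The sampling bound above can be sketched in code (an illustrative helper, not from the talk; the explicit constants come from the standard Hoeffding bound, and the function name is made up):

```python
import math
import random

def estimate_fraction_of_ones(s, eps, delta):
    """Estimate the fraction of 1s in s to within +/- eps of the true
    fraction, with probability at least 1 - delta.  The sample size k
    depends only on eps and delta, not on n = len(s)."""
    # Hoeffding: k >= ln(2/delta) / (2 eps^2) suffices for the additive bound.
    k = math.ceil(math.log(2 / delta) / (2 * eps ** 2))
    hits = sum(s[random.randrange(len(s))] for _ in range(k))
    return hits / k
```

For ε = 0.1 and δ = 0.01 this gives k ≈ 265 samples, regardless of whether n is a thousand or a billion.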

5 Sublinear Algorithms Another problem (a decision problem)
Input: An array s[] of distinct numbers
Output: Is s sorted (monotone increasing)?
We can decide in linear time O(n). Can we decide approximately (w.h.p.) in sublinear time? What does it mean to "decide approximately"?
Approximate decision (a.k.a. Property Testing):
If the object (array) has the property (is sorted), then output accept (w.h.p.).
If the object is far from having the property, then output reject w.h.p. (w.h.p.: with high constant probability)
ε-far: one should modify an ε-fraction of the object to obtain the property (should modify εn entries in s so that it is sorted)

6 Testing Sortedness (Monotonicity)
Input: An array s[] of distinct numbers
Output: If s is sorted, output accept; if s is ε-far from sorted, output reject w.h.p.
Observation: the "natural algorithm" (take a uniform sample and check whether the sample is sorted) does not work unless the sample size is Ω(√n). This lower bound holds since with fewer samples we won't "hit" both entries of any consecutive ("violating") pair, so the sample itself looks sorted. The Ω(√n) bound follows from the "birthday paradox" (lower-bound direction).
Example (distance to being sorted is 1/2): 11 10 13 12 15 14 17 16 19 18 21 20 23 22 25 24 27 26 29 28 31 30 33 32
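A quick simulation (illustrative code, not from the talk) shows why the natural algorithm fails: on an interleaved array like the example above, whose distance to sorted is 1/2, a small uniform sample almost never contains both halves of a violating pair, so the sampled values look sorted:

```python
import random

def natural_test(s, k):
    """The 'natural' tester: sample k positions uniformly and accept
    iff the sampled values, taken in index order, are sorted."""
    idx = sorted(random.sample(range(len(s)), k))
    vals = [s[i] for i in idx]
    return all(a < b for a, b in zip(vals, vals[1:]))

# Interleaved array of length n: 1 0 3 2 5 4 ... -- distance 1/2 from sorted.
n = 1000
bad = [i + 1 if i % 2 == 0 else i - 1 for i in range(n)]
```

With k = 5 the tester accepts this far-from-sorted array in the vast majority of trials, in line with the Ω(√n) lower bound.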

7 Testing Monotonicity Cont’
An alternative algorithm [Ergün Kannan Kumar Ravi Rubinfeld Vishwanathan]:
Repeat the following Θ(1/ε) times:
Pick an entry uniformly at random. Let x be the value in that entry.
Perform a binary search for x.
If x is found, accept; otherwise, reject.
Complexity: Θ(log n / ε). If s is sorted, the algorithm accepts w.p. 1.
Example: 20 21 18 19 16 17 14 15 12 13 10 11 26 27 24 25 22 23 29 28 32 33 30 31, x = 28
Main Claim: the entries for which the search would succeed define a monotonically increasing (not necessarily consecutive) subsequence. Since the other entries can be modified to obtain a sorted s, if s is ε-far from sorted then more than an ε-fraction of its entries must be entries on which the search fails, causing the tester to reject w.h.p.

8 Property Testing For a fixed property P and any object O,
determine whether O has property P, or whether O is far from having property P (i.e., far from any other object having P). The task should be performed by inspecting (querying) the object in as few places as possible.

9 Examples The object can be an array and the property monotonicity.
The object can be a function and the property linearity (corresponds to the Hadamard code). The object can be a graph and the property 3-colorability. The object can be a set of points and the property is that they are "well clusterable". The object can be an image and the property is "it's a cat". To define a property-testing problem precisely, one must specify: the object, the property, the query (sampling) access, and the distance measure.

10 Context Property testing can be viewed as:
A relaxation of (exactly) deciding whether the object has the property. A relaxation of learning the object (i.e., obtaining an approximation of the whole object). In either case we want the testing algorithm to be significantly more efficient than the corresponding decision/learning algorithm.

11 When can Property Testing be Useful?
The object is too large to even scan entirely, so an approximate decision is necessary. The object is not too large, but (1) exact decision is NP-hard (e.g., coloring), or (2) we prefer a sublinear approximate algorithm to a polynomial exact algorithm. Use testing as a preliminary step to exact decision or learning: in the first case we can quickly rule out objects that are far from the property; in the second case testing can aid in efficiently selecting a good hypothesis class.

12 Property Testing - Background
Initially defined by Rubinfeld and Sudan in the context of Program Testing (of algebraic functions). Goldreich, Goldwasser, and Ron initiated the study of testing properties of graphs. A growing body of work deals with properties of functions, graphs, strings, sets of points, ... Many algorithms have complexity that is sublinear in (or even independent of) the size of the object.

13 Linearity Testing [Blum Luby Rubinfeld]
Def 1: A function f : F^n → F is called linear (multi-linear) if there exist coefficients a_1,…,a_n ∈ F s.t. f(x_1,…,x_n) = Σ_i a_i x_i.
Def 2: A function f is said to be ε-far from linear if for every linear function g, dist(f,g) > ε, where dist(f,g) = Pr[f(x) ≠ g(x)] (x selected uniformly in F^n).
Def 3: Linearity Testing Problem: the algorithm can query the function on any x in F^n to obtain f(x); if f is linear then the algorithm should accept; if f is ε-far from linear then the algorithm should reject w.h.p.
Fact: A function f : F^n → F is linear iff for every x,y ∈ F^n it holds that f(x)+f(y) = f(x+y).

14 Linearity Testing Cont’
Linearity Testing algorithm:
1) Uniformly and independently select Θ(1/ε) pairs of elements x,y ∈ F^n.
2) For every pair x,y selected, verify that f(x)+f(y) = f(x+y).
3) If for any of the selected pairs linearity is violated (i.e., f(x)+f(y) ≠ f(x+y)), then REJECT; otherwise ACCEPT.
Observe: If f is linear then the test accepts w.p. 1.
Lemma: If f is ε-far from linear then with probability at least 2/3 the test rejects it.
Lemma (contrapositive form): If f is accepted with probability greater than 1/3, then f is ε-close to linear.
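Over F = GF(2), where addition is XOR, the test is a few lines (an illustrative sketch; here points of {0,1}^n are packed into Python ints, a representation choice not from the talk):

```python
import random

def blr_test(f, n, eps):
    """BLR linearity test for f: {0,1}^n -> {0,1}, with points encoded
    as n-bit integers and '+' being bitwise XOR.  Accepts linear f with
    probability 1; rejects f that is eps-far from linear w.h.p."""
    for _ in range(max(1, round(3 / eps))):
        x, y = random.getrandbits(n), random.getrandbits(n)
        if f(x) ^ f(y) != f(x ^ y):  # violating pair found
            return False
    return True
```

A parity function f(x) = XOR of a fixed subset of bits is linear and is always accepted; a function like majority is far from linear and gets rejected quickly.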

15 Linearity Testing Cont’
Lemma: If f is accepted with probability greater than 1/3, then f is ε-close to linear.
Suppose f is accepted w.p. > 1/3 ⇒ small (< ε/2) fraction of violating pairs (f(x)+f(y) ≠ f(x+y)).
Define a self-corrected version of f, denoted g: for each x,y let V_y(x) = f(x+y) - f(y) (the vote of y on x), and g(x) = Plurality_y(V_y(x)).
Can show (conditioned on a < ε/2 fraction of violating pairs):
(1) g is linear.
(2) dist(f,g) ≤ ε
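The self-corrected g is itself computable by sampling votes (an illustrative GF(2)^n sketch, not from the talk; subtraction becomes XOR). If f disagrees with a linear function on only a few points, the plurality vote recovers that function's value even at a corrupted point:

```python
import random
from collections import Counter

def self_corrected(f, n, x, num_votes=101):
    """g(x) = plurality, over random y, of V_y(x) = f(x XOR y) XOR f(y)."""
    votes = Counter()
    for _ in range(num_votes):
        y = random.getrandbits(n)
        votes[f(x ^ y) ^ f(y)] += 1
    return votes.most_common(1)[0][0]
```

Each random y gives an independent "vote"; a vote is wrong only when y or x XOR y lands on a point where f deviates from the linear function, which is rare when few points are corrupted.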

16 More Generally: Testing Polynomials (Testing (Generalized) Reed Muller Codes)
Def: A function f : F^n → F is a (total) degree-d polynomial if there exist coefficients {a_v}, where v = v_1…v_n, v_i ≥ 0, Σ_i v_i ≤ d, s.t. f(x_1,…,x_n) = Σ_v a_v ∏_i x_i^{v_i}. Different algorithms were designed to deal with different cases (e.g., d=1 [BLR]; F=GF(2), d>1 [Alon, Kaufman, Krivelevich, Litsyn, R]), and are analyzed using the self-correction approach. A unifying algorithm [Kaufman, R] works by restricting the function to low-dimensional affine subspaces and checking that each restriction is a low-degree polynomial. (Further results tightened the analysis [Bhattacharyya, Kopparty, Schoenebeck, Sudan, Zuckerman], [Haramaty, Shpilka, Sudan].)

17 PART II: Sublinear Estimation of Graph Parameters
Your data is BIG but what's its color (approximately)?

18 Graph Parameters A graph parameter: a function defined on a graph G (undirected/directed, unweighted/weighted). For example:
Average degree
Number of subgraphs H in G
Number of connected components
Minimum size of a vertex cover
Maximum size of a matching
Number of edges that should be added to make the graph k-connected (distance to k-connectivity)
Minimum weight of a spanning tree

19 Computing/Approximating Graph Parameters Efficiently
For all the parameters described on the previous slide, there are efficient, i.e., polynomial-time, algorithms for computing the parameter (possibly approximately); for some, even linear-time algorithms. However, in some cases, when inputs are very large, we might want even more efficient algorithms: sublinear-time algorithms. Such algorithms do not even read the entire input, are randomized, and provide an approximate answer (with high success probability).

20 Average Degree Let d_avg = d_avg(G) denote the average degree in G; assume d_avg ≥ 1.
Observe: approximating the average of a general function with range {0,…,n-1} (the range of degrees) requires Ω(n) queries, so we must exploit the non-generality of degrees.
Can obtain a (2+ε)-approximation of d_avg by performing O(√n/ε) degree queries [Feige]. Going below 2: Ω(n) queries [Feige].
With degree and neighbor queries, can obtain a (1+ε)-approximation by performing Õ(√n · poly(1/ε)) queries [Goldreich, R].
Comment 1: In both cases, √n can be replaced by (n/d_avg)^{1/2}.
Comment 2: In both cases, the results are tight (in terms of the dependence on n/d_avg).

21 Average Degree Ingredient 1: Consider a partition of all graph vertices into r = O((log n)/ε) buckets: bucket B_i contains the vertices v s.t. (1+β)^{i-1} < deg(v) ≤ (1+β)^i (β = ε/8).
Suppose we can obtain for each i an estimate b_i = (1±β)|B_i|. Then (1/n) Σ_i b_i (1+β)^i = (1±O(β)) d_avg (*)
How to obtain b_i? By sampling (and applying a Chernoff bound).
Difficulty: if B_i is small (<< √n) then the necessary sample is too large ((|B_i|/n)^{-1} >> √n).
Ingredient 2: ignore the small B_i's. Take the sum in (*) only over large buckets (|B_i| > √(εn)/(2r)).
Claim: (1/n) Σ_{large i} b_i (1+β)^i ≥ d_avg/(2+ε) (**)

22 Average Degree Claim: (1/n) Σ_{large i} b_i (1+β)^i ≥ d_avg/(2+ε) (**)
Sum of degrees = 2 × (number of edges). (Small bucket: |B_i| ≤ √(εn)/(2r), r: number of buckets.)
In the sum over large buckets, an edge between two small buckets is not counted, an edge between a small bucket and a large bucket is counted once, and an edge between two large buckets is counted twice.
Using (**), get a (2+ε)-approximation with Õ(√n/ε^2) degree queries.
Ingredient 3: Estimate the number of edges counted once and compensate for them.
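Ingredients 1 and 2 (the (2+ε)-approximation via degree queries) can be sketched as follows. This is an illustrative simplification, not the paper's exact algorithm: `degree` models a degree-query oracle, and the sample size and cutoff constants are arbitrary:

```python
import math
import random

def approx_avg_degree(degree, n, eps):
    """Sketch of the (2+eps)-approximation: sample vertex degrees,
    bucket them into ranges ((1+beta)^(i-1), (1+beta)^i], and sum
    only over buckets whose estimated size exceeds the cutoff."""
    beta = eps / 8
    r = math.ceil(math.log(n) / math.log(1 + beta))  # number of buckets
    k = math.ceil(10 * math.sqrt(n) / eps)           # sample size ~ sqrt(n)
    counts = {}
    for _ in range(k):
        d = degree(random.randrange(n))
        i = 0 if d <= 1 else math.ceil(math.log(d, 1 + beta))
        counts[i] = counts.get(i, 0) + 1
    cutoff = math.sqrt(eps * n) / (2 * r)  # 'large bucket' threshold on |B_i|
    est = 0.0
    for i, c in counts.items():
        if (c / k) * n > cutoff:           # estimated |B_i| is large
            est += (c / k) * (1 + beta) ** i
    return est
```

On a d-regular graph every sampled vertex falls into the same (large) bucket, so the estimate is within the bucket's (1+β) rounding of d; on irregular graphs the ignored small buckets and once-counted edges are what cost the factor of 2.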

23 Average Degree Ingredient 3: Estimate the number of edges counted once and compensate for them.
For each large B_i, estimate the number of edges between B_i and the small buckets by sampling the neighbors of (random) vertices in B_i.
By adding this estimate e_i to (**), get a (1+ε)-approximation: (1/n) Σ_{large i} b_i (1+β)^i (**) becomes (1/n) Σ_{large i} (b_i (1+β)^i + e_i).

24 Number of other small subgraphs
Approximating the average degree is the same as approximating the number of edges. What about other subgraphs? (Also known as counting network motifs.)
[Gonen, R, Shavitt] considered length-2 paths and, more generally, s-stars (avg degree + 2-stars gives the variance; larger s gives higher moments). They give a sublinear-time algorithm and show that it is tight.
[Eden, Levi, R] considered triangles (counting the number of triangles, exactly and approximately, has been studied quite extensively in the streaming model). They give a sublinear-time algorithm and show that it is tight. Complexity (roughly): n/t^{1/3} + m^{3/2}/t (n: num of vertices, m: num of edges, t: num of triangles).

25 Minimum weight Spanning Tree (MST)
Recall: a spanning tree T of a graph G=(V,E) is a subgraph T=(V,E') that is a tree (i.e., is connected and has no cycles). When the edges have weights, a minimum weight spanning tree (MST) of G is a spanning tree of minimum total weight. We can find an MST in (near-)linear time, but what if we just want to estimate the weight of an MST?

26 MST [Chazelle, Rubinfeld, Trevisan] give a (1+ε)-approximation algorithm using Õ(dW/ε^2) neighbor queries, where d is a degree bound and the weights are in {1,…,W}. The result is tight and extends to d = d_avg and weights in [1,W].
Suppose first: W=2 (i.e., weights are either 1 or 2). Let E_1 = edges with weight 1, G_1=(V,E_1), c_1 = number of connected components in G_1.
Weight of MST: 2(c_1-1) + 1·(n-1-(c_1-1)) = n - 2 + c_1.
Estimate the MST weight by estimating c_1.
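The W=2 identity is easy to sanity-check against an exact MST computation (illustrative code, not part of the talk; a tiny Kruskal with union-find):

```python
def mst_weight(n, edges):
    """Exact MST weight via Kruskal's algorithm; edges are (w, u, v)
    triples on vertices 0..n-1, and the graph is assumed connected."""
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x
    total = 0
    for w, u, v in sorted(edges):          # process lighter edges first
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            total += w
    return total
```

For example, on n = 6 vertices whose weight-1 edges form components {0,1,2}, {3,4}, {5} (so c_1 = 3), with weight-2 edges completing connectivity, the MST weight equals n - 2 + c_1 = 7.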

27 MST More generally (weights in {1,…,W}): let E_i = edges with weight ≤ i, G_i=(V,E_i), c_i = number of connected components (cc's) in G_i.
Weight of MST: n - W + Σ_{i=1}^{W-1} c_i.
Estimate the MST weight by estimating c_1,…,c_{W-1}.
Idea for estimating the number of cc's in a graph H (denoted c(H)): for a vertex v, let n_v = number of vertices in the cc of v. Then c(H) = Σ_v (1/n_v).
(E.g., components of sizes 3, 4, and 2 contribute 3·(1/3) + 4·(1/4) + 2·(1/2) = 3.)

28 MST c(H) = Σ_v (1/n_v) (n_v = number of vertices in the cc of v)
Can estimate c(H) by sampling vertices v and finding n_v for each (using BFS). Difficulty: if n_v is large, then finding it is "expensive".
Let S = {v : n_v ≤ 1/ε'}. Then Σ_{v∈S}(1/n_v) ≥ c(H) - n/(1/ε') = c(H) - ε'n.
The algorithm for estimating c(H) selects Θ(1/ε'^2) vertices and runs a BFS from each selected v until it finds n_v or determines that n_v > 1/ε' (i.e., v ∉ S). It uses (1/n_v) for the sampled vertices in S to estimate c(H). Complexity: O(d/ε'^3).
The algorithm for estimating the MST weight runs the above algorithm on each G_i with ε' = ε/(2W) (so that when we sum the estimates of c_i over i=1,…,W-1 we get the desired approximation).
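The truncated-search estimator can be sketched as follows (illustrative code, not from the paper; `adj` is an adjacency list, the traversal order is immaterial, and the sample-size constant is arbitrary):

```python
import random

def estimate_components(adj, eps, num_samples=None):
    """Estimate c(H) = sum_v 1/n_v: sample vertices, and explore each
    one's component until it is exhausted or more than 1/eps vertices
    have been seen (in which case the vertex is skipped)."""
    n = len(adj)
    k = num_samples or max(1, round(4 / eps ** 2))
    cap = int(1 / eps)
    total = 0.0
    for _ in range(k):
        v = random.randrange(n)
        seen, stack = {v}, [v]
        while stack and len(seen) <= cap:
            u = stack.pop()
            for w in adj[u]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        if len(seen) <= cap:      # component fully explored: n_v = len(seen)
            total += 1.0 / len(seen)
    return (total / k) * n
```

For the MST application one runs this on each G_i with eps set to ε/(2W) and plugs the estimates into n - W + Σ c_i.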

29 Summary Polynomial-time algorithms are good, linear-time algorithms are better, and sublinear-time algorithms are the best!
Property testing: a type of sublinear-time approximate decision. Saw algorithms for testing sortedness (monotonicity) and for testing linearity (and mentioned an algorithm for low-degree polynomials).
Sublinear approximation of graph parameters. Saw algorithms for estimating the average degree / number of edges (mentioned algorithms for other small subgraphs), and for estimating the weight of an MST.

30 Thanks

