1 Sublinear Algorithms Lecture 1 Sofya Raskhodnikova Penn State University TexPoint fonts used in EMF. Read the TexPoint manual before you delete this.

Slides:



Advertisements
Similar presentations
1 Transitive-Closure Spanners Sofya Raskhodnikova Penn State University Joint work with Arnab Bhattacharyya MIT Elena Grigorescu MIT Kyomin Jung MIT David.
Advertisements

Grigory Yaroslavtsev Joint work with Piotr Berman and Sofya Raskhodnikova.
Approximate List- Decoding and Hardness Amplification Valentine Kabanets (SFU) joint work with Russell Impagliazzo and Ragesh Jaiswal (UCSD)
Lower Bound for Sparse Euclidean Spanners Presented by- Deepak Kumar Gupta(Y6154), Nandan Kumar Dubey(Y6279), Vishal Agrawal(Y6541)
Deterministic vs. Non-Deterministic Graph Property Testing Asaf Shapira Tel-Aviv University Joint work with Lior Gishboliner.
Property testing of Tree Regular Languages Frédéric Magniez, LRI, CNRS Michel de Rougemont, LRI, University Paris II.
C&O 355 Lecture 23 N. Harvey TexPoint fonts used in EMF. Read the TexPoint manual before you delete this box.: A A A A A A A A A A.
Foundations of Cryptography Lecture 10 Lecturer: Moni Naor.
Lecture 24 Coping with NPC and Unsolvable problems. When a problem is unsolvable, that's generally very bad news: it means there is no general algorithm.
Approximation, Chance and Networks Lecture Notes BISS 2005, Bertinoro March Alessandro Panconesi University La Sapienza of Rome.
Probabilistically Checkable Proofs Madhu Sudan MIT CSAIL 09/23/20091Probabilistic Checking of Proofs TexPoint fonts used in EMF. Read the TexPoint manual.
Artur Czumaj Dept of Computer Science & DIMAP University of Warwick Testing Expansion in Bounded Degree Graphs Joint work with Christian Sohler.
1 Last lecture  Configuration Space Free-Space and C-Space Obstacles Minkowski Sums.
Complexity 15-1 Complexity Andrei Bulatov Hierarchy Theorem.
Complexity 18-1 Complexity Andrei Bulatov Probabilistic Algorithms.
The main idea of the article is to prove that there exist a tester of monotonicity with query and time complexity.
CSE332: Data Abstractions Lecture 27: A Few Words on NP Dan Grossman Spring 2010.
Hardness Results for Problems P: Class of “easy to solve” problems Absolute hardness results Relative hardness results –Reduction technique.
Testing the Diameter of Graphs Michal Parnas Dana Ron.
Proximity algorithms for nearly-doubling spaces Lee-Ad Gottlieb Robert Krauthgamer Weizmann Institute TexPoint fonts used in EMF. Read the TexPoint manual.
CPSC 689: Discrete Algorithms for Mobile and Wireless Systems Spring 2009 Prof. Jennifer Welch.
Polynomial Time Approximation Scheme for Euclidian Traveling Salesman
Testing of Clustering Noga Alon, Seannie Dar Michal Parnas, Dana Ron.
Sublinear time algorithms Ronitt Rubinfeld Blavatnik School of Computer Science Tel Aviv University TexPoint fonts used in EMF. Read the TexPoint manual.
Michael Bender - SUNY Stony Brook Dana Ron - Tel Aviv University Testing Acyclicity of Directed Graphs in Sublinear Time.
Testing Metric Properties Michal Parnas and Dana Ron.
Lower Bounds for Property Testing Luca Trevisan U C Berkeley.
Lecture 20: April 12 Introduction to Randomized Algorithms and the Probabilistic Method.
Introduction Outline The Problem Domain Network Design Spanning Trees Steiner Trees Triangulation Technique Spanners Spanners Application Simple Greedy.
Some 3CNF Properties are Hard to Test Eli Ben-Sasson Harvard & MIT Prahladh Harsha MIT Sofya Raskhodnikova MIT.
Sublinear time algorithms Ronitt Rubinfeld Computer Science and Artificial Intelligence Laboratory (CSAIL) Electrical Engineering and Computer Science.
Randomized Algorithms Morteza ZadiMoghaddam Amin Sayedi.
C&O 355 Mathematical Programming Fall 2010 Lecture 17 N. Harvey TexPoint fonts used in EMF. Read the TexPoint manual before you delete this box.: AA A.
Teaching Teaching Discrete Mathematics and Algorithms & Data Structures Online G.MirkowskaPJIIT.
Correlation testing for affine invariant properties on Shachar Lovett Institute for Advanced Study Joint with Hamed Hatami (McGill)
Approximating the MST Weight in Sublinear Time Bernard Chazelle (Princeton) Ronitt Rubinfeld (NEC) Luca Trevisan (U.C. Berkeley)
February 18, 2015CS21 Lecture 181 CS21 Decidability and Tractability Lecture 18 February 18, 2015.
RESOURCES, TRADE-OFFS, AND LIMITATIONS Group 5 8/27/2014.
Transitive-Closure Spanner of Directed Graphs Kyomin Jung KAIST 2009 Combinatorics Workshop Joint work with Arnab Bhattacharyya MIT Elena Grigorescu MIT.
Personalized Social Recommendations – Accurate or Private? A. Machanavajjhala (Yahoo!), with A. Korolova (Stanford), A. Das Sarma (Google) 1.
Umans Complexity Theory Lectures Lecture 1a: Problems and Languages.
NP-COMPLETE PROBLEMS. Admin  Two more assignments…  No office hours on tomorrow.
Seminar on Sub-linear algorithms Prof. Ronitt Rubinfeld.
Artur Czumaj DIMAP DIMAP (Centre for Discrete Maths and it Applications) Computer Science & Department of Computer Science University of Warwick Testing.
Complexity and Efficient Algorithms Group / Department of Computer Science Testing the Cluster Structure of Graphs Christian Sohler joint work with Artur.
Lower bounds on data stream computations Seminar in Communication Complexity By Michael Umansky Instructor: Ronitt Rubinfeld.
11 Lecture 24: MapReduce Algorithms Wrap-up. Admin PS2-4 solutions Project presentations next week – 20min presentation/team – 10 teams => 3 days – 3.
Onlinedeeneislam.blogspot.com1 Design and Analysis of Algorithms Slide # 1 Download From
NOTE: To change the image on this slide, select the picture and delete it. Then click the Pictures icon in the placeholder to insert your own image. Fast.
C&O 355 Lecture 19 N. Harvey TexPoint fonts used in EMF. Read the TexPoint manual before you delete this box.: A A A A A A A A A A.
The NP class. NP-completeness Lecture2. The NP-class The NP class is a class that contains all the problems that can be decided by a Non-Deterministic.
Algorithms for Big Data: Streaming and Sublinear Time Algorithms
Property Testing (a.k.a. Sublinear Algorithms )
Information Complexity Lower Bounds
Introduction to Randomized Algorithms and the Probabilistic Method
Zhu Han University of Houston Thanks for Professor Dan Wang’s slides
Approximating the MST Weight in Sublinear Time
Lecture 22: Linearity Testing Sparse Fourier Transform
Lecture 18: Uniformity Testing Monotonicity Testing
COMS E F15 Lecture 2: Median trick + Chernoff, Distinct Count, Impossibility Results Left to the title, a presenter can insert his/her own image.
Warren Center for Network and Data Sciences
Complexity of Expander-Based Reasoning and the Power of Monotone Proofs Sam Buss (UCSD), Valentine Kabanets (SFU), Antonina Kolokolova.
CS 154, Lecture 6: Communication Complexity
Last lecture Configuration Space Free-Space and C-Space Obstacles
Enumerating Distances Using Spanners of Bounded Degree
CIS 700: “algorithms for Big Data”
Richard Anderson Lecture 25 NP-Completeness
Introduction to PCP and Hardness of Approximation
Clustering.
Presentation transcript:

1 Sublinear Algorithms Lecture 1 Sofya Raskhodnikova Penn State University TexPoint fonts used in EMF. Read the TexPoint manual before you delete this box.: A

Organizational Course webpage: Office hours: Tuesdays, Tuesdays, 2:30PM-3:30PM, IST 343FEvaluation class participation solutions to about 4 homework assignments taking lecture notes about 2-3 times per person the final project 2

Tentative Topics Introduction, examples and general techniques. Sublinear-time algorithms for graphs strings geometric properties of images basic properties of functions algebraic properties and codes metric spaces distributions Tools: probability, Fourier analysis, combinatorics, codes, … Sublinear-space algorithms: streaming 3

Tentative Plan Introduction, examples and general techniques. Lecture 1. Lecture 1. Background. Testing properties of images and lists. Lecture 2. (Next week) Lecture 2. (Next week) Properties of functions and graphs. Sublinear approximation. Lecture 3-5. Lecture 3-5. Background in probability. Techniques for proving hardness. Other models for sublinear computation. 4

Motivation for Sublinear-Time Algorithms Massive datasets world-wide web online social networks genome project sales logs census data high-resolution images scientific measurements Long access time communication bottleneck (slow connection) implicit data (an experiment per data point) 5

What Can We Hope For? What can an algorithm compute if it –reads only a sublinear portion of the data? –runs in sublinear time? Some problems have exact deterministic solutions For most interesting problems algorithms must be –approximate –randomized 6

A Sublinear-Time Algorithm 7 BLA-BLA-BLA-BLA-BLA-BLA-BLA-BLA approximate answer sublinear-time algorithm ? L ? B ? L ? A Quality of approximation Resources number of queries running time

Types of Approximation Classical approximation need to compute a value  output should be close to the desired value  example: average Property testing need to answer YES or NO  Intuition: only require correct answers on two sets of instances that are very different from each other 8

Classical Approximation A Simple Example

Approximate Diameter of a Point Set [Indyk]

Algorithm and Analysis A rare example of a deterministic sublinear-time algorithm

Property Testing

Property Testing: YES/NO Questions Does the input satisfy some property? (YES/NO) “in the ballpark” vs. “out of the ballpark” Does the input satisfy the property or is it far from satisfying it? sometimes it is the right question (probabilistically checkable proofs (PCPs)) as good when the data is constantly changing (WWW) fast sanity check to rule out inappropriate inputs (airport security questioning) 13

14 Property Tester Close to YES Far from YES Reject with probability 2/3 Don’t care Property Tester Definition Probabilistic Algorithm YES Reject with probability 2/3 NO far = differs in many places

Randomized Sublinear Algorithms Toy Examples

Property Testing: a Toy Example 16 Witness Lemma 0001…0100

Randomized Approximation: a Toy Example …0100 Hoeffding Bound

Property Testing Simple Examples

Testing Properties of Images 19

Pixel Model 20

Testing if an Image is a Half-plane [R03] 21

Half-plane Instances 22 A half-plane

Half-plane Instances 23 A half-plane

Half-plane Instances 24 A half-plane

Half-plane Instances 25 A half-plane

Half-plane Instances 26 A half-plane

Half-plane Instances 27 A half-plane

Half-plane Instances 28 A half-plane

Strategy “Testing by implicit learning” paradigm Learn the outline of the image by querying a few pixels. Test if the image conforms to the outline by random sampling, and reject if something is wrong. 29

Half-plane Test 30 Claim. Claim. The number of sides with different corners is 0, 2, or 4. Algorithm 1.Query the corners. ?? ??

Half-plane Test: 4 Bi-colored Sides 31 Claim. Claim. The number of sides with different corners is 0, 2, or 4. Analysis If it is 4, the image cannot be a half-plane. Algorithm 1.Query the corners. 2.If the number of sides with different corners is 4, reject.

Half-plane Test: 0 Bi-colored Sides 32 Claim. Claim. The number of sides with different corners is 0, 2, or 4. Analysis If all corners have the same color, the image is a half-plane if and only if it is unicolored. Algorithm ? ? ? ? ? ?

Half-plane Test: 2 Bi-colored Sides 33 Claim. Claim. The number of sides with different corners is 0, 2, or 4. Algorithm Analysis ? ? ? ?

Testing if an Image is a Half-plane [R03] 34

Other Results on Properties of Images 35

Testing if a List is Sorted Input: a list of n numbers x 1, x 2,..., x n Question: Is the list sorted? Requires reading entire list:  (n) time Approximate version: Is the list sorted or ² -far from sorted? (An ² fraction of x i ’s have to be changed to make it sorted.) [Ergün Kannan Kumar Rubinfeld Viswanathan 98, Fischer 01]: O((log n)/ ² ) time  (log n) queries Attempts: 1. Test: Pick a random i and reject if x i > x i+1. Fails on: Ã 1/2-far from sorted 2. Test: Pick random i x j. Fails on: Ã 1/2-far from sorted 36

Is a list sorted or ² -far from sorted? Idea: Associate positions in the list with vertices of the directed line. Construct a graph (2-spanner) by adding a few “shortcut” edges (i, j) for i < j where each pair of vertices is connected by a path of length at most 2 37 … … ≤ n log n edges … n-1 n

Is a list sorted or ² -far from sorted? Pick a random edge (x i,x j ) from the 2-spanner and reject if x i > x j Analysis: Call an edge (x i,x j ) violated if x i > x j, and good otherwise. If x i is an endpoint of a violated edge, call it bad. Otherwise, call it good. Proof: Consider any two good numbers, x i and x j. They are connected by a path of (at most) two good edges (x i,x k ), (x k,x j ). ) x i ≤ x k and x k ≤ x j ) x i ≤ x j x i x j xkxk Claim 1. Claim 1. All good numbers x i are sorted. Test Test [Dodis Goldreich Lehman Raskhodnikova Ron Samorodnitsky 99]

Is a list sorted or ² -far from sorted? Pick a random edge (x i,x j ) from the 2-spanner and reject if x i > x j Analysis: Call an edge (x i,x j ) violated if x i > x j, and good otherwise. If x i is an endpoint of a bad edge, call it bad. Otherwise, call it good. Proof: If a list is ² -far from sorted, it has ¸ ² n bad numbers. (Claim 1) Each violated edge contributes 2 bad numbers. 2-spanner has ¸ ² n/2 violated edges out of · n log n x i x j xkxk Claim 1. Claim 1. All good numbers x i are sorted. Claim 2. Claim 2. An ² -far list violates ¸ ² /(2 log n) fraction of edges in 2-spanner.

Is a list sorted or ² -far from sorted? Pick a random edge (x i,x j ) from the 2-spanner and reject if x i > x j Analysis: Call an edge (x i,x j ) violated if x i > x j, and good otherwise. By Witness Lemma, it suffices to sample (4 log n )/ ² edges from 2-spanner. Sample (4 log n)/ ² edges (x i,x j ) from the 2-spanner and reject if x i > x j. Guarantee: All sorted lists are accepted. All lists that are ² -far from sorted are rejected with probability ¸ 2/3. Time: O((log n)/ ² ) x i x j xkxk Test Test [Dodis Goldreich Lehman Raskhodnikova Ron Samorodnitsky 99] Algorithm Claim 2. Claim 2. An ² -far list violates ¸ ² /(2 log n) fraction of edges in 2-spanner.

Generalization 41 x y z x y

Lipschitz Continuous Functions A fundamental notion in  mathematical analysis  theory of differential equations Example uses of a Lipschitz constant c of a given function f  probability theory: in tail bounds via McDiarmid’s inequality  program analysis: as a measure of robustness to noise  data privacy: to scale noise added to preserve differential privacy A function f : D  R has Lipschitz constant c if for all x,y in D, distance R (f(x),f(y)) ≤ c ∙ distance D (x,y). 42

Computing a Lipschitz Constant? 43 Image sources:

Testing if a Function is Lipschitz [Jha R] 44 nodes = points in the domain; edges = points at distance 1 node labels = values of the function

Properties of a List of n Numbers 45

Basic Properties of Functions

47 f(000) f(111) f(011) f(100) f(101) f(110) f(010) f(001)

Monotonicity of Functions monotone

Monotonicity Test [GGLRS, DGLRRS] 49 Repair Lemma

Repair Lemma: Proof Idea 50 Proof idea: Transform f into a monotone function by repairing edges in one dimension at a time. Repair Lemma

51 Repairing Violated Edges in One Dimension Swapping horizontal dimension Swap violated edges 1  0 in one dimension to 0  1. Enough to prove the claim for squares i j

Proof of The Claim for Squares If no horizontal edges are violated, no action is taken. 52 Swapping horizontal dimension i j

Proof of The Claim for Squares If both horizontal edges are violated, both are swapped, so the number of vertical violated edges does not change. 53 Swapping horizontal dimension i j

Proof of The Claim for Squares Suppose one (say, top) horizontal edge is violated. If both bottom vertices have the same label, the vertical edges get swapped. 54 i j Swapping horizontal dimension 1001

Proof of The Claim for Squares Suppose one (say, top) horizontal edge is violated. If both bottom vertices have the same label, the vertical edges get swapped. Otherwise, the bottom vertices are labeled 0  1, and the vertical violation is repaired. 55 i j Swapping horizontal dimension

Proof of The Claim for Squares 56 Repair Lemma

monotone