Presentation is loading. Please wait.

Presentation is loading. Please wait.

Sublinear Algorihms for Big Data

Similar presentations


Presentation on theme: "Sublinear Algorihms for Big Data"β€” Presentation transcript:

1 Sublinear Algorihms for Big Data
Exam Problems Grigory Yaroslavtsev

2 Problem 1: Modified Chernoff Bound
Problem: Derive Chernoff Bound #2 from Chernoff Bound #1. Explain your answer in detail. (Chernoff Bound #1) Let 𝑿 1 … 𝑿 𝒕 be independent and identically distributed r.vs with range [0, 1] and expectation πœ‡. Then if 𝑋= 1 𝒕 𝑖 𝑋 𝑖 and 1>𝛿>0, Pr 𝑋 βˆ’πœ‡ β‰₯π›Ώπœ‡ ≀2 exp βˆ’ π’•πœ‡ 𝛿 2 3 (Chernoff Bound #2) Let 𝑿 1 … 𝑿 𝒕 be independent and identically distributed r.vs with range [0, 𝒄] and expectation πœ‡. Then if 𝑋= 1 𝒕 𝑖 𝑋 𝑖 and 1>𝛿>0, Pr 𝑋 βˆ’πœ‡ β‰₯π›Ώπœ‡ ≀2 exp βˆ’ π’•πœ‡ 𝛿 2 3𝒄

3 Problem 2: Modified Chebyshev
Problem: Derive Chebyshev Inequality #2 from Chebyshev Inequality #1. Explain your answer in detail. (Chebyshev Inequality #1) For every 𝑐>0: Pr 𝑿 βˆ’π”Ό 𝑿 β‰₯𝑐 π‘‰π‘Žπ‘Ÿ 𝑿 ≀ 1 𝑐 2 (Chebyshev Inequality #2) For every 𝑐′>0: Pr 𝑿 βˆ’π”Ό 𝑿 β‰₯𝑐′ 𝔼 𝑿 ≀ π‘‰π‘Žπ‘Ÿ 𝑿 𝑐′ 𝔼 𝑿 2

4 Problem 3: Sparse Recovery Error
Definition (frequency vector): 𝑓 𝑖 = frequency of 𝑖 in the stream = # of occurrences of value 𝑖 𝑓=〈 𝑓 1 ,…, 𝑓 𝒏 βŒͺ Sparse Recovery: Find 𝑔 such that 𝑓 βˆ’π‘” 1 is minimized among 𝑔′s with at most π‘˜ non-zero entries. Definition: πΈπ‘Ÿ π‘Ÿ π‘˜ 𝑓 = min g: g 0 β‰€π‘˜ 𝑓 βˆ’π‘” 1 Problem: Show that πΈπ‘Ÿ π‘Ÿ π‘˜ 𝑓 = π‘–βˆ‰π‘† 𝑓 𝑖 where 𝑆 are indices of π‘˜ largest 𝑓 𝑖 . Explain your answer formally and in detail.

5 Problem 4: Approximate Median Value
Stream: π’Ž elements π‘₯ 1 ,… π‘₯ π’Ž from universe 𝒏 = 1, 2, …, 𝒏 . 𝑆={ π‘₯ 1 ,…, π‘₯ π‘š } (all distinct) and let π‘Ÿπ‘Žπ‘›π‘˜ 𝑦 =|π‘₯βˆˆπ‘† :π‘₯≀𝑦| Median 𝑴= π‘₯ 𝑖 , where π‘Ÿπ‘Žπ‘›π‘˜ π‘₯ 𝑖 = π’Ž (π’Ž odd). Algorithm: Return π’š= the median of a sample of size 𝒕 taken from 𝑆 (with replacement). Problem: Does this algorithm give a 10% approximate value of the median with probability β‰₯ 2 3 , i.e. 𝑦 such that π‘΄βˆ’ 𝒏 10 <π’š<𝑴+ 𝒏 10 if 𝒕=π‘œ(𝒏)? Explain your answer.

6 Problem 5: Lower bound on 𝐹 π‘˜
Definition (frequency vector): 𝑓 𝑖 = frequency of 𝑖 in the stream = # of occurrences of value 𝑖 𝑓=〈 𝑓 1 ,…, 𝑓 𝒏 βŒͺ Define (frequency moment): 𝐹 π‘˜ = 𝑖 𝑓 𝑖 π‘˜ Problem: Show that for all integer kβ‰₯1 it holds that: 𝐹 π‘˜ β‰₯𝑛 π‘š 𝑛 π‘˜ Hint: worst-case when 𝑓 1 =…= 𝑓 𝑛 = π‘š 𝑛 . Use convexity of 𝑔 π‘₯ = π‘₯ π‘˜

7 Problem 6: Approximate MST Weight
Let 𝐺 be a weighted graph with weights of all edges being integers between 1 and W. Definition: Let 𝑛 𝑖 be the # of connected components if we remove all edges with weight > 1+πœ– 𝑖 . Problem: For some constant π‘ͺ > 0 show the following bounds on the weight of the minimum spanning tree 𝑀 𝑀𝑆𝑇 : 𝑀 𝑀𝑆𝑇 ≀ 𝑖=0 ⌈log 1+πœ– π‘Š βŒ‰ πœ– 1+πœ– 𝑖 𝑛 𝑖 ≀ 1+π‘ͺ πœ– 𝑀(𝑀𝑆𝑇)

8 Evaluation You can work in groups (up to 3 people) Point system
Deadline: September 1st , 2014, 23:59 GMT Submissions in English: Submission Title (once): Exam + Space + β€œYour Name” Question Title: Question + Space + β€œYour Name” Submission format PDF from LaTeX (best) PDF You can work in groups (up to 3 people) Each group member lists all others on the submission Point system Course Credit = 2 points Problems 1-3 = 0.5 point each Problem 4 = 1 point Problems 5-6 = 2 point each


Download ppt "Sublinear Algorihms for Big Data"

Similar presentations


Ads by Google