Published by Erick Nadler. Modified about 1 year ago.

1
So Much Data, So Little Time. Bernard Chazelle, Princeton University

2
So Many Slides, So Little Time (before lunch). Bernard Chazelle, Princeton University

3
computation math experimentation algorithms

4
Computers have two problems

5
1. They don’t have steering wheels

6

7
2. End of Moore’s Law: party’s over!

8
computation algorithms experimentation

9
32 x = 544 This is not me

10
FFT RSA

11

12

13
noisy low entropy uncertain unevenly priced big

14
noisy low entropy uncertain unevenly priced big

15
Biomedical imaging Sloan Digital Sky Survey 4 petabytes (~1MG) 10 petabytes/yr 150 petabytes/yr

16
Collected works of Micha Sharir My A(9,9)-th paper

17
massive input, output. Sublinear Algorithms: sample tiny fraction

18
Shortest Paths [C-Liu-Magen ’03] New York Delphi

19
Ray Shooting Volume Intersection Point location

20
Approximate MST [C-Rubinfeld-Trevisan ’01]

21
Reduces to counting connected components
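The slide gives no formula for the reduction. One standard way to state it, assumed here, is for a connected graph on n vertices with integer edge weights in {1, …, w}: the MST weight equals n − w + Σ c_i for i = 1, …, w − 1, where c_i is the number of connected components of the subgraph that keeps only edges of weight at most i. The sketch below checks that identity against Kruskal's algorithm using exact component counts; all function names are illustrative.

```python
def _find(parent, x):
    # Union-find lookup with path halving.
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def count_components(n, edges):
    # Exact number of connected components of a graph on vertices 0..n-1.
    parent = list(range(n))
    components = n
    for u, v in edges:
        ru, rv = _find(parent, u), _find(parent, v)
        if ru != rv:
            parent[ru] = rv
            components -= 1
    return components


def mst_weight_via_components(n, weighted_edges, w):
    # Assumes a connected graph with integer weights in 1..w.
    total = n - w
    for i in range(1, w):
        light = [(u, v) for u, v, wt in weighted_edges if wt <= i]
        total += count_components(n, light)
    return total


def kruskal_weight(n, weighted_edges):
    # Reference MST weight, for comparison only.
    parent = list(range(n))
    total = 0
    for u, v, wt in sorted(weighted_edges, key=lambda e: e[2]):
        ru, rv = _find(parent, u), _find(parent, v)
        if ru != rv:
            parent[ru] = rv
            total += wt
    return total


edges = [(0, 1, 1), (1, 2, 2), (2, 3, 3), (0, 3, 3), (0, 2, 1)]
print(mst_weight_via_components(4, edges, 3), kruskal_weight(4, edges))
```

Replacing the exact counts c_i with sampled estimates is what would make the computation sublinear; this sketch does not attempt that step.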

22
E[estimate] = no. connected components; var[estimate] << (no. connected components)^2; whp, the estimate is a good estimator of # connected components
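A minimal sketch of a sampling estimator of this kind, assuming the graph is given as adjacency lists. It relies on the fact that summing 1/|component(v)| over all vertices v gives exactly the number of components, so averaging that quantity over random vertices and scaling by n gives an estimate. This is a simplified variant, not the exact procedure of the cited paper: the truncation parameter `cap` and the function name are assumptions, and components larger than `cap` are counted as zero, which biases the estimate downward by at most n/cap.

```python
import random
from collections import deque


def estimate_components(adj, samples, cap, rng=None):
    # adj: list of adjacency lists; samples: number of random start vertices;
    # cap: BFS gives up once it has seen more than `cap` vertices.
    rng = rng or random.Random(0)
    n = len(adj)
    total = 0.0
    for _ in range(samples):
        start = rng.randrange(n)
        seen = {start}
        queue = deque([start])
        truncated = False
        while queue and not truncated:
            u = queue.popleft()
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    if len(seen) > cap:
                        truncated = True
                        break
                    queue.append(v)
        if not truncated:
            # The whole component was explored: contribute 1 / its size.
            total += 1.0 / len(seen)
    return n * total / samples
```

Each sample costs at most about `cap` vertex visits times the degrees encountered, so the total work depends on `samples` and `cap` rather than on the size of the graph.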

23
worst case input space average case (uniform)

24
worst case

25
average case = actuarial view

26
“OK, if you elect NOT to have the surgery, the insurance company offers 6 days and 7 nights in Barbados.”

27
arbitrary, unknown random source Self-Improving Algorithms

28
Yes ! This could be YOU, too !

29
E[Tk]: optimal expected time for random source. time T1, time T2, time T3, time T4
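A toy sketch of the self-improving idea, using sorting rather than the clustering problem that follows: the algorithm first observes inputs from the unknown source, builds a structure tuned to them, and later runs faster on inputs from the same source while remaining correct on any input. The class name, the bucket scheme, and the `bucket_count` parameter are assumptions made for illustration; this is not the construction from the talk.

```python
import bisect
import random


class SelfImprovingSorter:
    def __init__(self, bucket_count):
        self.bucket_count = bucket_count
        self.boundaries = []  # learned bucket boundaries, kept sorted

    def train(self, sample_inputs):
        # Learning phase: pool the observed inputs and keep evenly spaced
        # order statistics as bucket boundaries.
        pooled = sorted(x for inp in sample_inputs for x in inp)
        step = max(1, len(pooled) // self.bucket_count)
        self.boundaries = pooled[step::step]

    def sort(self, values):
        # Operating phase: route each value to its bucket, sort each bucket,
        # and concatenate. Correct for any input; buckets stay small only
        # when the input resembles the training data.
        buckets = [[] for _ in range(len(self.boundaries) + 1)]
        for x in values:
            buckets[bisect.bisect_right(self.boundaries, x)].append(x)
        out = []
        for bucket in buckets:
            bucket.sort()
            out.extend(bucket)
        return out


rng = random.Random(1)
sorter = SelfImprovingSorter(bucket_count=16)
sorter.train([[rng.gauss(0, 1) for _ in range(64)] for _ in range(20)])
data = [rng.gauss(0, 1) for _ in range(64)]
print(sorter.sort(data) == sorted(data))
```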

30
Clustering [ Ailon-C-Liu-Comandur ’05 ] K-median over Hamming cube

31
minimize sum of distances
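The objective can be written down directly. The sketch below, with illustrative function names, computes the k-median cost of a set of centers over bit vectors under Hamming distance, and handles the easy k = 1 case, where the coordinate-wise majority vector minimizes the sum of distances.

```python
def hamming(a, b):
    # Number of coordinates where two equal-length bit vectors differ.
    return sum(x != y for x, y in zip(a, b))


def kmedian_cost(points, centers):
    # Sum over points of the distance to the nearest center.
    return sum(min(hamming(p, c) for c in centers) for p in points)


def majority_center(points):
    # Optimal single center (k = 1): majority bit in every coordinate.
    n = len(points)
    d = len(points[0])
    return tuple(1 if 2 * sum(p[j] for p in points) > n else 0 for j in range(d))


points = [(0, 0, 1), (0, 1, 1), (1, 1, 1)]
center = majority_center(points)
print(center, kmedian_cost(points, [center]))
```

For k > 1 the coordinates no longer decouple, which is where approximation schemes come in.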

32

33
[ Kumar-Sabharwal-Sen ’04 ] COST ≤ (1 + ε) OPT

34
How to achieve linear limiting time? Input space {0,1}^dn. prob < O(dn)/KSS. Identify core. Tail: Use KSS

35
Store sample of precomputed KSS Nearest neighbor Incremental algorithm

36
Main difficulty: How to spot the tail?

37

38
encode

39
decode

40

41
Data inaccessible before noise What makes you think it’s wrong?

42
Data inaccessible before noise must satisfy some property (eg, convex, bipartite) but does not quite

43
f(x) = ? x f(x) data f = access function

44
f(x) = ? x f(x) f = access function

45
f(x) = ? x f(x) But life being what it is…

46
f(x) = ? x f(x)

47
Humans Define distance from any object to data class

48
f(x) = ? x g(x) x1, x2, … f(x1), f(x2), … filter g is access function for:

49
Online Data Reconstruction

50
Monotone function: [n]^d → R. Filter requires polylog (n) lookups [ Ailon-C-Liu-Comandur ’04 ]
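To make "distance to the data class" concrete in the simplest setting, the sketch below takes the one-dimensional case, where the data is a sequence of n values and the property is being nondecreasing. The distance is the minimum number of entries that must change, which equals n minus the length of a longest nondecreasing subsequence. This is an offline computation that reads every entry, so it only illustrates the distance measure and is not the polylog-lookup filter; the function name is illustrative.

```python
import bisect


def distance_to_monotone(values):
    # tails[k] holds the smallest possible last element of a nondecreasing
    # subsequence of length k + 1 seen so far.
    tails = []
    for v in values:
        i = bisect.bisect_right(tails, v)
        if i == len(tails):
            tails.append(v)
        else:
            tails[i] = v
    # Entries outside a longest nondecreasing subsequence must be changed.
    return len(values) - len(tails)


print(distance_to_monotone([1, 5, 2, 3, 4]))  # 1: changing the 5 suffices
```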

51
Convex polygon Filter requires : lookups [C-Comandur ’06 ]

52
Convex terrain lookups Filter requires :

53
Iterated planar separator theorem

54

55
Iterated (weak) planar separator theorem in sublinear time!

56
Using epsilon-nets in spaces of unbounded VC dimension reconstruct

57
bipartite graph k-connectivity expander

58
denoising low-dim attractor sets

59
Priced computation & accuracy: spectrometry/cloning/gene chip, PCR/hybridization/chromatography, gel electrophoresis/blotting. Linear programming

60
Pricing data. Factoring is easy. Here’s why… Gaussian mixture sample: ….

61
Collaborators: Nir Ailon, Seshadri Comandur, Ding Liu, Avner Magen, Ronitt Rubinfeld, Luca Trevisan
