1
Density estimation in linear time (+ approximating $L_1$-distances)
Satyaki Mahalanabis, Daniel Štefankovič
University of Rochester
2
Density estimation
DATA + F → density
F = a family of densities (pictured: $f_1, f_2, f_3, f_4, f_5, f_6$)
3
Density estimation - example
DATA: 0.418974, 0.848565, 1.73705, 1.59579, -1.18767, -1.05573, -1.36625
F = a family of normal densities with $\sigma = 1$
DATA + F → $N(\mu, \sigma)$
4
Measure of quality: $L_1$-distance from the truth
g = TRUTH, f = OUTPUT
$|f-g|_1 = \int |f(x)-g(x)|\,dx$
Why $L_1$?
1) small $L_1$ ⇒ all events estimated with small additive error
2) scale invariant
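Point 1) can be checked numerically. The sketch below assumes distributions with finite support stored as Python lists (the slides work with densities on the real line); the helper names `l1_distance` and `max_event_error` are hypothetical. It illustrates that the worst additive error over all events is half the $L_1$-distance.

```python
from itertools import combinations

def l1_distance(p, q):
    """Sum of |p_i - q_i| over a shared finite support."""
    return sum(abs(a - b) for a, b in zip(p, q))

def max_event_error(p, q):
    """Brute force: largest |p(A) - q(A)| over all events A."""
    n = len(p)
    best = 0.0
    for r in range(n + 1):
        for event in combinations(range(n), r):
            best = max(best, abs(sum(p[i] - q[i] for i in event)))
    return best

f = [0.5, 0.2, 0.2, 0.1]      # OUTPUT
g = [0.25, 0.25, 0.25, 0.25]  # TRUTH
print(l1_distance(f, g), max_event_error(f, g))
```

The brute-force search is exponential in the support size; it is only meant to confirm the identity on a toy example.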
5
Obstacles to “quality”: DATA + F
weak class of densities F: $\mathrm{dist}_1(g,F)$
bad data: ?
6
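What is bad data?
g = TRUTH, h = DATA (empirical density)
$|h-g|_1$ restricted to the Yatracos class: $\Delta = 2\max_{A \in Y(F)} |h(A)-g(A)|$
Y(F) = Yatracos class of F, the sets $A_{ij} = \{x \mid f_i(x) > f_j(x)\}$
[Figure: densities $f_1, f_2, f_3$ with the sets $A_{12}, A_{13}, A_{23}$ marked]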
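A minimal sketch of the Yatracos class and of the data error $2\max_{A \in Y(F)} |h(A)-g(A)|$, assuming finite-support densities stored as lists. The function names and the toy numbers are illustrative assumptions, not taken from the slides.

```python
def yatracos_class(F):
    """All sets A_ij = {x : f_i(x) > f_j(x)} for ordered pairs i != j."""
    sets = set()
    for i, fi in enumerate(F):
        for j, fj in enumerate(F):
            if i != j:
                sets.add(frozenset(x for x in range(len(fi)) if fi[x] > fj[x]))
    return sets

def data_error(F, g, h):
    """2 * max over the Yatracos class of |h(A) - g(A)|."""
    return 2 * max(abs(sum(h[x] - g[x] for x in A)) for A in yatracos_class(F))

F = [[0.7, 0.2, 0.1], [0.1, 0.2, 0.7]]
g = [0.6, 0.2, 0.2]  # truth (toy values)
h = [0.5, 0.4, 0.1]  # empirical distribution (toy values)
print(data_error(F, g, h))  # smaller than |h - g|_1, which sums over all points
```

Only the sets on which two candidates disagree matter, which is why this quantity can be small even when the empirical distribution is far from the truth in $L_1$.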
7
Density estimation
DATA (h) + F → f with small $|g-f|_1$
assuming these are small: $\mathrm{dist}_1(g,F)$ and $\Delta = 2\max_{A \in Y(F)} |h(A)-g(A)|$
8
Why would these be small???
$\mathrm{dist}_1(g,F)$ and $\Delta = 2\max_{A \in Y(F)} |h(A)-g(A)|$
They will be if:
1) pick a large enough F
2) pick a small enough F so that the VC-dimension of Y(F) is small
3) data are i.i.d. from g
Theorem (Haussler, Dudley, Vapnik, Chervonenkis): $E[\max_{A \in Y} |h(A)-g(A)|] = O\big(\sqrt{\mathrm{VC}(Y)/\#\mathrm{samples}}\big)$
9
How to choose from 2 densities? $f_1$, $f_2$
11
How to choose from 2 densities?
[Figure: $f_1$ and $f_2$; a test function T with a region labeled +1; the values $T f_1$, $T f_2$ and $T h$ on a line]
12
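How to choose from 2 densities?
[Figure: $f_1$ and $f_2$; a test function T with a region labeled +1; the values $T f_1$, $T f_2$ and $T h$ on a line]
Scheffé: if $T h > T (f_1+f_2)/2$ then $f_1$, else $f_2$
Theorem (see DL’01): $|f-g|_1 \le 3\,\mathrm{dist}_1(g,F) + 2\Delta$, where $\Delta = 2\max_{A \in Y(F)} |h(A)-g(A)|$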
13
Density estimation
DATA (h) + F → f with small $|g-f|_1$
assuming these are small: $\mathrm{dist}_1(g,F)$ and $\Delta = 2\max_{A \in Y(F)} |h(A)-g(A)|$
14
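Test functions
$F = \{f_1, f_2, \dots, f_N\}$
$T_{ij}(x) = \mathrm{sgn}(f_i(x) - f_j(x))$
$T_{ij}(f_i - f_j) = \int (f_i-f_j)\,\mathrm{sgn}(f_i-f_j) = |f_i - f_j|_1$
[Figure: a line from $T_{ij} f_j$ to $T_{ij} f_i$; $f_i$ wins if $T_{ij} h$ falls in the half nearer $T_{ij} f_i$, otherwise $f_j$ wins]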
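The test function and the Scheffé rule can be sketched as follows. This assumes finite-support densities stored as lists, so that "$T f$" becomes a dot product; `sgn`, `dot` and `scheffe_winner` are hypothetical helper names.

```python
def sgn(x):
    return (x > 0) - (x < 0)

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def scheffe_winner(f1, f2, h):
    """Return 1 if f1 wins the Scheffé test against f2 on data h, else 2."""
    T = [sgn(a - b) for a, b in zip(f1, f2)]     # T = sgn(f1 - f2)
    midpoint = (dot(T, f1) + dot(T, f2)) / 2     # T (f1 + f2) / 2
    return 1 if dot(T, h) > midpoint else 2

f1 = [0.7, 0.2, 0.1]
f2 = [0.1, 0.2, 0.7]
print(scheffe_winner(f1, f2, [0.6, 0.3, 0.1]))  # data resembles f1
print(scheffe_winner(f1, f2, [0.1, 0.3, 0.6]))  # data resembles f2
```

Note that `dot(T, f1) - dot(T, f2)` equals the $L_1$-distance between the two candidates, which is the identity stated on the slide.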
16
Density estimation algorithms
Scheffé tournament: pick the density with the most wins.
Theorem (DL’01): $|f-g|_1 \le 9\,\mathrm{dist}_1(g,F) + 8\Delta$. Running time: $n^2$
Minimum distance estimate (Y’85): output $f_k \in F$ that minimizes $\max_{ij} |(f_k-h)\,T_{ij}|$
Theorem (DL’01): $|f-g|_1 \le 3\,\mathrm{dist}_1(g,F) + 2\Delta$. Running time: $n^3$
Can we do better?
17
Our algorithm: Efficient minimum loss-weight
repeat until one distribution left:
1) pick the pair of distributions in F that are furthest apart (in $L_1$)
2) eliminate the loser
Take the most “discriminative” action.
Theorem [MS’08]: $|f-g|_1 \le 3\,\mathrm{dist}_1(g,F) + 2\Delta$, where $\Delta = 2\max_{A \in Y(F)} |h(A)-g(A)|$. Running time: $n$ *
* after preprocessing F
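The elimination loop can be sketched as below, assuming finite-support densities as lists and hypothetical helper names. This naive version searches for the furthest pair from scratch in every round, so it does not achieve the linear running time claimed on the slide; it only illustrates the order in which tests are made (one Scheffé test per round).

```python
def sgn(x):
    return (x > 0) - (x < 0)

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def l1(u, v):
    return sum(abs(a - b) for a, b in zip(u, v))

def scheffe_loser(F, i, j, h):
    """Index (i or j) of the density that loses the Scheffé test on data h."""
    T = [sgn(a - b) for a, b in zip(F[i], F[j])]
    midpoint = (dot(T, F[i]) + dot(T, F[j])) / 2
    return j if dot(T, h) > midpoint else i

def emlw(F, h):
    """Efficient minimum loss-weight, naive version: returns the survivor's index."""
    alive = set(range(len(F)))
    while len(alive) > 1:
        pairs = [(i, j) for i in alive for j in alive if i < j]
        i, j = max(pairs, key=lambda p: l1(F[p[0]], F[p[1]]))  # furthest pair
        alive.remove(scheffe_loser(F, i, j, h))                 # eliminate loser
    return alive.pop()

F = [[0.7, 0.2, 0.1], [0.1, 0.2, 0.7], [1 / 3, 1 / 3, 1 / 3]]
h = [0.6, 0.3, 0.1]  # toy empirical distribution
print(emlw(F, h))
```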
18
Tournament revelation problem
INPUT: a weighted undirected graph G (wlog all edge-weights distinct)
OUTPUT:
REPORT: heaviest edge $\{u_1,v_1\}$ in G
ADVERSARY eliminates $u_1$ or $v_1$ → $G_1$
REPORT: heaviest edge $\{u_2,v_2\}$ in $G_1$
ADVERSARY eliminates $u_2$ or $v_2$ → $G_2$
.....
OBJECTIVE: minimize total time spent generating reports
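One simple strategy is to sort all edges once and then scan the sorted list, skipping edges with an eliminated endpoint. The sketch below assumes the graph is given as a dictionary of edge weights and the adversary as a callback; both are illustrative choices. The example weights follow the four-vertex picture in the slides, except that assigning 4 to AB and 5 to BD is an assumption.

```python
def reveal(weights, adversary):
    """weights: {(u, v): w}; adversary(u, v) returns the vertex to eliminate.

    Preprocessing: sort the edges by weight. Run-time: a single pass,
    since an edge that is skipped or reported never needs to be revisited.
    """
    order = sorted(weights, key=weights.get, reverse=True)
    alive = {v for edge in weights for v in edge}
    reports = []
    for u, v in order:
        if u in alive and v in alive:
            reports.append((u, v))            # heaviest edge still present
            alive.discard(adversary(u, v))    # adversary removes an endpoint
    return reports

weights = {("B", "C"): 6, ("B", "D"): 5, ("A", "B"): 4,
           ("A", "D"): 3, ("A", "C"): 2, ("C", "D"): 1}
script = {("B", "C"): "B", ("A", "D"): "A", ("C", "D"): "C"}
print(reveal(weights, lambda u, v: script[(u, v)]))
```

With F vertices this costs a sort of about $F^2$ edges up front and one linear scan of them afterwards.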
19
Tournament revelation problem
[Figure: graph on vertices A, B, C, D with edge weights 1 to 6]
report the heaviest edge
20
Tournament revelation problem
[Figure: graph on vertices A, B, C, D with edge weights 1 to 6]
report the heaviest edge: BC
21
Tournament revelation problem
[Figure: vertices A, C, D remain, with edge weights 1, 2, 3]
report the heaviest edge: BC
eliminate B
report the heaviest edge
22
Tournament revelation problem
[Figure: vertices A, C, D remain, with edge weights 1, 2, 3]
report the heaviest edge: BC
eliminate B
report the heaviest edge: AD
23
Tournament revelation problem
[Figure: vertices C, D remain, with edge weight 1]
report the heaviest edge: BC
eliminate B
report the heaviest edge: AD
eliminate A
report the heaviest edge: CD
24
Tournament revelation problem
[Figure: the graph on A, B, C, D with edge weights 1 to 6, and the decision tree of reports rooted at BC]
$2^{O(F)}$ preprocessing, $O(F)$ run-time
$O(F^2 \log F)$ preprocessing, $O(F^2)$ run-time
WE DO NOT KNOW: can we get $O(F)$ run-time with polynomial preprocessing?
25
Efficient minimum loss-weight
repeat until one distribution left:
1) pick the pair of distributions that are furthest apart (in $L_1$)
2) eliminate the loser
(in practice 2) is more costly)
$2^{O(F)}$ preprocessing, $O(F)$ run-time
$O(F^2 \log F)$ preprocessing, $O(F^2)$ run-time
WE DO NOT KNOW: can we get $O(F)$ run-time with polynomial preprocessing?
26
Efficient minimum loss-weight
repeat until one distribution left:
1) pick the pair of distributions that are furthest apart (in $L_1$)
2) eliminate the loser
Theorem: $|f-g|_1 \le 3\,\mathrm{dist}_1(g,F) + 2\Delta$. Running time: $n$
Proof: For every f’ to which f loses, $|f-f'|_1 \le \max_{f' \text{ loses to } f''} |f'-f''|_1$
“that guy lost even more badly!”
27
Proof: For every f’ to which f loses, $|f-f'|_1 \le \max_{f' \text{ loses to } f''} |f'-f''|_1$
“that guy lost even more badly!”
[Figure: $f_1$, BEST = $f_2$, $f_3$; bad loss]
$2hT_{23} \le f_2T_{23} + f_3T_{23}$
$(f_1-f_2)\,T_{12} \le (f_2-f_3)\,T_{23}$
$(f_4-h)\,T_{23}$
$(f_i-f_j)(T_{ij}-T_{kl}) \ge 0$
$|f_1-g|_1 \le 3|f_2-g|_1 + 2\Delta$
28
Application: kernel density estimates (Akaike’54, Parzen’62, Rosenblatt’56)
K = kernel, h = density
kernel used to smooth empirical g ($x_1, x_2, \dots, x_n$ i.i.d. samples from h)
$g * K = \frac{1}{n}\sum_{i=1}^{n} K(y-x_i) \to h * K$ as $n \to \infty$
29
$g * K = \frac{1}{n}\sum_{i=1}^{n} K(y-x_i) \to h * K$ as $n \to \infty$
What K should we choose?
Dirac would be good. Dirac is not good.
Something in-between: bandwidth selection for kernel density estimates
$K_s(x) = \frac{K(x/s)}{s}$; as $s \to 0$, $K_s(x) \to$ Dirac
Theorem (see DL’01): as $s \to 0$ with $sn \to \infty$, $|g * K_s - h|_1 \to 0$
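A minimal sketch of the smoothed estimate $\frac{1}{ns}\sum_i K((y-x_i)/s)$ for a given bandwidth s. The uniform kernel on $[-1, 1]$ and the function names are assumptions chosen for illustration.

```python
def uniform_kernel(u):
    """K(u) = 1/2 on [-1, 1], zero elsewhere; integrates to 1."""
    return 0.5 if -1.0 <= u <= 1.0 else 0.0

def kde(xs, s, kernel=uniform_kernel):
    """Return the function y -> (1 / (n s)) * sum_i K((y - x_i) / s)."""
    n = len(xs)
    def density(y):
        return sum(kernel((y - x) / s) for x in xs) / (n * s)
    return density

estimate = kde([0.0, 1.0], s=1.0)
print(estimate(0.5), estimate(3.0))
```

Shrinking `s` concentrates the mass on the sample points (the Dirac limit), while a large `s` oversmooths; choosing between these is the bandwidth selection problem.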
30
Data splitting methods for kernel density estimates
$\frac{1}{ns}\sum_{i=1}^{n} K\!\left(\frac{y-x_i}{s}\right)$
How to pick the smoothing factor?
split $x_1, x_2, \dots, x_n$ into $x_1, \dots, x_{n-m}$ and $x_{n-m+1}, \dots, x_n$
$f_s = \frac{1}{(n-m)s}\sum_{i=1}^{n-m} K\!\left(\frac{y-x_i}{s}\right)$
choose s using density estimation
31
Kernels we will use in $\frac{1}{ns}\sum_{i=1}^{n} K\!\left(\frac{y-x_i}{s}\right)$:
piecewise uniform
piecewise linear
34
Bandwidth selection for uniform kernels
N distributions, each piecewise uniform with n pieces; m datapoints
E.g. $N \approx n^{1/2}$, $m \approx n^{5/4}$
Goal: run the density estimation algorithm efficiently
[Table: quantities $g\,T_{ij}$, $(f_i+f_j)\,T_{ij}/2$, $|f_i-f_j|_1$, $(f_k-h)\,T_{kj}$; counts under EMLW and MD: $N^2$, $N^2$, $N$; TIME: $n$, $n + m\log n$]
Can we speed this up?
absolute error: bad; relative error: good
35
Approximating $L_1$-distances between distributions
N piecewise uniform densities (each n pieces)
TRIVIAL (exact): $N^2 n$
WE WILL DO: $(N^2+Nn)(\log N)/\varepsilon^2$
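For reference, the exact distance for one pair can be sketched as follows, assuming each density is a list of `(left, right, height)` pieces with disjoint intervals (a representation chosen here for illustration). The height lookup is a plain linear scan, so this version is slower per pair than the linear-time merge behind the trivial bound.

```python
def height(pieces, x):
    """Value of a piecewise uniform density at x (zero outside all pieces)."""
    for left, right, h in pieces:
        if left <= x < right:
            return h
    return 0.0

def l1_piecewise_uniform(f, g):
    """Exact L1-distance: both densities are constant between consecutive cuts."""
    cuts = sorted({c for left, right, _ in f + g for c in (left, right)})
    total = 0.0
    for a, b in zip(cuts, cuts[1:]):
        mid = (a + b) / 2
        total += abs(height(f, mid) - height(g, mid)) * (b - a)
    return total

f = [(0.0, 2.0, 0.5)]
g = [(1.0, 3.0, 0.5)]
print(l1_piecewise_uniform(f, g))
```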
36
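Dimension reduction for $L_2$
Johnson-Lindenstrauss Lemma (’82): $\varphi: L_2 \to L_2^t$, $t = O(\varepsilon^{-2} \ln n)$
$(\forall x,y \in S)\quad d(x,y) \le d(\varphi(x),\varphi(y)) \le (1+\varepsilon)\,d(x,y)$, where $|S| = n$
projection entries: $N(0, t^{-1/2})$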
37
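Dimension reduction for $L_1$
Cauchy Random Projection (Indyk’00): $\varphi: L_1 \to L_1^t$, $t = O(\varepsilon^{-2} \ln n)$
$(\forall x,y \in S)\quad d(x,y) \le \mathrm{est}(\varphi(x),\varphi(y)) \le (1+\varepsilon)\,d(x,y)$, where $|S| = n$
projection entries: $C(0, 1/t)$ in place of $N(0, t^{-1/2})$
(Charikar, Brinkman’03: cannot replace est by d)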
38
Cauchy distribution C(0,1)
density function: $\frac{1}{\pi(1+x^2)}$
FACTS:
$X \sim C(0,1) \Rightarrow aX \sim C(0,|a|)$
$X \sim C(0,a)$, $Y \sim C(0,b)$ independent $\Rightarrow X+Y \sim C(0,a+b)$
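Both facts follow from the characteristic function of the Cauchy distribution. The short derivation below uses $\gamma$ for a generic scale parameter (a symbol not used in the slides) and assumes X and Y are independent in the last line.

```latex
\begin{align*}
X \sim C(0,\gamma) &\iff \mathbb{E}\,e^{itX} = e^{-\gamma |t|},\\
X \sim C(0,\gamma):\quad \mathbb{E}\,e^{it(aX)} &= e^{-\gamma |a|\,|t|}
  &&\Rightarrow\ aX \sim C(0, |a|\gamma),\\
X \sim C(0,a),\ Y \sim C(0,b):\quad
\mathbb{E}\,e^{it(X+Y)} &= e^{-a|t|}\,e^{-b|t|} = e^{-(a+b)|t|}
  &&\Rightarrow\ X+Y \sim C(0, a+b).
\end{align*}
```

Setting $\gamma = 1$ in the second line gives the first fact as stated on the slide.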
39
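Cauchy random projection for $L_1$ (Indyk’00)
[Figure: the line cut into pieces carrying independent Cauchy variables $X_1, X_2, \dots, X_9$; a piece of length z carries $X_1 \sim C(0,z)$; densities with heights A, B and D drawn over the pieces]
a density with height A on pieces 2-3 and height B on pieces 5-8 projects to $A(X_2+X_3) + B(X_5+X_6+X_7+X_8)$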
40
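Cauchy random projection for $L_1$ (Indyk’00)
[Figure: the line cut into pieces carrying independent Cauchy variables $X_1, X_2, \dots, X_9$; a piece of length z carries $X_1 \sim C(0,z)$; densities with heights A, B and D drawn over the pieces]
one density projects to $A(X_2+X_3) + B(X_5+X_6+X_7+X_8)$, the other to $D(X_1+X_2+\dots+X_8+X_9)$
the difference of the two projections is Cauchy$(0, |\cdot - \cdot|_1)$, with the $L_1$-distance between the two densities as its scale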
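The construction can be simulated as below. This is a sketch under assumptions: densities are given by their heights on a shared grid of cells, each repetition draws one Cauchy variable per cell with scale equal to the cell width, and the distance is read off with a median estimator (the median of $|C(0,s)|$ is s). The function names, the toy densities and the fixed seed are illustrative.

```python
import math
import random
import statistics

def cauchy(scale, rng):
    """Sample from C(0, scale) by inverting the CDF."""
    return scale * math.tan(math.pi * (rng.random() - 0.5))

def sketch(heights_list, widths, t, rng):
    """Project every density to t numbers, sharing the Cauchy variables.

    In each repetition cell i gets X_i ~ C(0, width_i), and a density with
    heights (a_i) is mapped to sum_i a_i * X_i.
    """
    sketches = [[] for _ in heights_list]
    for _ in range(t):
        xs = [cauchy(w, rng) for w in widths]
        for out, heights in zip(sketches, heights_list):
            out.append(sum(a * x for a, x in zip(heights, xs)))
    return sketches

def estimate_l1(sketch_f, sketch_g):
    """Median of |difference| estimates the scale, i.e. the L1-distance."""
    return statistics.median(abs(a - b) for a, b in zip(sketch_f, sketch_g))

widths = [0.5, 0.5, 1.0, 1.0]
f = [0.4, 0.4, 0.3, 0.3]   # integrates to 1 over the grid
g = [1.0, 0.2, 0.2, 0.2]   # integrates to 1 over the grid
exact = sum(abs(a - b) * w for a, b, w in zip(f, g, widths))
sf, sg = sketch([f, g], widths, t=4001, rng=random.Random(0))
print(exact, estimate_l1(sf, sg))
```

Each density is sketched once and every pair is then compared in time proportional to t, which is where the savings over the exact all-pairs computation come from.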
41
All pairs $L_1$-distances: piecewise linear densities
42
All pairs $L_1$-distances: piecewise linear densities
$X_1, X_2 \sim C(0,1/2)$
$R = (3/4)X_1 + (1/4)X_2$
$B = (3/4)X_2 + (1/4)X_1$
$R - B \sim C(0,1/2)$
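The claimed distribution of $R - B$ can be verified with the two Cauchy facts stated earlier, assuming $X_1$ and $X_2$ are independent:

```latex
\begin{align*}
R - B &= \left(\tfrac34 - \tfrac14\right) X_1 + \left(\tfrac14 - \tfrac34\right) X_2
       = \tfrac12 X_1 - \tfrac12 X_2,\\
\tfrac12 X_1 &\sim C\!\left(0, \tfrac12 \cdot \tfrac12\right) = C\!\left(0, \tfrac14\right),
\qquad
-\tfrac12 X_2 \sim C\!\left(0, \tfrac14\right),\\
R - B &\sim C\!\left(0, \tfrac14 + \tfrac14\right) = C\!\left(0, \tfrac12\right).
\end{align*}
```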
43
All pairs $L_1$-distances: piecewise linear densities
Problem: too many intersections!
Solution: cut into even smaller pieces!
Stochastic measures are useful.
44
Brownian motion: $\frac{1}{(2\pi)^{1/2}} \exp(-x^2/2)$
Cauchy motion: $\frac{1}{\pi(1+x^2)}$
45
Brownian motion: $\frac{1}{(2\pi)^{1/2}} \exp(-x^2/2)$
$f: \mathbb{R} \to \mathbb{R}^d$
$\int f\,dL = Y \sim N(0,S)$
computing integrals is easy
46
Cauchy motion: $\frac{1}{\pi(1+x^2)}$
$f: \mathbb{R} \to \mathbb{R}^d$
for d=1: $\int f\,dL = Y \sim C(0,s)$, computing integrals* is easy
for d>1: computing integrals* is hard
* obtaining explicit expression for the density
47
What were we doing?
[Figure: the line cut into pieces carrying $X_1, X_2, \dots, X_9$]
$\int (f_1,f_2,f_3)\,dL = \big((w_1)_1, (w_2)_1, (w_3)_1\big)$
48
What were we doing?
[Figure: the line cut into pieces carrying $X_1, X_2, \dots, X_9$]
$\int (f_1,f_2,f_3)\,dL = \big((w_1)_1, (w_2)_1, (w_3)_1\big)$
Can we efficiently compute integrals $\int \varphi\,dL$ for piecewise linear $\varphi$?
49
Can we efficiently compute integrals $\int \varphi\,dL$ for piecewise linear $\varphi$?
$\varphi: \mathbb{R} \to \mathbb{R}^2$, $\varphi(z) = (1,z)$
$(X,Y) = \int \varphi\,dL$
50
R R 2 z)=(1,z) (X,Y)= dL (2(X-Y),2Y) has density at u+v,u-v 2
51
All pairs $L_1$-distances for mixtures of uniform densities in time $O\big((N^2+Nn)(\log N)/\varepsilon^2\big)$
All pairs $L_1$-distances for piecewise linear densities in time $O\big((N^2+Nn)(\log N)/\varepsilon^2\big)$
52
QUESTIONS
1) $\varphi: \mathbb{R} \to \mathbb{R}^3$, $\varphi(z) = (1,z,z^2)$, $(X,Y,Z) = \int \varphi\,dL$ ?
2) higher dimensions?