“Pixels that Sound”: Find pixels that correspond (correlate!?) to sound. Kidron, Schechner, Elad, CVPR 2005.


1 “Pixels that Sound”: Find pixels that correspond (correlate!?) to sound. Kidron, Schechner, Elad, CVPR 2005

2 Audio-Visual Analysis: Applications. Lip reading, detection of lips (or person): Slaney, Covell (2000); Bregler, Konig (1994). Analysis and synthesis of music from motion: Murphy, Andersen, Jensen (2003). Source separation based on vision: Li, Dimitrova, Li, Sethi (2003); Smaragdis, Casey (2003); Nock, Iyengar, Neti (2002); Fisher, Darrell, Freeman, Viola (2001); Hershey, Movellan (1999). Tracking: Vermaak, Gangnet, Blake, Pérez (2001). Biological systems: Gutfreund, Zheng, Knudsen (2002).

3 Problem: Different Modalities. Audio-visual analysis: camera and microphone. Visual data: 25 frames/sec; each frame 576 x 720 pixels. Audio data: 44.1 kHz, few bands; not stereophonic.

4 Previous Work. Pointwise correlation: Nock, Iyengar, Neti (2002); Hershey, Movellan (1999). Ill-posed (lack of data). Canonical Correlation Analysis (CCA): Smaragdis, Casey (2003); Li, Dimitrova, Li, Sethi (2003); Slaney, Covell (2000). Cluster of pixels, linear superposition. Mutual Information (MI): Fisher et al. (2001); Cutler, Davis (2000); Bregler, Konig (1994). Not typical. Highly complex.

5 Projection (figure). Video: Pixel #1, Pixel #2, Pixel #3. Audio: Band #1, Band #2. CCA: optimal visual components.

6 Visual Projection. Projection onto a 1D variable v. Video features: pixel intensities, transform coefficients (wavelet), image differences.

7 Audio Projection. Projection onto a 1D variable a. Audio features: average energy per frame, transform coefficients per frame.
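As an illustration of the "average energy per frame" feature, here is a minimal Python sketch. It assumes a mono signal sampled at 44.1 kHz and video at 25 frames/sec (the figures quoted on slide 3), which gives 1764 audio samples per video frame. The function name `energy_per_frame` and the synthetic signal are hypothetical and not taken from the presentation.

```python
import numpy as np

def energy_per_frame(audio, sample_rate=44100, frame_rate=25):
    """Average energy of a mono signal within each video-frame interval."""
    samples_per_frame = sample_rate // frame_rate  # 44100 // 25 = 1764
    n_frames = len(audio) // samples_per_frame
    # Drop the incomplete trailing frame, then view the signal as one row per frame.
    trimmed = np.asarray(audio[: n_frames * samples_per_frame], dtype=float)
    blocks = trimmed.reshape(n_frames, samples_per_frame)
    return np.mean(blocks ** 2, axis=1)

# Synthetic example (assumption): one second of constant amplitude 0.5.
audio = np.full(44100, 0.5)
a = energy_per_frame(audio)  # one audio feature value per video frame
```

Each entry of `a` then plays the role of the audio feature for one video frame, so the audio and visual sequences share a common time index.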

8 Canonical Correlation. Video and audio representation. Projections (per time window). Random variables (time dependent). Correlation coefficient.
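For reference, the standard correlation coefficient between two projected variables can be written as follows. The notation is an assumption and not taken from the slides: w_v and w_a are the visual and audio projection weights, and C_vv, C_aa, C_va are the covariance and cross-covariance matrices of the visual and audio features.

```latex
\rho(\mathbf{w}_v, \mathbf{w}_a) =
\frac{\mathbf{w}_v^{T} \mathbf{C}_{va} \mathbf{w}_a}
     {\sqrt{\left(\mathbf{w}_v^{T} \mathbf{C}_{vv} \mathbf{w}_v\right)
            \left(\mathbf{w}_a^{T} \mathbf{C}_{aa} \mathbf{w}_a\right)}}
```

CCA seeks the pair of weight vectors that maximizes this quantity.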

9 CCA Formulation. The CCA formulation yields an eigenvalue problem: Knutsson, Borga, Landelius (1995). Canonical correlation: equivalent to the largest eigenvalue. Projections: equivalent to the corresponding eigenvectors.
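The eigenvalue view of CCA can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses the textbook form C_vv^{-1} C_va C_aa^{-1} C_av w_v = rho^2 w_v, the function name `cca_first_pair` is hypothetical, and the data are synthetic with only a few features so that both covariance matrices are invertible.

```python
import numpy as np

def cca_first_pair(V, A):
    """First canonical pair. V: (n_frames, n_visual), A: (n_frames, n_audio)."""
    V = V - V.mean(axis=0)
    A = A - A.mean(axis=0)
    n = V.shape[0]
    C_vv = V.T @ V / n
    C_aa = A.T @ A / n
    C_va = V.T @ A / n
    # Eigenproblem: C_vv^{-1} C_va C_aa^{-1} C_av w_v = rho^2 w_v
    M = np.linalg.solve(C_vv, C_va) @ np.linalg.solve(C_aa, C_va.T)
    eigvals, eigvecs = np.linalg.eig(M)
    k = np.argmax(eigvals.real)
    rho = np.sqrt(max(eigvals.real[k], 0.0))  # largest eigenvalue is rho^2
    w_v = eigvecs[:, k].real
    # Matching audio weights, defined up to scale.
    w_a = np.linalg.solve(C_aa, C_va.T @ w_v)
    return rho, w_v, w_a

# Synthetic data (assumption): one visual feature and one audio band share a signal s.
rng = np.random.default_rng(0)
n_frames = 500
s = rng.standard_normal(n_frames)
V = rng.standard_normal((n_frames, 3))
V[:, 1] = s + 0.1 * rng.standard_normal(n_frames)
A = rng.standard_normal((n_frames, 2))
A[:, 0] = s + 0.1 * rng.standard_normal(n_frames)

rho, w_v, w_a = cca_first_pair(V, A)
```

The sketch relies on inverting C_vv, which is only possible when there are at least as many frames as visual features; with full images and short time windows that assumption fails.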

10 Visual Data. Axes: t (frames) vs. spatial location (pixel intensities).

11 Rank Deficiency. Axes: t (frames) vs. spatial location (pixel intensities).

12 Estimation of Covariance. Rank deficient.
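The rank deficiency can be checked numerically. The following Python sketch assumes 30 frames and, purely for illustration, 500 pixels of random data (a real 576 x 720 frame has 414,720 pixels); the variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n_frames, n_pixels = 30, 500  # far fewer frames than pixels

# Rows are frames, columns are pixel intensities (synthetic data, an assumption).
V = rng.standard_normal((n_frames, n_pixels))
Vc = V - V.mean(axis=0)            # remove the temporal mean of each pixel
C_vv = Vc.T @ Vc / n_frames        # empirical covariance, n_pixels x n_pixels

rank = np.linalg.matrix_rank(C_vv)
print(rank, "out of", n_pixels)    # the rank cannot exceed n_frames - 1
```

Because the estimated covariance is built from only 30 centered samples, its rank is at most 29, so a 500 x 500 (let alone 414,720 x 414,720) matrix estimated this way cannot be inverted.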

13 Ill-Posedness. Impossible to invert! Prior solutions: use many more frames (poor temporal resolution); aggressive spatial pruning (poor spatial resolution); trivial regularization.

14 A General Problem. Large number of weights, small amount of data. The problem is ILL-POSED. Overfitting is likely.

15 An Equivalent Problem. Minimizing. Maximizing.

16 Single Audio Band. (The denominator is non-zero.) Minimizing. Known data. A has a single column, and

17 Figure: V multiplied by the weight vector equals a, the audio samples over time, a(t_i): a(1), a(2), ..., a(30). Full correlation if this equality holds. Underdetermined system!

18 Detected correlated pixels. “Out of clutter, find simplicity. From discord, find harmony.” Albert Einstein

19 Sparse Solution. ℓ0-norm minimum. Non-convex. Exponential complexity.

20 The ℓ1-norm criterion. ℓ1-norm minimum. Sparse. Convex. Polynomial complexity in common situations. Donoho, Elad (2005).
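A minimal Python sketch of ℓ1-norm minimization as a linear program is given below. It is an illustration under stated assumptions, not the authors' code. The constraint is written as V w = a, the weight vector is split as w = p - q with p, q >= 0 (a standard reformulation), and the problem is solved with `scipy.optimize.linprog`. The helper name `min_l1_solution` and the synthetic data are hypothetical.

```python
import numpy as np
from scipy.optimize import linprog

def min_l1_solution(V, a):
    """Minimize ||w||_1 subject to V w = a, via the split w = p - q with p, q >= 0."""
    n_pixels = V.shape[1]
    c = np.ones(2 * n_pixels)        # objective: sum(p) + sum(q) = ||w||_1 at the optimum
    A_eq = np.hstack([V, -V])        # V p - V q = a
    res = linprog(c, A_eq=A_eq, b_eq=a, bounds=(0, None), method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    p, q = res.x[:n_pixels], res.x[n_pixels:]
    return p - q

# Synthetic underdetermined system (assumption): 30 frames, 200 pixels,
# and only three pixels truly related to the audio.
rng = np.random.default_rng(1)
n_frames, n_pixels = 30, 200
w_true = np.zeros(n_pixels)
w_true[[17, 18, 19]] = 1.0
V = rng.standard_normal((n_frames, n_pixels))
a = V @ w_true

w_l1 = min_l1_solution(V, a)
print("entries above 1e-6:", np.count_nonzero(np.abs(w_l1) > 1e-6))
```

On random data of this kind the ℓ1 solution is often concentrated on a handful of entries, although exact recovery of the original support is not guaranteed for every instance.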

21 The Minimum Norm Solution. ℓ2-norm minimum. Energy spread. Solving using the ℓ2-norm (pseudo-inverse, SVD, QR).
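The energy-spread behaviour of the minimum ℓ2-norm solution can be reproduced with the pseudo-inverse. The Python sketch below assumes a synthetic underdetermined system V w = a with 30 frames and 200 pixels, of which only three truly contribute; all names and sizes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n_frames, n_pixels = 30, 200

# Ground truth (assumption): only three pixels are related to the audio.
w_true = np.zeros(n_pixels)
w_true[[17, 18, 19]] = 1.0
V = rng.standard_normal((n_frames, n_pixels))
a = V @ w_true

# Minimum l2-norm solution of the underdetermined system V w = a.
w_l2 = np.linalg.pinv(V) @ a

print("constraint satisfied:", np.allclose(V @ w_l2, a))
print("entries above 1e-6:", np.count_nonzero(np.abs(w_l2) > 1e-6))
```

The solution satisfies the constraint exactly, yet its energy is spread over nearly all pixels instead of being localized on the three that generated the audio.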

22 Linear programming: fully correlated, sparse, no parameters to tweak, polynomial. Audio-visual events. Maximum correlation: eigenproblem. Minimum objective function G.

23 Multiple Audio Bands - Solution. The optimization problem: ℓ1-ball. Non-convex constraint. Convex. Linear.

24 Multiple Audio Bands. Optimization over each face (S1, S2, S3, S4). Each face: linear programming. No parameters to tweak.

25 Sharp & Dynamic, Despite Distraction. Frames shown: 9, 42, 68, 115, 146, 169.

26 Frames shown: 51, 106, 83, 177. Sparse. Localization on the proper elements. False alarm: temporally inconsistent. Handling dynamics. Performing in audio noise.

27 ℓ2-norm: Energy Spread. Movie #1: Frame 83. Movie #2: Frame 146.

28 ℓ1-norm: Localization. Movie #1: Frame 83. Movie #2: Frame 146.

29 The “Chorus Ambiguity”. Who’s talking? Synchronized talk. Not unique (ambiguous). Possible solutions: Left, Right, Both.

30 The “Chorus Ambiguity”. Norm plots over feature 1 and feature 2. Both.

