Auditory and Visual Spatial Sensing Stan Birchfield Department of Electrical and Computer Engineering Clemson University.


1 Auditory and Visual Spatial Sensing Stan Birchfield Department of Electrical and Computer Engineering Clemson University

2 Human Spatial Sensing: The five senses: Hearing, Taste, Touch, Smell, Seeing. Hearing: f(t). Seeing: f(x, y, λ, t).

3 Visual and Auditory Pathways

4 Two Problems in Spatial Sensing: Stereo Vision; Acoustic Localization

5 Clemson Vision Laboratory: head tracking; root detection; reconstruction; highway monitoring; motion segmentation

6 Clemson Vision Lab (cont.): microphone position calibration; speaker localization

7 Stereo Vision: INPUT: Left and Right images. OUTPUT: Disparity map; Depth discontinuities. (epipolar constraint)

8 Epipolar Constraint: Left camera; Right camera; world point; center of projection; epipolar plane; epipolar line

9 Energy Minimization: Left and Right intensity; occluded pixels. Minimize: dissimilarity (underconstrained) + discontinuity penalty (constraint)
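
The slide names the two terms of the energy but its formula did not survive in the transcript. As a minimal sketch, assuming an absolute-difference dissimilarity, a fixed cost per occluded pixel, and a fixed cost per disparity change (all three are illustrative choices, not taken from the slides), the energy of one candidate scanline labeling could be evaluated like this. The function name and default costs are hypothetical.

```python
def scanline_energy(left, right, disparities,
                    occlusion_cost=5.0, discontinuity_cost=2.0):
    """Energy of one scanline labeling (illustrative form, not the slide's).

    disparities[x] is an integer d meaning left[x] matches right[x - d],
    or None when left pixel x is treated as occluded.
    """
    energy = 0.0
    previous = None
    for x, d in enumerate(disparities):
        if d is None:
            energy += occlusion_cost          # unmatched (occluded) pixel
            continue
        if not 0 <= x - d < len(right):
            raise ValueError("disparity points outside the right scanline")
        energy += abs(left[x] - right[x - d])  # dissimilarity term
        if previous is not None and d != previous:
            energy += discontinuity_cost       # discontinuity penalty
        previous = d
    return energy


left = [10, 20, 30, 40, 50]
right = [20, 30, 40, 50, 60]
good = scanline_energy(left, right, [None, 1, 1, 1, 1])
bad = scanline_energy(left, right, [0, 0, 0, 0, 0])
```

In this toy scanline the labeling with disparity 1 and one occluded pixel has lower energy than the zero-disparity labeling, which is the behaviour a minimizer would exploit.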

10 History of Stereo Correspondence. DYNAMIC PROGRAMMING (1D): Baker & Binford 1981; Ohta & Kanade 1985; Belhumeur & Mumford 1992; Intille & Bobick 1994; Geiger et al. 1995; Birchfield & Tomasi 1998. MULTIWAY-CUT (2D): Roy & Cox 1998; Boykov, Veksler, and Zabih 1998; Birchfield & Tomasi 1999; Kolmogorov & Zabih 2001, 2002; Lin & Tomasi 2002

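11 Dynamic Programming: 1D Search. Stereo matching: LEFT scanline against RIGHT scanline, giving the disparity map with occlusion and depth discontinuity. String editing: "cat" against "cart", penalties: mismatch = 1, insertion = 1, deletion = 1. Cost table (rows: prefixes of "cat", bottom to top; columns: prefixes of "cart", left to right):

         c  a  r  t
   t  3  2  1  1  1
   a  2  1  0  1  2
   c  1  0  1  2  3
      0  1  2  3  4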
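
The string-editing table on this slide can be reproduced with the standard edit-distance recurrence. The sketch below uses the slide's unit penalties; the function name is hypothetical, and the table is stored with the empty prefix in row 0, so it appears vertically flipped relative to the slide.

```python
def edit_distance_table(source, target):
    """Dynamic-programming table for edit distance with unit penalties.

    table[i][j] is the cost of editing source[:i] into target[:j].
    """
    rows, cols = len(source) + 1, len(target) + 1
    table = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        table[i][0] = i                      # i deletions
    for j in range(cols):
        table[0][j] = j                      # j insertions
    for i in range(1, rows):
        for j in range(1, cols):
            mismatch = 0 if source[i - 1] == target[j - 1] else 1
            table[i][j] = min(
                table[i - 1][j - 1] + mismatch,  # match or mismatch
                table[i - 1][j] + 1,             # deletion
                table[i][j - 1] + 1,             # insertion
            )
    return table


table = edit_distance_table("cat", "cart")
for row in reversed(table):                  # print in the slide's orientation
    print(row)
```

The final cell, `table[3][4]`, is the total cost of turning "cat" into "cart": a single insertion.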

12 Multiway-Cut: 2D Search: pixels; labels [Boykov, Veksler, Zabih 1998]

13 Multiway-Cut Algorithm: graph of pixels and labels; source label; sink label; minimum cut. Minimizes: (cost of label discontinuity) + (cost of assigning label to pixel)
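
The two costs named on this slide are commonly written as a single labeling energy. The form below is a sketch under assumptions: the symbols (f for the labeling, P for the pixels, N for the set of neighboring pixel pairs, D_p for the per-pixel assignment cost, u_pq for the per-pair discontinuity cost) are notation chosen here, not recovered from the slide.

```latex
E(f) \;=\; \underbrace{\sum_{(p,q) \in N} u_{pq}\,[\,f_p \neq f_q\,]}_{\text{cost of label discontinuity}}
\;+\; \underbrace{\sum_{p \in P} D_p(f_p)}_{\text{cost of assigning label to pixel}}
```

Here [f_p ≠ f_q] is 1 when neighboring pixels p and q receive different labels and 0 otherwise, so the first sum counts weighted label discontinuities.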

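14 Sampling-Insensitive Pixel Dissimilarity [Birchfield & Tomasi 1998]. Our dissimilarity measure: d(x_L, x_R) = min{ d̄(x_L, x_R), d̄(x_R, x_L) }, where d̄(x_L, x_R) is the smallest difference between I_L(x_L) and the linearly interpolated I_R over [x_R − ½, x_R + ½], and d̄(x_R, x_L) swaps the roles of the two images.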
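
A minimal sketch of this measure on one pair of scanlines follows. It relies on the fact that a linearly interpolated signal takes its extreme values over a half-pixel neighborhood at the center or at the two half-pixel endpoints, so only three values need checking. The function names and the clamping at the scanline borders are assumptions made for illustration.

```python
def half_dissimilarity(row_a, x_a, row_b, x_b):
    """d-bar: distance from row_a[x_a] to the range of the linearly
    interpolated row_b over [x_b - 1/2, x_b + 1/2]."""
    center = row_b[x_b]
    left = row_b[max(x_b - 1, 0)]                  # clamp at the border
    right = row_b[min(x_b + 1, len(row_b) - 1)]    # clamp at the border
    minus = 0.5 * (center + left)                  # value at x_b - 1/2
    plus = 0.5 * (center + right)                  # value at x_b + 1/2
    low = min(minus, plus, center)
    high = max(minus, plus, center)
    value = row_a[x_a]
    return max(0.0, value - high, low - value)


def dissimilarity(left_row, x_left, right_row, x_right):
    """Symmetric measure: minimum of the two one-sided measures."""
    return min(
        half_dissimilarity(left_row, x_left, right_row, x_right),
        half_dissimilarity(right_row, x_right, left_row, x_left),
    )


# The same linear ramp sampled half a pixel apart in the two images.
left_row = [0, 10, 20, 30, 40]
right_row = [5, 15, 25, 35, 45]
same_point = dissimilarity(left_row, 2, right_row, 2)
far_apart = dissimilarity(left_row, 0, right_row, 4)
```

For the corresponding pixels the plain absolute difference is 5, yet the measure is 0, which is the insensitivity to sampling that the slide title refers to.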

15 Dissimilarity Measure Theorems. Given: an interval A such that [x_L − ½, x_L + ½] ⊆ A and [x_R − ½, x_R + ½] ⊆ A. Theorem 1 (when the intensity is convex or concave over A): if |x_L − x_R| ≤ ½, then d(x_L, x_R) = 0. Theorem 2 (when the intensity is linear over A): |x_L − x_R| ≤ ½ iff d(x_L, x_R) = 0.

16 Correspondence as Segmentation. Problem: disparities (fronto-parallel): O(  ); surfaces (slanted): O(   2 n) => computationally intractable! Solution: iteratively determine which labels to use: label pixels with multiway-cut (Expectation); find affine parameters of regions with Newton-Raphson (Maximization)

17 Stereo Results (Dynamic Programming)

18 Stereo Results (Multiway-Cut)

19 Stereo Results on Middlebury Database: image; Birchfield & Tomasi 1999; Hong & Chen 2004

20 Multiway-Cut Challenges: Multiway-cut; Dynamic programming

21 Acoustic Localization. Problem: Use microphone signals to determine sound source location. Array types: compact; distributed. Traditional solutions: 1. Delay-and-sum beamforming ✓ 2. Time-delay estimation (TDE) ✓ Recent solutions: 3. Hemisphere sampling ✓✓ 4. Accumulated correlation ✓✓ 5. Bayesian ✓ 6. Zero-energy ✓ (each ✓ marks a method as efficient or as accurate)

22 Localization Geometry: microphones; sound source; arrival times t_1 and t_2; time difference t_2 − t_1 = τ (one-half hyperboloid)
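
To make the geometry concrete, the sketch below computes the arrival-time difference for a microphone pair and checks that two different source positions, related by a rotation about the microphone axis, yield the same difference and so lie on the same half hyperboloid. The speed of sound, the coordinates, and the names `time_difference` and `tau_*` are assumptions chosen for illustration.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed value for room-temperature air


def time_difference(source, mic_1, mic_2, speed=SPEED_OF_SOUND):
    """Arrival-time difference t_2 - t_1 for a point source."""
    t_1 = math.dist(source, mic_1) / speed
    t_2 = math.dist(source, mic_2) / speed
    return t_2 - t_1


mic_1 = (-0.075, 0.0, 0.0)
mic_2 = (0.075, 0.0, 0.0)       # 15 cm from mic_1, along the x axis
source_a = (1.0, 2.0, 0.0)
source_b = (1.0, 0.0, 2.0)      # source_a rotated 90 degrees about the x axis

tau_a = time_difference(source_a, mic_1, mic_2)
tau_b = time_difference(source_b, mic_1, mic_2)
tau_mid = time_difference((0.0, 1.0, 0.0), mic_1, mic_2)
```

A single pair therefore cannot distinguish `source_a` from `source_b`; more pairs are needed to narrow the location down.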

23 Principle of Least Commitment: “Delay decisions as long as possible” [Marr 1982; Russell & Norvig 1995]. Example:

24 Localization by Beamforming [Silverman & Kirtman 1992; Duraiswami et al. 2001; Ward & Williamson 2002]: mic 1 to mic 4 signals: prefilter -> delay -> sum -> energy -> find peak. Delays (shifts) each signal for each candidate location. Makes decision late in pipeline (“principle of least commitment”). ✓ accurate, NOT efficient
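
The delay, sum, energy, find-peak pipeline can be sketched on synthetic data as follows. This is a toy under stated assumptions, not the cited implementations: there is no prefilter, delays are rounded to whole samples, and the geometry, sample rate, pulse shape, and function names are all chosen so that the arithmetic is easy to follow.

```python
import math

SPEED_OF_SOUND = 343.0   # m/s; assumed
SAMPLE_RATE = 3430.0     # Hz; toy value giving 10 samples per meter


def delayed_pulse(delay_samples, length):
    """Synthetic microphone signal: one short pulse after a delay."""
    signal = [0.0] * length
    for offset, value in enumerate((1.0, 3.0, 1.0)):
        signal[delay_samples + offset] = value
    return signal


def arrival_samples(point, mic):
    """Propagation delay from point to mic, in whole samples."""
    return round(math.dist(point, mic) / SPEED_OF_SOUND * SAMPLE_RATE)


def steered_energy(signals, mics, candidate):
    """Shift each signal by the candidate's delay, sum, and take energy."""
    delays = [arrival_samples(candidate, mic) for mic in mics]
    base = min(delays)
    length = len(signals[0])
    energy = 0.0
    for n in range(length):
        total = 0.0
        for signal, delay in zip(signals, delays):
            m = n + (delay - base)           # advance later arrivals
            if m < length:
                total += signal[m]
        energy += total * total
    return energy


mics = [(0.0, 0.0), (3.0, 0.0), (0.0, 4.0), (8.0, 16.0)]
true_source = (3.0, 4.0)
signals = [delayed_pulse(arrival_samples(true_source, mic), 160) for mic in mics]

candidates = [(3.0, 4.0), (1.0, 1.0), (5.0, 2.0), (2.0, 6.0)]
energies = [steered_energy(signals, mics, c) for c in candidates]
best = candidates[energies.index(max(energies))]
```

Every candidate requires a full pass over all the signals, which is why the slide marks the method as accurate but not efficient.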

25 Localization by Time-Delay Estimation (TDE) [Brandstein et al. 1995; Brandstein & Silverman 1997; Wang & Chu 1997]: for each microphone pair (mic 1 and mic 2, mic 3 and mic 4): prefilter -> correlate -> find peak; then intersect (may be no intersection). Cross-correlation computed once for each microphone pair. Decision is made early. ✓ efficient, NOT accurate
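
The correlate and find-peak steps for one microphone pair can be sketched as below. It is a plain time-domain cross-correlation with no prefilter, and the function names and test signals are hypothetical.

```python
def cross_correlation(signal_1, signal_2, max_lag):
    """Map each lag in [-max_lag, max_lag] to sum_n signal_1[n] * signal_2[n + lag]."""
    scores = {}
    for lag in range(-max_lag, max_lag + 1):
        total = 0.0
        for n, value in enumerate(signal_1):
            m = n + lag
            if 0 <= m < len(signal_2):
                total += value * signal_2[m]
        scores[lag] = total
    return scores


def estimate_delay(signal_1, signal_2, max_lag):
    """Lag (in samples) at which the cross-correlation peaks."""
    scores = cross_correlation(signal_1, signal_2, max_lag)
    return max(scores, key=scores.get)


pulse = [0, 0, 1, 3, 1, 0, 0, 0, 0, 0]
later = [0, 0, 0, 0, 1, 3, 1, 0, 0, 0]   # same pulse, two samples later
delay = estimate_delay(pulse, later, max_lag=4)
```

Keeping only the peak lag discards the rest of the correlation function, which is the early decision the slide points to.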

26 Localization by Hemisphere Sampling [Birchfield & Gillmor 2001]: for each microphone pair (mic 1 and mic 2, mic 3 and mic 4, …): prefilter -> correlate -> map to common coordinate system -> sampled locus; then sum -> temporal smoothing -> final sampled locus -> find peak. ✓ efficient, ✓ accurate (but restricted to compact arrays)

27 Localization by Accumulated Correlation [Birchfield & Gillmor 2002]: for each microphone pair (mic 1 and mic 2, mic 3 and mic 4, …): prefilter -> correlate -> map to common coordinate system -> sampled locus; then sum -> temporal smoothing -> final sampled locus -> find peak. ✓ efficient, ✓ accurate

28 Accumulated Correlation Algorithm: likelihood of candidate location = (correlation term for microphone pair 1) + (correlation term for microphone pair 2) + ...
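
One reading of this slide is that each microphone pair is correlated once, and every candidate location then looks up each pair's correlation at the lag that candidate would produce and adds the values. The sketch below implements that reading on synthetic data. It omits the prefilter and temporal smoothing shown in the pipeline, rounds lags to whole samples, and uses toy geometry and hypothetical names throughout.

```python
import math
from itertools import combinations

SPEED_OF_SOUND = 343.0   # m/s; assumed
SAMPLE_RATE = 3430.0     # Hz; toy value giving 10 samples per meter


def delayed_pulse(delay_samples, length):
    """Synthetic microphone signal: one short pulse after a delay."""
    signal = [0.0] * length
    for offset, value in enumerate((1.0, 3.0, 1.0)):
        signal[delay_samples + offset] = value
    return signal


def cross_correlation(signal_1, signal_2, max_lag):
    """Map each lag to sum_n signal_1[n] * signal_2[n + lag]."""
    scores = {}
    for lag in range(-max_lag, max_lag + 1):
        scores[lag] = sum(
            value * signal_2[n + lag]
            for n, value in enumerate(signal_1)
            if 0 <= n + lag < len(signal_2)
        )
    return scores


def expected_lag(candidate, mic_i, mic_j):
    """Lag (samples) between mic_i and mic_j for a source at candidate."""
    seconds = (math.dist(candidate, mic_j) - math.dist(candidate, mic_i)) / SPEED_OF_SOUND
    return round(seconds * SAMPLE_RATE)


def accumulated_correlation(signals, mics, candidates, max_lag):
    """Score each candidate by summing pairwise correlations at its lags."""
    pairs = list(combinations(range(len(mics)), 2))
    # Each pair is correlated once, independent of the number of candidates.
    tables = {(i, j): cross_correlation(signals[i], signals[j], max_lag)
              for i, j in pairs}
    scores = []
    for candidate in candidates:
        total = 0.0
        for i, j in pairs:
            lag = expected_lag(candidate, mics[i], mics[j])
            total += tables[(i, j)].get(lag, 0.0)   # outside max_lag counts as 0
        scores.append(total)
    return scores


mics = [(0.0, 0.0), (3.0, 0.0), (0.0, 4.0), (8.0, 16.0)]
true_source = (3.0, 4.0)
signals = [
    delayed_pulse(round(math.dist(true_source, mic) / SPEED_OF_SOUND * SAMPLE_RATE), 160)
    for mic in mics
]
candidates = [(3.0, 4.0), (1.0, 1.0), (5.0, 2.0), (2.0, 6.0)]
scores = accumulated_correlation(signals, mics, candidates, max_lag=110)
best = candidates[scores.index(max(scores))]
```

The per-candidate work is a handful of table lookups rather than a pass over the signals, while the full correlation functions are still kept until the final peak search.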

29 Comparison (table; methods: Beamforming, Bayesian, Zero energy, Acc corr, Hem samp, TDE; columns: similarity/energy, efficient, accurate)

30 Unifying framework efficient accurate

31 Integration limits: Beamforming; Bayesian; Zero energy; Accumulated correlation; Hemisphere sampling; Time-delay estimation

32 Compact Microphone Array: microphone; d = 15 cm; sampled hemisphere

33 Results on compact array: pan; tilt; without PHAT prefilter; with PHAT prefilter

34 More Comparison: Hemisphere Sampling [Birchfield & Gillmor 2001]; Beamforming; Accumulated Correlation [Birchfield & Gillmor 2002]

35 Results on distributed array

36 Computational efficiency: Computing time per window (ms); (600x faster); (50x faster)

37 Simultaneous Speakers +=

38 Detecting Noise Sources background noise source

39 Connection with Stereo [Okutomi & Kanade 1993] “Multi-baseline stereo”

40 Conclusion
Spatial sensing is achieved by arrays of visual and auditory sensors.
Stereo vision
– match visual signals from multiple cameras
– recent breakthrough: multiway-cut
– limitations of multiway-cut
Acoustic localization
– match acoustic signals from multiple microphones
– recent breakthrough: accumulated correlation
– connection with multi-baseline stereo

