2 Perceptive Context for Pervasive Computing
Trevor Darrell, Vision Interface Group, MIT AI Lab

3 MIT Project Oxygen
A multi-laboratory effort at MIT to develop pervasive, human-centric computing
Enabling people “to do more by doing less,” that is, to accomplish more with less work
Bringing abundant computation and communication, as pervasive as free air, naturally into people’s lives

4 Human-centered Interfaces
Free users from desktop and wired interfaces
Allow natural gesture and speech commands
Give computers awareness of users
Work in open and noisy environments
-Outdoors: a PDA next to a construction site
-Indoors: a crowded meeting room
Vision’s role: provide perceptive context

5 Perceptive Context
Who is there? (presence, identity)
What is going on? (activity)
Where are they? (individual location)
Which person said that? (audiovisual grouping)
What are they looking or pointing at? (pose, gaze)

6 Vision Interface Group Projects
Person identification at a distance, from multiple cameras and multiple cues (face, gait)
Tracking multiple people in indoor environments with large illumination variation, using sparse stereo cues
Vision-guided microphone array
Joint statistical models for audiovisual fusion
Face pose estimation: rigid motion estimation with long-term drift reduction

7 Vision Interface Group Projects (outline repeated; next topic: person identification at a distance)

8 Person Identification at a Distance
Multiple cameras; face and gait cues
Approach: build a canonical frame for each modality by placing a virtual camera at a desired viewpoint
-Face: frontal view, fixed scale
-Gait: profile silhouette
Placing the virtual camera requires one of:
-explicit model estimation
-search
-a motion-based heuristic (trajectory direction)
We combine the trajectory estimate with a limited search
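
The combination of a trajectory-based heading estimate with a limited search can be sketched as follows. This is a minimal illustration, not the group's implementation; the helper names and the scoring callback are hypothetical.

```python
import math

def estimate_heading(trajectory):
    """Least-squares heading (radians) of a 2-D ground-plane trajectory.

    trajectory: list of (x, y) floor positions over time. The walking
    direction suggests where to place the virtual camera (e.g. at 90
    degrees to the heading for a profile-silhouette gait view).
    """
    n = len(trajectory)
    xs = [p[0] for p in trajectory]
    ys = [p[1] for p in trajectory]
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in trajectory)
    return math.atan2(sxy, sxx)

def refine_by_search(heading, score_fn, half_width=0.3, steps=7):
    """Limited search: evaluate a few candidate view angles in a small
    window around the trajectory estimate and keep the best-scoring one.
    score_fn is a placeholder for any view-quality measure."""
    best, best_score = heading, score_fn(heading)
    for i in range(steps):
        cand = heading - half_width + (2 * half_width) * i / (steps - 1)
        s = score_fn(cand)
        if s > best_score:
            best, best_score = cand, s
    return best
```

The trajectory gives a cheap initial guess; the narrow search window then corrects small heading errors without an exhaustive sweep over viewpoints.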

9 Virtual Views
[Figure: input images rendered to virtual views; the face rendered to a frontal view, the gait to a profile silhouette]

10 Examples: VH-generated Views
[Figure: example virtual-view renderings for faces and gait]

11 Effects of view-normalization

12 Vision Interface Group Projects (outline repeated; next topic: tracking multiple people in indoor environments)

13 Range-based Stereo Person Tracking
Range can be insensitive to fast illumination change
Compare range values to a known background
Project foreground into a 2D overhead (plan) view
[Figure: intensity, range, foreground, and plan-view images]
Merge data from multiple stereo cameras
Group detections into trajectories
Examine height to distinguish sitting from standing
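
The range-background comparison and the plan-view projection can be sketched as below. The tolerance, grid extents, and cell size are made-up values for illustration, not the system's parameters.

```python
import numpy as np

def depth_foreground(depth, background, tol=0.15):
    """Pixels whose range differs from the known background by more
    than tol (metres) are foreground. Invalid (zero) depth readings,
    common in sparse stereo, are ignored rather than misclassified."""
    valid = (depth > 0) & (background > 0)
    return valid & (np.abs(depth - background) > tol)

def plan_view(points_xyz, x_range=(-5.0, 5.0), z_range=(0.0, 10.0), cell=0.25):
    """Project foreground 3-D points into a 2-D overhead occupancy map.

    Each (x, y, z) point votes into the floor cell below it; people show
    up as compact blobs that are easy to group into trajectories."""
    nx = int((x_range[1] - x_range[0]) / cell)
    nz = int((z_range[1] - z_range[0]) / cell)
    occ = np.zeros((nz, nx), dtype=np.int32)
    for x, y, z in points_xyz:
        i = int((z - z_range[0]) / cell)
        j = int((x - x_range[0]) / cell)
        if 0 <= i < nz and 0 <= j < nx:
            occ[i, j] += 1
    return occ
```

Because the occupancy map is in world floor coordinates, maps from multiple stereo cameras can simply be summed, which is one way to realize the "merge data from multiple stereo cameras" step.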

14 Visibility Constraints for Virtual Backgrounds
[Figure: virtual background constructed for camera C1]

15 Virtual Background Segmentation
[Figure: a sparse background and a new image yield detected foreground; a second view provides a virtual background for the first view, which likewise yields detected foreground]

16 Points -> trajectories -> active sensing
[Diagram: spatio-temporal points are grouped into trajectories; trajectories feed activity classification and active sensing (camera motion, microphone-array steering)]

17 Vision Interface Group Projects (outline repeated; next topic: vision-guided microphone array)

18 Audio Input in Noisy Environments
Acquire high-quality audio from untethered, moving speakers
“Virtual” headset microphones for all users

19 Vision-guided Microphone Array
[Figure: camera and microphone placement in the vision-guided array]

20 System Flow (single target)
Video streams feed a vision-based tracker; the tracker’s position estimate is refined by gradient-ascent search in array output power; the refined position steers a delay-and-sum beamformer over the audio streams
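
The delay-and-sum step can be sketched as below, assuming integer-sample delays for simplicity (a real beamformer would use fractional delays); the function name and interface are made up for this sketch.

```python
import numpy as np

def delay_and_sum(signals, mic_positions, source_pos, fs, c=343.0):
    """Steer the array at source_pos: advance each microphone channel
    by the extra propagation delay it suffers relative to the nearest
    mic, then average, so the target's wavefronts add coherently.

    signals: array of shape (n_mics, n_samples); fs in Hz; c in m/s.
    """
    src = np.asarray(source_pos, dtype=float)
    dists = [np.linalg.norm(np.asarray(m, dtype=float) - src)
             for m in mic_positions]
    ref = min(dists)
    out = np.zeros(signals.shape[1])
    for sig, d in zip(signals, dists):
        lag = int(round((d - ref) / c * fs))  # extra samples this mic lags
        out += np.roll(sig, -lag)             # advance to align with nearest mic
    return out / len(dists)
```

The source position here would come from the vision-based tracker; the gradient-ascent refinement mentioned on the slide amounts to nudging `source_pos` in the direction that increases the output power of this beamformer.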

21 Vision Interface Group Projects (outline repeated; next topic: joint statistical models for audiovisual fusion)

22 Audio-visual Analysis
Multi-modal approach to source separation
Exploit joint statistics of the image and audio signals
Use non-parametric density estimation
Audio-based image localization
Image-based audio localization
A/V verification: are this audio and this video from the same person?
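
The joint-statistics idea can be illustrated with a simple histogram estimate of mutual information between, say, per-frame audio energy and one pixel's intensity over time. This is a simplified stand-in: the work described here uses non-parametric density estimation, and the bin count below is an arbitrary choice.

```python
import numpy as np

def mutual_information(a, v, bins=8):
    """Histogram estimate of the mutual information (in nats) between
    two 1-D signals sampled at the same frames, e.g. audio energy and
    a pixel intensity. High MI suggests the pixel moves with the voice."""
    joint, _, _ = np.histogram2d(a, v, bins=bins)
    p = joint / joint.sum()                     # joint distribution
    px = p.sum(axis=1, keepdims=True)           # marginal over a
    py = p.sum(axis=0, keepdims=True)           # marginal over v
    nz = p > 0                                  # avoid log(0)
    return float((p[nz] * np.log(p[nz] / (px @ py)[nz])).sum())
```

Scoring every pixel this way yields an audio-video mutual information (AVMI) map: pixels on the speaking face score high, which supports both audio-based image localization and deciding which of two faces an audio track belongs to.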

23 Audio-visual synchrony detection

24 Audio Weighting from Video
[Figure: detected face combined with audio-video mutual information (AVMI); image-variance and AVMI maps]
Applications:
-image localization from audio
-audio associated with the left or right face
-new: synchronization detection

25 Audio-visual Synchrony Detection
Example MI scores: 0.68, 0.61, 0.19, 0.20
Computing the confusion matrix for 8 subjects: no errors, with no training
The same measure can also be used for audio/visual temporal alignment

26 Vision Interface Group Projects (outline repeated; next topic: face pose estimation)

27 Face Pose Estimation
Rigid motion estimation with long-term drift reduction

28 Brightness and Depth Motion Constraints
[Figure: brightness constancy relating I_t and I_t+1, depth constancy relating Z_t and Z_t+1; the pose y_t is obtained from y_t-1 in parameter space]
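
The constraint equations are garbled in this transcript. A standard formulation of joint brightness and depth constancy for rigid pose tracking, reconstructed to match the slide's symbols but not necessarily its exact notation, is:

```latex
% Brightness constancy: appearance is preserved under the image motion u
I_{t+1}\bigl(x + u(x;\,\xi)\bigr) = I_t(x)

% Depth constancy: range changes only by the motion's component along Z
Z_{t+1}\bigl(x + u(x;\,\xi)\bigr) = Z_t(x) + \Delta_Z(x;\,\xi)
```

Here $\xi$ denotes the rigid-motion parameters between frames, and the new pose $y_t$ is obtained by composing the estimated inter-frame motion with $y_{t-1}$ in parameter space.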

29 New Bounded-error Tracking Algorithm
Track relative to all previous frames that are close in pose space
[Figure: influence region; open-loop vs. closed-loop 2D tracking]
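
The "close in pose space" selection can be sketched as a keyframe query. The data layout and the distance threshold below are hypothetical; the point is that tracking against several earlier frames, rather than only the last one, bounds drift over long sequences.

```python
def select_keyframes(history, current_pose, radius=0.5):
    """Return previously seen frames whose stored pose lies within
    `radius` of the current pose estimate. The tracker can then
    register the new frame against each of these keyframes instead of
    chaining frame-to-frame estimates, which accumulate error.

    history: list of dicts like {"id": int, "pose": (rx, ry, rz)}.
    """
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    return [f for f in history if dist(f["pose"], current_pose) < radius]
```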

30 Closed-loop 3D Tracker
Track the user’s head gaze for hands-free pointing

31 Head-driven Cursor
Related projects: Schiele, Kjeldsen, Toyama
Current application: a second pointer, or scrolling / focus of attention
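
A head-driven cursor reduces to mapping the tracked yaw/pitch to screen coordinates. The mapping below is a minimal sketch with an invented sensitivity constant, not the system's calibration.

```python
def pose_to_cursor(yaw, pitch, screen=(1280, 1024), gain=2000.0):
    """Map head yaw/pitch (radians, zero = facing the screen centre)
    to a clamped pixel position. `gain` (pixels per radian) is a
    made-up sensitivity constant; a real system would calibrate it.
    """
    w, h = screen
    x = w / 2 + gain * yaw
    y = h / 2 - gain * pitch  # looking up moves the cursor up
    return (min(max(int(round(x)), 0), w - 1),
            min(max(int(round(y)), 0), h - 1))
```

With the 7.5-pixel average error reported on the next slide, such a mapping is coarser than a mouse but usable for the second-pointer and scrolling roles listed above.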

32 Head-driven Cursor: Accuracy
Method                          Avg. error (pixels)
Cylindrical head tracker        25
2D optical flow head tracker    22.9
Hybrid                          30
3D head tracker (ours)          7.5
Eye gaze                        27
Trackball                       3.7
Mouse                           1.9

33 Gaze-aware Interface
Drowsy-driver detection: head nod and eye blink
An interface agent responds to the gaze of the user:
-the agent should know when it is being attended to
-turn-taking pragmatics
-anaphora / object reference
First prototype:
-E21 interface “sam”
-current experiments with the face tracker on a meeting-room table
Integrating with wall cameras and hand-gesture interfaces

34 “Look-to-talk”
Subject not looking at SAM: ASR turned off
Subject looking at SAM: ASR turned on
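
The gating logic is simple enough to state in a few lines. The class name, the angle-based attention test, and the threshold are all hypothetical; the slide specifies only that the speech recognizer is enabled while the subject looks at SAM.

```python
class LookToTalk:
    """Enable automatic speech recognition (ASR) only while the user's
    gaze is directed at the agent, per the look-to-talk idea above."""

    def __init__(self, threshold_deg=15.0):
        # Hypothetical tolerance: gaze within this angle of the agent
        # counts as "looking at SAM".
        self.threshold = threshold_deg
        self.asr_on = False

    def update(self, gaze_angle_deg):
        """gaze_angle_deg: angle between gaze ray and agent direction.
        Returns whether ASR should currently be listening."""
        self.asr_on = abs(gaze_angle_deg) < self.threshold
        return self.asr_on
```

Gating on gaze gives the turn-taking behavior described on the previous slide: the agent only interprets speech when it is being addressed.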

35 Vision Interface Group Projects (outline repeated; next topic: conclusion and contact info)

36 Conclusion: Perceptive Context
Take-home message: vision provides perceptive context that makes applications aware of users
So far: detection, identification, head pose, audio enhancement, and synchrony verification
Activity: adapting outdoor activity classification [Grimson and Stauffer] to the indoor domain
Soon:
-gaze: add eye tracking on the pose-stabilized face
-pointing: arm gestures for selection and navigation

37 Contact
Prof. Trevor Darrell
www.ai.mit.edu/projects/vip
Person identification at a distance from multiple cameras and multiple cues (face, gait): Greg Shakhnarovich
Tracking multiple people in indoor environments with large illumination variation and sparse stereo cues: Neal Checka, Leonid Taycher, David Demirdjian
Vision-guided microphone array: Kevin Wilson
Joint statistical models for audiovisual fusion: John Fisher
Face pose estimation, rigid motion estimation with long-term drift reduction: Louis Morency, Alice Oh, Kristen Grauman

