Perceptive Context for Pervasive Computing Trevor Darrell Vision Interface Group MIT AI Lab.

MIT Project Oxygen
A multi-laboratory effort at MIT to develop pervasive, human-centric computing
Enabling people "to do more by doing less," that is, to accomplish more with less work
Bringing abundant computation and communication, as pervasive as free air, naturally into people's lives

Human-centered Interfaces
Free users from desktop and wired interfaces
Allow natural gesture and speech commands
Give computers awareness of users
Work in open and noisy environments
-Outdoors -- PDA next to a construction site
-Indoors -- crowded meeting room
Vision's role: provide perceptive context

Perceptive Context
Who is there? (presence, identity)
What is going on? (activity)
Where are they? (individual location)
Which person said that? (audiovisual grouping)
What are they looking / pointing at? (pose, gaze)

Vision Interface Group Projects
Person identification at a distance from multiple cameras and multiple cues (face, gait)
Tracking multiple people in indoor environments with large illumination variation and sparse stereo cues
Vision-guided microphone array
Joint statistical models for audiovisual fusion
Face pose estimation: rigid motion estimation with long-term drift reduction

Person Identification at a Distance
Multiple cameras
Face and gait cues
Approach: build a canonical frame for each modality by placing a virtual camera at a desired viewpoint
-Face: frontal view, fixed scale
-Gait: profile silhouette
Placing the virtual camera requires
-explicit model estimation
-search
-a motion-based heuristic from the trajectory
We combine a trajectory estimate with limited search
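As an illustration of the motion-based heuristic, the walking direction recovered from a ground-plane trajectory can be used to place a profile virtual camera perpendicular to the motion. A minimal sketch, assuming a 2-D ground-plane trajectory; the function name and angle convention are ours, not the system's:

```python
import numpy as np

def profile_view_angle(trajectory):
    """Estimate walking direction from a ground-plane trajectory
    (N x 2 array of positions) and return the viewing angle of a
    profile virtual camera, perpendicular to the heading."""
    d = trajectory[-1] - trajectory[0]       # net displacement
    heading = np.arctan2(d[1], d[0])         # walking direction (radians)
    return heading + np.pi / 2               # camera axis perpendicular to motion
```

In the system described above, this heuristic seeds a limited search rather than being trusted on its own.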

Virtual Views
Input images are rendered to a frontal face view and a profile gait silhouette. [figure]

Examples: VH-generated views of faces and gait. [figures]

Effects of view-normalization

Vision Interface Group Projects
Person identification at a distance from multiple cameras and multiple cues (face, gait)
Tracking multiple people in indoor environments with large illumination variation and sparse stereo cues
Vision-guided microphone array
Joint statistical models for audiovisual fusion
Face pose estimation: rigid motion estimation with long-term drift reduction

Range-based Stereo Person Tracking
Range can be insensitive to fast illumination change
Compare range values to a known background
Project into a 2-D overhead (plan) view
[figure panels: intensity, range, foreground, plan view]
Merge data from multiple stereo cameras…
Group into trajectories…
Examine height for sitting/standing…
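The range-background comparison and plan-view projection can be sketched as below. This is an illustrative reconstruction, not the system's code: the intrinsics (fx, cx), grid size, and threshold are placeholder values, and only the horizontal axis is back-projected since the overhead map discards height:

```python
import numpy as np

def plan_view_foreground(depth, bg_depth, fx, cx,
                         cell=0.05, extent=5.0, thresh=0.1):
    """Detect foreground pixels whose range is closer than the known
    background, back-project them, and accumulate into a 2-D overhead
    (plan-view) occupancy map.  depth, bg_depth: HxW range images in
    meters; fx, cx: horizontal camera intrinsics; cell: grid cell size
    in meters; extent: map extent in meters."""
    valid = (depth > 0) & (bg_depth > 0)
    fg = valid & (bg_depth - depth > thresh)   # closer than background
    v, u = np.nonzero(fg)
    z = depth[v, u]                            # range along optical axis
    x = (u - cx) * z / fx                      # back-project to camera X
    n = int(extent / cell)
    ix = np.clip(((x + extent / 2) / cell).astype(int), 0, n - 1)
    iz = np.clip((z / cell).astype(int), 0, n - 1)
    plan = np.zeros((n, n))
    np.add.at(plan, (iz, ix), 1)               # unbuffered accumulation
    return plan
```

Peaks in the plan-view map from several stereo cameras can then be merged and linked over time into trajectories, as the slide describes.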

Visibility Constraints for Virtual Backgrounds
Virtual background for camera C1

Virtual Background Segmentation
Sparse background + new image -> detected foreground
Second view: virtual background for the first view -> detected foreground

Points -> Trajectories -> Active Sensing
Spatio-temporal points -> trajectories -> activity classification, active camera motion, microphone array

Vision Interface Group Projects
Person identification at a distance from multiple cameras and multiple cues (face, gait)
Tracking multiple people in indoor environments with large illumination variation and sparse stereo cues
Vision-guided microphone array
Joint statistical models for audiovisual fusion
Face pose estimation: rigid motion estimation with long-term drift reduction

Audio Input in Noisy Environments
Acquire high-quality audio from untethered, moving speakers
"Virtual" headset microphones for all users

Vision-Guided Microphone Array
[figure: cameras and microphones]

System Flow (single target)
Video streams -> vision-based tracker -> gradient ascent search in array output power
Audio streams -> delay-and-sum beamformer
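A delay-and-sum beamformer steers the array by delaying each channel so that sound from the tracked location arrives in phase, then averaging. A minimal NumPy sketch under simplifying assumptions (free-field propagation, integer-sample delays; names and conventions are ours):

```python
import numpy as np

def delay_and_sum(signals, mic_positions, source_pos, fs, c=343.0):
    """Steer a microphone array toward source_pos (e.g., the location
    reported by the vision-based tracker).
    signals: (n_mics, n_samples) array of synchronized channels
    mic_positions: (n_mics, 3) microphone coordinates in meters
    source_pos: (3,) target location; fs: sample rate; c: speed of sound."""
    dists = np.linalg.norm(mic_positions - source_pos, axis=1)
    # Delay each mic relative to the farthest one so all delays are >= 0.
    delays = (dists.max() - dists) / c           # seconds
    shifts = np.round(delays * fs).astype(int)   # integer-sample delays
    n = signals.shape[1]
    out = np.zeros(n)
    for sig, s in zip(signals, shifts):
        out[s:] += sig[:n - s]                   # shift, then accumulate
    return out / len(signals)                    # average the channels
```

In the system flow above, the tracker provides the initial steering point and the gradient ascent on array output power refines it.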

Vision Interface Group Projects
Person identification at a distance from multiple cameras and multiple cues (face, gait)
Tracking multiple people in indoor environments with large illumination variation and sparse stereo cues
Vision-guided microphone array
Joint statistical models for audiovisual fusion
Face pose estimation: rigid motion estimation with long-term drift reduction

Audio-visual Analysis
Multi-modal approach to source separation
Exploit joint statistics of the image and audio signals
Use non-parametric density estimation
Audio-based image localization
Image-based audio localization
A/V verification: are this audio and this video from the same person?

Audio-visual synchrony detection

AVMI Applications
Audio weighting from video (detected face, image variance + AVMI)
Image localization from audio
Audio associated with the left face vs. the right face
New: synchronization detection

Audio-visual Synchrony Detection
MI: computed a confusion matrix for 8 subjects -- no errors, with no training
Can also be used for audio/visual temporal alignment…
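The joint-statistics idea can be illustrated with a crude histogram estimate of the mutual information between an audio feature track (e.g., frame energy) and a video feature track (e.g., mean intensity of the detected face region). The actual system uses non-parametric kernel density estimates; the function below is only a sketch of the principle:

```python
import numpy as np

def mutual_information(a, v, bins=8):
    """Histogram estimate of the mutual information (in nats) between
    two equal-length feature sequences a (audio) and v (video).
    Synchronized audio/video pairs should score higher than mismatched
    or time-shuffled pairs."""
    joint, _, _ = np.histogram2d(a, v, bins=bins)
    p = joint / joint.sum()                     # joint distribution
    pa = p.sum(axis=1, keepdims=True)           # audio marginal
    pv = p.sum(axis=0, keepdims=True)           # video marginal
    nz = p > 0                                  # avoid log(0)
    return float((p[nz] * np.log(p[nz] / (pa @ pv)[nz])).sum())
```

Comparing each audio track against every face track and taking the maximum-MI pairing yields the kind of confusion matrix reported on the slide.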

Vision Interface Group Projects
Person identification at a distance from multiple cameras and multiple cues (face, gait)
Tracking multiple people in indoor environments with large illumination variation and sparse stereo cues
Vision-guided microphone array
Joint statistical models for audiovisual fusion
Face pose estimation: rigid motion estimation with long-term drift reduction

Face Pose Estimation
Rigid motion estimation with long-term drift reduction

Brightness and Depth Motion Constraints
Brightness images I_t, I_{t+1} and depth images Z_t, Z_{t+1}, with gradients ∇I and ∇Z, constrain the pose update from y_{t-1} to y_t in parameter space
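The constraints named on the slide correspond to the standard brightness- and depth-constancy formulation; a plausible reconstruction, where $u$ is the image motion induced by the rigid pose change, $I_t$ and $Z_t$ are temporal derivatives, and $V_z$ is the out-of-plane velocity of the surface:

```latex
% Brightness constancy: intensity is preserved along the 2-D motion u
\nabla I \cdot u + I_t = 0
% Depth constancy: range changes only through the out-of-plane velocity V_z
\nabla Z \cdot u + Z_t = V_z
```

Stacking both constraints over all pixels gives a linear system in the rigid-motion parameters, which is what makes the per-frame pose update fast.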

New Bounded-Error Tracking Algorithm
Open-loop vs. closed-loop 2-D tracking (influence region)
Track relative to all previous frames that are close in pose space

Closed-loop 3-D Tracker
Track the user's head gaze for hands-free pointing…

Head-driven Cursor
Related projects: Schiele, Kjeldsen, Toyama
Current application: as a second pointer, or for scrolling / focus of attention…

Head-driven Cursor
Method                          Avg. error (pixels)
Cylindrical head tracker        25
2D optical-flow head tracker    22.9
Hybrid                          30
3D head tracker (ours)          7.5
Eye gaze                        27
Trackball                       3.7
Mouse                           1.9

Gaze-Aware Interface
Drowsy driver detection: head nod and eye blink…
Interface agent responds to the gaze of the user
-agent should know when it's being attended to
-turn-taking pragmatics
-anaphora / object reference
First prototype
-E21 interface "sam"
-current experiments with the face tracker on a meeting room table
Integrating with wall cameras and hand-gesture interfaces…

"Look-to-Talk"
Subject not looking at SAM -> ASR turned off
Subject looking at SAM -> ASR turned on
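The gating rule above amounts to a two-state switch driven by head pose; a toy sketch of the idea (the thresholds and class names are ours, not the SAM implementation):

```python
def looking_at_agent(yaw_deg, pitch_deg, tol=15.0):
    """True when head pose is within tol degrees of the agent's direction
    (pose assumed expressed relative to the agent)."""
    return abs(yaw_deg) < tol and abs(pitch_deg) < tol

class LookToTalk:
    """Enable speech recognition only while the user faces the agent."""
    def __init__(self):
        self.asr_on = False

    def update(self, yaw_deg, pitch_deg):
        self.asr_on = looking_at_agent(yaw_deg, pitch_deg)
        return self.asr_on
```

A real system would add hysteresis or a short dwell time so the recognizer does not flicker on and off with small head movements.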

Vision Interface Group Projects
Person identification at a distance from multiple cameras and multiple cues (face, gait)
Tracking multiple people in indoor environments with large illumination variation and sparse stereo cues
Vision-guided microphone array
Joint statistical models for audiovisual fusion
Face pose estimation: rigid motion estimation with long-term drift reduction
Conclusion and contact info

Conclusion: Perceptual Context
Take-home message: vision provides perceptual context to make applications aware of users
Activity: adapting outdoor activity classification [Grimson and Stauffer] to the indoor domain…
So far: detection, ID, head pose, audio enhancement, and synchrony verification…
Soon:
-gaze -- add eye tracking on the pose-stabilized face
-pointing -- arm gestures for selection and navigation

Contact
Prof. Trevor Darrell
Person identification at a distance from multiple cameras and multiple cues (face, gait)
-Greg Shakhnarovich
Tracking multiple people in indoor environments with large illumination variation and sparse stereo cues
-Neal Checka, Leonid Taycher, David Demirdjian
Vision-guided microphone array
-Kevin Wilson
Joint statistical models for audiovisual fusion
-John Fisher
Face pose estimation: rigid motion estimation with long-term drift reduction
-Louis Morency, Alice Oh, Kristen Grauman