Paul Fitzpatrick – Face to Face – Robot vision in social settings

Presentation transcript:

Face to Face – Robot vision in social settings

Robot vision in social settings
- Humans (and robots) recover information about objects from the light they reflect
- Human head and eye movements give clues to attention and motivation
- In a social context, people constantly read each other's actions for these clues
- Anthropomorphic robots can partake in this implicit communication, giving smooth and intuitive interaction

Humanoid robotics at MIT

Kismet
[Diagram: Kismet's head, showing eye tilt, left and right eye pan, a camera with a wide field of view, and a camera with a narrow field of view]

Kismet
- Built by Cynthia Breazeal to explore expressive social exchange between humans and robots
  - Facial and vocal expression
  - Vision-mediated interaction (collaboration with Brian Scassellati, Paul Fitzpatrick)
  - Auditory-mediated interaction (collaboration with Lijin Aryananda)

Vision-mediated interaction
- Visual attention
  - Driven by the need for a high-resolution view of a particular object, for example, to find eyes on a face
  - Marks the object around which behavior is organized
  - Manipulating attention is a powerful way to influence behavior
- Pattern of eye/head movement
  - Gives insight into the level of engagement

Expressing visual attention
- Attention can be deduced from behavior
- Or it can be expressed more directly

Building an attention system
[Diagram: the current input feeds pre-attentive filters, which feed a saliency map, then a tracker, then eye movement; the behavior system exerts a modulatory influence over this primary information flow]

Initiating visual attention
[Diagram: the current input passes through pre-attentive filters – skin tone, color, motion, and habituation – whose outputs, weighted by behavioral relevance, are combined into the saliency map; a sketch of this combination follows]
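A minimal sketch of this weighted combination, assuming each filter produces a normalized response image. The filter names, gain values, and function names here are illustrative, not Kismet's actual code:

```python
import numpy as np

def saliency_map(filter_outputs, gains):
    """Combine pre-attentive filter outputs into a single saliency map.

    filter_outputs: dict of name -> 2D float array (same shape, values in [0, 1])
    gains:          dict of name -> scalar weight set by the behavior system
    """
    combined = np.zeros_like(next(iter(filter_outputs.values())))
    for name, response in filter_outputs.items():
        combined += gains.get(name, 0.0) * response
    return combined

def attention_target(saliency):
    """Return the (row, col) of the most salient location."""
    return np.unravel_index(np.argmax(saliency), saliency.shape)

# Example: a behavior that seeks faces might raise the skin-tone gain.
# gains = {"skin": 1.5, "color": 0.3, "motion": 0.7, "habituation": -0.5}
```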

Example filter – skin tone

Skin tone filter – details
- An image pixel p(r, g, b) is NOT considered skin tone if any of the following holds:
  - r < 1.1 · g (red component fails to dominate green sufficiently)
  - r < 0.9 · b (red component is excessively dominated by blue)
  - r > 2.0 · max(g, b) (red component completely dominates)
  - r < 20 (red component too low to give a good estimate of the ratios)
  - r > 250 (too saturated to give a good estimate of the ratios)
- Lots of things that are not skin pass these tests
- But lots and lots of things that are not skin fail
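These thresholds translate directly into a per-pixel test. The following Python sketch mirrors the rules above; the function name and the 8-bit channel assumption are mine:

```python
def is_skin_tone(r, g, b):
    """Return True if an 8-bit RGB pixel passes the skin-tone tests above."""
    if r < 1.1 * g:            # red fails to dominate green sufficiently
        return False
    if r < 0.9 * b:            # red is excessively dominated by blue
        return False
    if r > 2.0 * max(g, b):    # red completely dominates
        return False
    if r < 20:                 # too dark to estimate the ratios reliably
        return False
    if r > 250:                # too saturated to estimate the ratios reliably
        return False
    return True
```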

Modulating visual attention
[Figure: with a high skin gain and a low color-saliency gain, looking time is 86% face, 14% block; with a low skin gain and a high color-saliency gain, looking time is 28% face, 72% block]

Manipulating visual attention

Maintaining visual attention
[Video frames: the tracked target slipped… …and was recovered]

Persistence of attention
- We want attention to be responsive to a changing environment
- We also want attention to be persistent enough to permit coherent behavior
- The trade-off between persistence and responsiveness therefore needs to be dynamic (see the sketch below)
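One way to picture such a dynamic trade-off is a hysteresis rule in which a persistence parameter, set by the behavior system, controls how much more salient a rival must be to capture attention. This is an illustrative sketch, not the actual mechanism used on Kismet:

```python
def update_attention(current_target, current_score, rival, rival_score, persistence):
    """Keep the current target unless a rival is sufficiently more salient.

    persistence: value in [0, 1] set by the behavior system -- high while
    engaged with something (coherent behavior), low while searching
    (responsive to the environment).
    """
    if rival_score > current_score * (1.0 + persistence):
        return rival
    return current_target
```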

Influences on attention
[Diagram repeated: the behavior system's modulatory influence flows back into the pre-attentive filters, saliency map, and tracker that drive eye movement]

Eye movement
[Diagram: the left and right eyes make a ballistic saccade to a new target; smooth pursuit and vergence then co-operate to track the object; the vergence angle is the angle between the two eyes' lines of gaze]
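For symmetric fixation, the vergence angle is tied to the inter-camera baseline and the target distance by simple triangle geometry. The snippet below is illustrative; the formula is standard, but the function names are mine:

```python
import math

def vergence_angle(baseline_m, distance_m):
    """Angle (radians) between the two gaze lines when both eyes,
    separated by baseline_m, fixate a target distance_m straight ahead."""
    return 2.0 * math.atan2(baseline_m / 2.0, distance_m)

def distance_from_vergence(baseline_m, angle_rad):
    """Invert the relation: estimate target distance from the measured
    vergence angle -- one way a binocular head can recover depth."""
    return (baseline_m / 2.0) / math.tan(angle_rad / 2.0)
```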

Eye/neck motor control
- Neck movements combine:
  - Attention-driven orientation shifts
  - Affect-driven postural shifts
  - Fixed action patterns
- Eye movements combine:
  - Attention-driven orientation shifts
  - Turn-taking cues

Social amplification of motor acts
- Active vision involves choosing a robot's pose to facilitate visual perception
- The focus has traditionally been on the immediate physical consequences of a pose
- For an anthropomorphic head, active vision strategies can be "read" by a human and assigned an intent, which the human may then help complete beyond the robot's immediate physical capabilities
- The robot's pose thus has communicative value, to which the human responds

[Diagram: regulating interaction. Within a comfortable interaction distance and speed, interaction proceeds normally; too close triggers a withdrawal response, and the person backs off; too far away (or beyond sensor range) triggers calling behavior, and the person draws closer; too fast triggers an irritation response; too fast and too close together trigger a threat response]
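These regimes can be read as a simple decision rule over the target's distance and approach speed. The sketch below is illustrative only; the thresholds are invented, not Kismet's:

```python
def social_response(distance_m, speed_m_per_s):
    """Map a person's distance and approach speed to a response regime.

    The numeric thresholds below are hypothetical placeholders.
    """
    TOO_CLOSE, TOO_FAR, TOO_FAST = 0.4, 2.0, 1.0
    if speed_m_per_s > TOO_FAST and distance_m < TOO_CLOSE:
        return "threat response"
    if speed_m_per_s > TOO_FAST:
        return "irritation response"
    if distance_m < TOO_CLOSE:
        return "withdrawal response"
    if distance_m > TOO_FAR:
        return "calling behavior"
    return "comfortable interaction"
```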

Video: Withdrawal response

Kismet's cameras
[Diagram: Kismet's head, showing eye tilt, left and right eye pan, a camera with a wide field of view, and a camera with a narrow field of view]

Simplest camera configuration
- Single camera
  - Multiple-camera systems require careful calibration for cross-camera correspondence
- Wide field of view
  - We don't know where to look beforehand
- Moving infrequently relative to the rate of visual processing
  - Ego-motion complicates visual processing

[Architecture diagram: the wide camera and the left and right foveal cameras feed frame grabbers; skin, color, motion, and face detectors feed the attention system and the wide tracker; foveal disparity, smooth pursuit and vergence with neck compensation, the VOR, saccades with neck compensation, fixed action patterns, and affective postural shifts with gaze compensation are arbitrated into eye-head-neck control by the motion control daemon, under the influence of behaviors and motivations]

Missing components
- High-acuity vision – for example, to find eyes within a face
  - Needs cameras that sample a narrow field of view at high resolution
- Binocular view, for stereoscopic vision
  - Needs paired cameras
  - May need wide or narrow fields of view, depending on the application

Missing: high-acuity vision
- Typical visual tasks require both high acuity and a wide field of view
- High acuity is needed for recognition tasks and for controlling precise visually guided motor movements
- A wide field of view is needed for search tasks, for tracking multiple objects, for compensating for involuntary ego-motion, etc.

Biological solution
- A common trade-off found in biological systems is to sample part of the visual field at a resolution high enough to support the first set of tasks, and to sample the rest of the field at a level adequate for the second set
- This is seen in animals with foveate vision, such as humans, where the density of photoreceptors is highest at the center and falls off dramatically towards the periphery

Simulated example
- Compare the size of the eyes and ears in the transformed image – the eyes are closer to the center, and so are better represented (a sketch of such a transform follows)
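A log-polar resampling is one standard way to simulate foveate sampling; the sketch below is illustrative and not necessarily the transform used to produce the slide's image:

```python
import numpy as np

def log_polar_sample(image, rings=64, wedges=128):
    """Foveated (log-polar) sampling: ring radius grows exponentially with
    eccentricity, so the center is sampled densely and the periphery coarsely."""
    h, w = image.shape[:2]
    cy, cx = h / 2.0, w / 2.0
    max_r = min(cy, cx)
    out = np.zeros((rings, wedges) + image.shape[2:], dtype=image.dtype)
    for i in range(rings):
        r = max_r ** ((i + 1) / rings)          # exponential ring spacing
        for j in range(wedges):
            theta = 2.0 * np.pi * j / wedges
            y = int(round(cy + r * np.sin(theta)))
            x = int(round(cx + r * np.cos(theta)))
            out[i, j] = image[min(max(y, 0), h - 1), min(max(x, 0), w - 1)]
    return out
```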

Foveated vision (From C. Graham, “Vision and Visual Perception”)

Mechanical approximations
- Imaging surface with varying sensor density (Sandini et al.)
- Distorting lens projecting onto a conventional imaging surface (Kuniyoshi et al.)
- Multi-camera arrangements (Scassellati et al.)
- Cameras with zoom control directly trade acuity off against field of view (but can't have both at once)
- Or do something completely different!

Multi-camera arrangement
- The wide-view camera (wide field of view, low acuity) gives the context used to select the region at which to point the narrow-view camera (narrow field of view, high acuity)
- If the target is close and moving, the system must respond quickly and accurately, or it won't gain any information at all

Mixing fields of view
- The small distance between the cameras with wide and narrow fields of view simplifies the mapping between the two
- The central location of the wide camera allows the head to be oriented accurately, independently of the distance to an object
- This allows coarse open-loop control of eye direction from the wide camera, which improves gaze stability (see the sketch below)
[Diagram: eye tilt, left and right eye pan; the wide camera is fixed with respect to the head]
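A rough sketch of that open-loop step under a pinhole-camera assumption; the focal length, sign conventions, and function name are mine, not the original system's:

```python
import math

def pixel_to_gaze(dx_px, dy_px, focal_px):
    """Convert a target's pixel offset from the wide camera's optical center
    into coarse pan/tilt commands for the eyes. Because the wide and narrow
    cameras sit close together, no parallax correction is attempted."""
    pan = math.atan2(dx_px, focal_px)    # radians; positive = target to the right
    tilt = math.atan2(dy_px, focal_px)   # radians; positive = target below center
    return pan, tilt
```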

Tip-toeing around 3D
[Diagram: the object of interest lies within the wide-view camera's field of view; the narrow-view camera is rotated so that its new field of view contains the object]

Using 3D
[Diagram: eye tilt, left and right eye pan; the wide camera is fixed with respect to the head]

[Architecture diagram repeated from earlier]

Paul Fitzpatrick lbr-vision Kismet’s little secret

[Diagram: Kismet's computation is spread across several machines and operating systems – QNX runs the attention system, tracker, motion/skin/color filters, eye finder, distance-to-target estimation, and motor control; Linux runs speech recognition; NT runs speech synthesis and affect recognition; a fourth system runs face control, emotion, percepts and motors, and drives and behavior – linked by CORBA, sockets, and dual-port RAM to the cameras, microphone, speakers, and the eye, neck, jaw, ear, eyebrow, eyelid, and lip motors]

Robots looking at humans
- Responsiveness to the human face is vital for a robot to partake in natural social exchange
- The robot needs to locate and track facial features, and recover their semantic content

Modeling the face
- Match oriented regions on the face against a vertical model (top to bottom: hair, forehead, eye/brow, cheeks/nose, mouth/jaw) to isolate the eye/brow region
- Match the eye/brow region against a horizontal model (left to right: hair, skin, eye, bridge, eye, skin, hair) to find the eyes and the bridge of the nose
- Each model scans one spatial dimension, so the match can be formulated as a hidden Markov model, allowing fast optimization (a decoder sketch follows)
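A minimal Viterbi decoder for such a one-dimensional scan might look like the following; the state set, features, and probabilities are assumptions for illustration, not the actual face model:

```python
import numpy as np

def viterbi(log_emit, log_trans, log_init):
    """Label each scan position (e.g. image row) with a face band.

    log_emit:  (T, S) log P(observation at position t | state s)
    log_trans: (S, S) log P(next state | state), encoding the fixed
               top-to-bottom band order (hair, forehead, eye/brow, ...)
    log_init:  (S,)   log P(state at the first position)
    """
    T, S = log_emit.shape
    dp = np.full((T, S), -np.inf)
    back = np.zeros((T, S), dtype=int)
    dp[0] = log_init + log_emit[0]
    for t in range(1, T):
        scores = dp[t - 1][:, None] + log_trans       # (S, S) predecessor scores
        back[t] = np.argmax(scores, axis=0)           # best predecessor per state
        dp[t] = scores[back[t], np.arange(S)] + log_emit[t]
    # Backtrace the best band labeling
    path = [int(np.argmax(dp[-1]))]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t][path[-1]]))
    return path[::-1]
```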

Robots looking at robots
- It is useful to link the robot's representation of its own face with its representation of human faces
- Bonus: this allows robot-robot interaction via the human protocol

Conclusion
- The vision community is working on improving machine perception of humans
- But it is equally important to consider the human's perception of the machine
- The robot's point of view must be clear to the human, so that the two can communicate effectively – and quickly!

Video: Turn taking

Video: Affective intent