Natural Tasking of Robots Based on Human Interaction Cues. Brian Scassellati, Bryan Adams, Aaron Edsinger, Matthew Marjanovic, MIT Artificial Intelligence Laboratory.

Presentation transcript:

Goals

Our team at the MIT Artificial Intelligence Laboratory is building robotic systems that use natural social conventions as an interface. We believe that these systems will enable anyone to teach the robot to perform simple tasks. The robot will be usable without special training or programming skills, and it will be able to act in unique and dynamic situations.

We originally outlined a sequence of behavioral tasks, organized in a chart on the poster, that will allow our robots to learn new tasks from a human instructor. In that chart, behaviors in bold text have been completed and behaviors in italic text have been partially implemented. The chart is organized around five development threads: Schema Creation, Development of Social Interaction, Development of Commonsense Knowledge, Development of Sequencing, and Development of Coordinated Body Actions. The behaviors listed are: Face Finding; Eye Contact; Gaze Direction; Gaze Following; Recognizing Pointing; Speech Prosody; Intentionality Detector; Recognizing Instructor's Knowledge States; Recognizing Beliefs, Desires, and Intentions; Arm and Face Gesture Recognition; Facial Expression Recognition; Familiar Face Recognition; Directing Instructor's Attention; Vocal Cue Production; Motion Detector; Depth Perception; Object Saliency; Object Segmentation; Object Permanence; Body Part Segmentation; Human Motion Models; Long-Term Knowledge Consolidation; Expectation-Based Representations; Attention System; Turn Taking; Task-Based Guided Perception; Action Sequencing; VOR/OKR; Kinesthetic Body Representation; Mapping Robot Body to Human Body; Self-Motion Models; Reaching Around Obstacles; Object Manipulation; Active Object Exploration; Tool Use; Robot Teaching; Line-of-Sight Reaching; Simple Grasping; Smooth Pursuit and Vergence; Multi-Axis Orientation; Social Script Sequencing; and Instructional Sequencing.

Current Research: Joint Reference and Simple Mimicry

Our current research focuses on building the perceptual and motor primitives that will allow the robot to detect and respond to natural social cues. In the past year, we have developed systems that respond to human attention states and that mimic the movement of any animate object by tracing a similar trajectory with the robot's arm.

[System diagram: visual input passes through pre-attentive filters to a visual attention system; a face/eye finder recovers the instructor's gaze direction; ToBY identifies animate objects; and trajectory formation drives arm primitives for reaching and pointing.]

The system operates in a sequence of stages. Visual input is filtered pre-attentively. An attention mechanism selects salient targets in each image frame. Targets are linked together into trajectories by a motion correspondence procedure. The "theory of body" module (ToBY) looks for objects that are self-propelled (animate). Faces are located in animate stimuli. Features such as the eyes and mouth are extracted to provide head orientation. Finally, animate visual trajectories are mapped to arm movements.

Visual input is processed by a set of parallel pre-attentive filters, including skin tone, color saturation, motion, and disparity filters. The attention system combines the filtered images using weights that are influenced by high-level task constraints. It also incorporates a habituation mechanism and biases the robot's attention based on the attention of the instructor.
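The weighted combination of filter maps described above can be made concrete with a small sketch. The Python fragment below is illustrative only: the map names, fixed weights, and habituation update are assumptions rather than the implementation on the robot, where the gains are adjusted by task constraints and by the instructor's attentional state.

```python
import numpy as np

def attention_map(skin, saturation, motion, habituation,
                  w_skin=0.3, w_sat=0.2, w_motion=0.4, w_hab=0.1):
    """Combine pre-attentive filter maps (HxW arrays in [0, 1]) into a single
    activation map; habituation suppresses recently attended regions."""
    return (w_skin * skin + w_sat * saturation +
            w_motion * motion - w_hab * habituation)

def select_target(activation):
    """Return the (row, col) of the most salient point in the activation map."""
    return np.unravel_index(np.argmax(activation), activation.shape)

def update_habituation(habituation, target, decay=0.9, boost=1.0, radius=5):
    """Raise habituation around the attended location so attention moves on."""
    hab = habituation * decay
    r, c = target
    hab[max(0, r - radius):r + radius, max(0, c - radius):c + radius] += boost
    return np.clip(hab, 0.0, 1.0)
```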
The attention system produces a set of target points for each frame in the image sequence. These points are connected across time by the multi-hypothesis tracking algorithm developed by Cox and Hingorani. The system maintains multiple hypotheses for each possible trajectory, which allows ambiguous data to be resolved by later information. [Tracking diagram: feature extraction, matching, generation of the k-best hypotheses, hypothesis management (pruning and merging), prediction generation, and a delay stage that feeds predictions back to the matcher.] A deliberately simplified version of this trajectory-linking step is sketched at the end of this transcript.

The "theory of body" module (ToBY) is a set of agents, each of which incorporates a rule of naïve physics. These rules estimate how objects move under natural conditions. In the images on the poster (a moving hand, a rolling chair, and an "animate" chair being pushed with a rod), trajectories that obey these rules are judged to be inanimate (shown in red), while those that display self-propelled movement, like the moving hand or the "animate" chair, are judged animate (shown in green).

The attention of the instructor is monitored by a system that finds faces (using a color filter and shape metrics), orients to the instructor, and extracts salient features at a distance of 20 feet. [Face-processing pipeline: locate target, foveate target, apply the face filter, software zoom, and feature extraction; the poster lists stage timings of 300 msec and 66 msec.]

Trajectories are selected based on the inherent object saliency, the instructor's attentional state, and the animacy judgment. These trajectories are mapped from visual coordinates to a set of primitive arm postures. The trajectory can then be used to allow the robot to perform object-centered actions (such as pointing) or process-centered actions (such as repeating the trajectory with its own arm).

Future Research

More Complex Mimicry: One future direction for our work is to look at more complex forms of social learning. We will explore a wider range of tasks and ways to sequence learned actions into more complex behaviors, and we will work on building systems that imitate, that is, systems that follow the intent of an action rather than its form.

New Head and Hands: [Photographs on the poster show the robot's new head and new hands.]

Understanding Self: We will also explore ideas about how to build representations of the robot's own body and of the actions it is capable of performing. The robot should recognize its own arm as it moves through the world, and it should even be able to recognize its own movements in a mirror by their temporal correlation with the movements it is making.
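As a rough illustration of the "Understanding Self" idea, the sketch below correlates a commanded arm velocity with the motion observed in an image region and treats a sustained high correlation as evidence that the region belongs to the robot's own body. The function names and threshold are hypothetical, not taken from the system described above.

```python
import numpy as np

def self_motion_correlation(commanded_velocity, observed_motion):
    """Normalized correlation between commanded arm speed and the motion
    measured in one image region over a short time window (1-D arrays)."""
    cmd = np.asarray(commanded_velocity, dtype=float)
    obs = np.asarray(observed_motion, dtype=float)
    cmd, obs = cmd - cmd.mean(), obs - obs.mean()
    denom = np.linalg.norm(cmd) * np.linalg.norm(obs)
    return float(cmd @ obs / denom) if denom > 0 else 0.0

def looks_like_self(commanded_velocity, observed_motion, threshold=0.8):
    """Label a region as part of the robot if its motion tracks the command."""
    return self_motion_correlation(commanded_velocity, observed_motion) >= threshold
```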
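Finally, the trajectory-formation stage described earlier uses the Cox and Hingorani multiple-hypothesis tracker. The sketch below is not that algorithm: it is a greedy nearest-neighbor correspondence, included only to show the kind of frame-to-frame linking involved, with an illustrative distance threshold and data layout.

```python
import numpy as np

def link_targets(trajectories, new_points, max_dist=25.0):
    """Greedily extend each trajectory with the nearest target from the
    current frame; unmatched targets seed new trajectories.

    trajectories: list of lists of (x, y) points; new_points: list of (x, y).
    A multi-hypothesis tracker would instead keep several candidate
    assignments alive and prune or merge them as later frames arrive.
    """
    unmatched = list(new_points)
    for traj in trajectories:
        if not unmatched:
            break
        last = np.array(traj[-1], dtype=float)
        dists = [np.linalg.norm(last - np.array(p, dtype=float)) for p in unmatched]
        best = int(np.argmin(dists))
        if dists[best] <= max_dist:
            traj.append(unmatched.pop(best))
    for p in unmatched:
        trajectories.append([p])
    return trajectories
```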