Presentation on theme: "Alan C. Schultz, J. Gregory Trafton, Nick Cassimatis" — Presentation transcript:
1. Using Computational Cognitive Models for Better Human-Robot Collaboration
Alan C. Schultz, J. Gregory Trafton, Nick Cassimatis
Navy Center for Applied Research in Artificial Intelligence, Naval Research Laboratory
2. Peer-to-Peer Collaboration in Human-Robot Teams
We are not interested in a general, unified grand theory of cognition for solving the whole problem.
We already know how to be mobile, avoid collisions, etc.
Approach: be informed by cognitive psychology.
- Study human-human collaboration
- Determine the important high-level cognitive skills
- Build computational cognitive models of these skills (ACT-R, SOAR, EPIC, Polyscheme, ...)
- Use the computational models as the reasoning mechanism for high-level cognition on the robot
3. Cognitive Science as Enabler
Cognitive Robotics Hypothesis: a system using human-like representations and processes will enable better collaboration with people than a computational system that does not.
Similar representations and reasoning mechanisms make it easier for humans to work with the system; they are more compatible.
For close collaboration, systems should act "naturally", i.e., not do or say something in a way that detracts from the interaction/collaboration with the human.
The robot should accommodate humans, not the other way around.
Rather than solving tasks from "first principles": humans are good at solving some tasks; let's leverage that ability.
4. Cognitive Skills
Appropriate knowledge representations
- Spatial representation for spatial reasoning
- Adapting the representation to the problem-solving method
Problem solving
- Navigation routing with constraints (e.g., remaining hidden)
Learning
- Learning to recognize and anticipate others' behaviors
- Learning the characteristics of others' capabilities
Vision
- Object permanence and tracking (Cassimatis et al., 2004)
- Recognizing gestures
Natural language/gestures (Perzanowski et al., 2001)
5. Cognitive Skills
Perspective-taking
- Spatial (Trafton et al., 2005)
- Social (Breazeal et al., 2006)
Spatial reasoning
- People use metric information implicitly; they use and think qualitatively much more frequently (Trafton et al., 2006)
- Spatial referencing/language (Skubic et al., 2004)
Temporal reasoning
- Predicting how long something will take
Anticipation
- What does a person need, and why?
6. Hide and Seek (Trafton & Schultz, 2004, 2006)
Lots of knowledge about space is required.
A "good" hider needs visual and spatial perspective-taking to find the good hiding places (a large amount of spatial knowledge is needed).
7. Development of Perspective-Taking
Children start developing (very, very basic) perspective-taking ability around age 3-4 (Huttenlocher & Presson, 1979; Newcombe & Huttenlocher, 1992; Wallace, Alan, & Tribol, 2001).
In general, 3-4 year old children do not have a particularly well-developed sense of perspective-taking.
8. Case Study: Hide and Seek
At age 3½, Elena did not have perspective-taking ability (she made left/right errors).
Instead, she played hide and seek by learning pertinent qualitative features of objects, constructing knowledge about hiding that is object-specific.
9. Hide and Seek Cognitive Model
Created a cognitive model of Elena learning to play hide and seek, using ACT-R (Anderson et al., 1993, 1995, 1998, 2005).
- Correctly models Elena's behavior at 3½ years of age
- Learns and refines hiding behavior based on interactions with a "teacher"
- Learns production strength based on the success and failure of hiding behavior
- Learns ontological or schematic knowledge about hiding:
  - It's bad to hide behind something that's clear
  - It's good to hide behind something that is big enough
- Knows about the relative location of objects (behind, in front of) and adds knowledge about those relationships; the model has only a syntactic notion of spatial relationships
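The "production strength" bullet above can be sketched in miniature. This is a hypothetical illustration of reinforcement-style utility learning in the spirit of ACT-R's utility-learning equation (U ← U + α(R − U)), not the actual hide-and-seek model; the production names, learning rate, and reward values are all made up for the example.

```python
import random

ALPHA = 0.2  # learning rate (illustrative value, not from the model)

class Production:
    """A competing rule whose utility is nudged toward the reward it earns."""
    def __init__(self, name, utility=0.0):
        self.name = name
        self.utility = utility

    def update(self, reward):
        # ACT-R-style utility learning: U <- U + alpha * (R - U)
        self.utility += ALPHA * (reward - self.utility)

def choose(productions, noise=0.1):
    # Select the production with the highest (slightly noisy) utility.
    return max(productions, key=lambda p: p.utility + random.gauss(0, noise))

hide_behind_clear = Production("hide-behind-clear-object")
hide_behind_big = Production("hide-behind-big-object")
rules = [hide_behind_clear, hide_behind_big]

# The "teacher" rewards successful hides and gives no reward for failures.
for _ in range(50):
    hide_behind_clear.update(reward=0.0)  # found immediately: failure
    hide_behind_big.update(reward=1.0)    # good hiding place: success

print(choose(rules, noise=0.0).name)
```

After enough teacher interactions, the successful rule dominates selection, mirroring how the model refines its hiding behavior through success and failure.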
10. Hybrid Cognitive/Reactive Architecture: Robot Hide and Seek
- A computational cognitive model of hiding makes the deliberative (high-level cognitive) decisions and models learning.
- The cognitive model of hiding (after learning) is used to reason about what makes a good hiding place in order to seek.
- The reactive layer of the hybrid architecture handles mobility and sensor processing.
11. How Important Is Perspective Taking? (Trafton et al., 2005)
Analyzed a corpus of NASA training tapes (Space Station Mission 9A): two astronauts working in full suits in a neutral-buoyancy facility, with a third, remote person participating.
Standard protocol-analysis techniques; transcribed 8 hours of utterances and gestures (~4000 instances).
Coded the use of spatial language (up, down, forward, in between, my left, etc.) and commands.
Research questions:
- What frames of reference are used?
- How often do people switch frames of reference?
- How often do people take another person's perspective?
12. Spatial Language in Space: Results
Frame of reference    Example                     % utterances
Exocentric            Go straight zenith ("up")    7%
Egocentric            Turn to my left             15%
Addressee-centered    Turn to your left           10%
Deictic               Put it over there [points]   5%
Object-centered       Put it on top of the box    63%
How frequently do people switch their frame of reference? 45% of the time (consistent with Franklin, Tversky, & Coon, 1992).
How often do people take other people's perspective (or force others to take theirs)? 25% of the time.
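The corpus analysis behind these numbers can be sketched as a simple tally over coded utterances. This is a hypothetical illustration of how a frame-of-reference distribution and switch rate could be computed; the eight coded utterances below are invented, while the talk's actual figures come from ~4000 coded instances in the NASA corpus.

```python
from collections import Counter

# Invented frame-of-reference codes, one per utterance (illustrative only).
coded = ["object-centered", "egocentric", "object-centered", "object-centered",
         "addressee-centered", "object-centered", "exocentric", "object-centered"]

counts = Counter(coded)
total = len(coded)
distribution = {frame: count / total for frame, count in counts.items()}

# A "switch" is any utterance whose frame differs from the previous one.
switches = sum(1 for prev, cur in zip(coded, coded[1:]) if prev != cur)
switch_rate = switches / (total - 1)

print(distribution)
print(f"switch rate: {switch_rate:.0%}")
```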
13. Perspective Taking and Changing Frames of Reference
14. Perspective Taking and Changing Frames of Reference
"Bob, if you come straight down from where you are, uh, and uh kind of peek down under the rail on the nadir side, by your right hand, almost straight nadir, you should see the uh..."
Notice the mixing of perspectives: exocentric (down), object-centered (down under the rail), addressee-centered (right hand), and exocentric again (nadir), all in one instruction!
Notice the "new" term developed collaboratively: mystery hand rail.
15. "Please Hand Me the Wrench": Perspective Taking
Perspective taking is critical for collaboration. How do we model it? (ACT-R, Polyscheme, ...)
I'll show several demos of our current progress on spatial perspective-taking.
But first, a scenario: "Please hand me the wrench."
16. Perspective Taking in Human Interactions
How do people usually resolve ambiguous references that involve different spatial perspectives? (Clark, 1996)
Principle of least effort (which implies least joint effort): all things being equal, agents try to minimize their effort.
Principle of joint salience: the ideal solution to a coordination problem among two or more agents is the solution that is the most salient, prominent, or conspicuous with respect to their current common ground.
In less simple contexts, agents may have to work harder to resolve ambiguous references.
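The principle of joint salience can be illustrated with a toy reference resolver: restrict the candidates to objects in the agents' common ground, then pick the most conspicuous one. This is a hypothetical sketch, not the ACT-R/S or Polyscheme systems described later; the object names, salience scores, and scene are all invented.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Obj:
    name: str
    category: str
    salience: float        # prominence from the shared viewpoint (invented)
    in_common_ground: bool  # do both parties know about it?

def resolve(reference: str, scene: list) -> Optional[Obj]:
    # Joint salience: consider only common-ground candidates of the right
    # category, and return the most salient one (None if there is none).
    candidates = [o for o in scene
                  if o.category == reference and o.in_common_ground]
    return max(candidates, key=lambda o: o.salience, default=None)

scene = [
    Obj("wrench-on-table", "wrench", salience=0.9, in_common_ground=True),
    Obj("wrench-in-drawer", "wrench", salience=0.4, in_common_ground=False),
    Obj("hammer-on-bench", "hammer", salience=0.7, in_common_ground=True),
]

print(resolve("wrench", scene).name)  # the jointly salient wrench
```

In less simple contexts, as the slide notes, no candidate stands out in common ground, and the agents must do more joint work (e.g., ask a clarifying question).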
17. Perspective Taking: A Tale of Two Systems
ACT-R/S (Schunn & Harrison, 2001)
- Our perspective-taking system using ACT-R/S is described in Hiatt et al. (2003).
- Three integrated visuospatial buffers:
  - Focal: object ID; non-metric geon parts
  - Manipulative: grasping/tracking; metric geons
  - Configural: navigation; bounding boxes
Polyscheme (Cassimatis)
- A computational cognitive architecture in which mental simulation is the primitive and many AI methods are integrated.
- Our perspective-taking using Polyscheme is described in Trafton et al. (2005).
18. Robot Perspective Taking
The human can see one cone; the robot can sense two cones (Fong et al., 2006).
(Give information at the beginning about what the system/robot is doing.)
19. Summary
Having representations and reasoning similar or compatible to a human's facilitates human-robot collaboration.
We've developed computational cognitive models of high-level human cognitive skills as reasoning mechanisms for robots.
Open questions:
- Scaling up; combining many such skills
- What are the important skills?
- Which skills are built upon others?
20. Shameless Advertisement
ACM/IEEE Second International Conference on Human-Robot Interaction, Washington, DC, March 9-11, 2007, with the HRI 2007 Young Researchers Workshop, March 8, 2007.
Single track, highly multi-disciplinary: robotics, cognitive science, HCI, human factors, cognitive psychology, ...
Submission deadline: August 31, 2006.
21. A Dynamic Auditory Scene
Everyday auditory scenes are VERY noisy: fans, alarms/telephones, traffic, weather, people.
22. Auditory Perspective Taking
Allow a robot to use its knowledge of the environment, both a priori and sensed, to predict what a human can hear and effectively understand.
Information kiosk: the robot uses speech to relay information to an interested human listener.
- Given the auditory scene, can the person understand what the robot is saying?
- If not, what actions can the robot take to improve intelligibility and knowledge transfer?
Stealth bot: the robot uses its awareness of the auditory environment to hide from people and/or machines.
- The robot knows its own acoustic signature.
- It predicts how each action or location will be heard by the listener, and selects the best choice.
23. An Example of Adaptation: Robot Speech Interface
Adjust word usage depending on noise levels:
- Use smaller words with higher recognition rates.
- Ask questions to verify understanding; repeat yourself.
Change the quality of the speech sounds:
- Adapt voice volume and pitch to overcome local noise levels (Lombard speech).
- Emphasize difficult words.
- Don't talk during loud noises.
Reposition oneself:
- Vary the proximity to the listener.
- Face the listener as much as possible.
- Move to a different location if all else fails.
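The volume-adaptation idea above can be sketched numerically: raise the robot's output level so that, after distance attenuation, speech still exceeds the ambient noise at the listener by a target signal-to-noise ratio. This is a hypothetical illustration, not the system's actual algorithm; the target SNR, loudspeaker limit, and the simple free-field 6 dB-per-doubling-of-distance spreading model are all assumptions.

```python
import math

TARGET_SNR_DB = 15.0   # desired margin of speech over noise (assumed)
REF_DISTANCE_M = 1.0   # distance at which output level is specified
MAX_OUTPUT_DB = 90.0   # loudspeaker limit (assumed)

def spreading_loss_db(distance_m: float) -> float:
    # Free-field spherical spreading: ~6 dB lost per doubling of distance.
    return 20.0 * math.log10(max(distance_m, REF_DISTANCE_M) / REF_DISTANCE_M)

def required_output_db(noise_at_listener_db: float, distance_m: float) -> float:
    # Speak loudly enough that (output - spreading loss) >= noise + target SNR,
    # clipped at the loudspeaker's limit.
    needed = noise_at_listener_db + TARGET_SNR_DB + spreading_loss_db(distance_m)
    return min(needed, MAX_OUTPUT_DB)

# Quiet room, listener 1 m away: 45 + 15 + 0 = 60 dB suffices.
print(required_output_db(noise_at_listener_db=45.0, distance_m=1.0))
# Noisy room, listener 4 m away: 65 + 15 + ~12 ≈ 92 dB, clipped to 90 dB.
print(required_output_db(noise_at_listener_db=65.0, distance_m=4.0))
```

When the required level hits the clip, volume alone cannot restore intelligibility, which is exactly when the other adaptations on this slide (repositioning, pausing, simpler words) come into play.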
24. Information Kiosk
Overhead microphone array:
- Tracks local sound levels
- Localizes interfering sources
- Guides the vision system to new users
Stereo vision:
- Tracks the user's position in real time
Actions:
- Raise speaking volume relative to the user's distance and the level of ambient noise
- Pause during loud sounds or speech interruptions
- Rotate the robot to face users
- Reposition the robot if noise levels become too large
25. Acoustic Perspective
Noise maps: combine knowledge of sound sources to build maps.
- Measured volume/frequency levels
- Source locations/directionality
- Walls and environmental features
Multiple maps can be built and combined in real time.
Modifying action based on the noise map:
- Seeking noisy hiding places so that the robot can best observe its target without being detected
- Masking its particular acoustic signature
After exploring the area inside the square, the robot localizes 3 air vents.
26. Four sources are combined as omnidirectional sources, without environmental reflections.
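Combining omnidirectional sources without reflections, as in this figure, can be sketched on a grid: each source's level decays with distance, and contributions are summed in the power domain (not in dB) before converting back. This is a hypothetical illustration; the grid size, source positions, and levels are invented, and real noise maps would also use the measured directionality and environmental features listed on the previous slide.

```python
import math

def level_at(source, x, y):
    """dB contributed by one omnidirectional source at grid cell (x, y)."""
    sx, sy, level_db = source
    distance = max(math.hypot(x - sx, y - sy), 1.0)  # clamp inside 1 m
    return level_db - 20.0 * math.log10(distance)    # spherical spreading

def combined_db(sources, x, y):
    """Combine sources by summing acoustic power, then convert back to dB."""
    power = sum(10.0 ** (level_at(s, x, y) / 10.0) for s in sources)
    return 10.0 * math.log10(power)

# Four omnidirectional sources (x, y, level in dB at 1 m), e.g. air vents.
sources = [(2.0, 2.0, 70.0), (8.0, 2.0, 70.0),
           (2.0, 8.0, 65.0), (8.0, 8.0, 65.0)]

noise_map = [[combined_db(sources, x, y) for x in range(10)]
             for y in range(10)]

# A stealth bot would prefer the loudest cells to mask its own signature.
level, x, y = max((noise_map[y][x], x, y)
                  for y in range(10) for x in range(10))
print(f"loudest cell: ({x}, {y}) at {level:.1f} dB")
```

Note that two equal co-located sources combine to roughly +3 dB, not +6 dB, which is why the power-domain sum matters.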