Learning to Interpret Natural Language Instructions
Monica Babeş-Vroman+, James MacGlashan*, Ruoyuan Gao+, Kevin Winner*, Richard Adjogah*, Marie desJardins*, Michael Littman+ and Smaranda Muresan++
*Department of Computer Science and Electrical Engineering, University of Maryland, Baltimore County
+Computer Science Department, Rutgers University
++School of Communication and Information, Rutgers University
NAACL-2012 Workshop "From Words to Actions: Semantic Interpretation in an Actionable Context"

Outline
– Motivation and problem
– Our approach
– Pilot study
– Conclusions and future work

Motivation: "Bring me the red mug that I left in the conference room."

Motivation

Problem: Train an artificial agent to learn to carry out complex multistep tasks specified in natural language.

Another Example: a task of pushing an object to a room, e.g., a square object and a red room.
Abstract task: "move obj to color room"
Grounded instances: "move square to red room", "move star to green room", "go to green room"

Outline
– What is the problem?
– Our approach
– Pilot study
– Conclusions and future work

Training Data: demonstrations in an Object-Oriented Markov Decision Process (OO-MDP) [Diuk et al., 2008]
[figure: an example grid-world trajectory through states S1-S4, described by features F1-F6]
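The OO-MDP representation can be made concrete with a minimal sketch. The class and predicate names below are hypothetical (the slides do not show the authors' domain code); the point is that a state is a collection of attributed objects, and propositional functions evaluate logical facts about them.

```python
# Illustrative sketch (hypothetical names, not the authors' code) of an OO-MDP
# state: a collection of attributed objects, plus propositional functions that
# evaluate logical facts such as objectIn(o2, l1) or isRed(l1).
class OOMDPObject:
    def __init__(self, name, cls, **attributes):
        self.name = name              # e.g., "o2" or "l1"
        self.cls = cls                # e.g., "block" or "room"
        self.attributes = attributes  # e.g., shape, color, position, cells

def objectIn(obj, room):
    return obj.attributes.get("position") in room.attributes.get("cells", set())

def isRed(thing):
    return thing.attributes.get("color") == "red"

state = [
    OOMDPObject("o2", "block", shape="square", color="yellow", position=(3, 4)),
    OOMDPObject("l1", "room", color="red", cells={(3, 4), (3, 5), (4, 4)}),
]
o2, l1 = state
print(objectIn(o2, l1), isRed(l1))   # True True
```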

“Push the star into the teal room”

Our Approach: "Push the star into the teal room"
[figure: the instruction and a demonstration feed three components: Semantic Parsing, which produces push(star1, room1) and P1(room1, teal); Task Learning from Demonstrations, which operates over features F1-F6 (F2 and F6 highlighted); and Task Abstraction]

Our Approach: "Push the star into the teal room"
[figure: the same pipeline with groundings added: the word "teal" maps to the property P1 = color, and the word "push" maps to the objToRoom abstract task, linking push(star1, room1) and P1(room1, teal) to the abstract task]

Our Approach: "Push the star into the teal room" is processed by three components: Semantic Parsing, Task Learning from Demonstrations, and Task Abstraction.

Task Learning from Demonstration = Inverse Reinforcement Learning (IRL). But first, the Reinforcement Learning problem: given an MDP (S, A, T, γ) and a reward function R, RL computes the optimal policy π*.
[figure: a four-state MDP (S1-S4) with the resulting policy shown as action probabilities (30%, 70%, 0%, 100%, 0%)]

Task Learning from Demonstrations: Inverse Reinforcement Learning. Given an MDP (S, A, T, γ) without its reward function and an expert policy πE observed through demonstrations, IRL recovers the reward function R.
[figure: the same four-state MDP (S1-S4), now annotated with the expert's action probabilities (30%, 70%, 0%, 100%, 0%); IRL maps the demonstrations back to R]

Maximum Likelihood IRL (MLIRL) [M. Babeş, V. Marivate, K. Subramanian and M. Littman, 2011]
Assumption: the reward function is a linear combination of a known set of features, R(s,a) = θ · φ(s,a).
Goal: find θ that maximizes the probability of the observed trajectories.
Method: θ defines a reward function R, which induces a Boltzmann-distributed policy π; θ is updated in the direction of the gradient ∇θ Pr(demonstrations).
Notation: (s,a) is a (state, action) pair, R the reward function, φ the feature vector, π the policy, and Pr(demonstrations) the probability of the demonstrations.
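As a rough illustration of the MLIRL idea on a toy MDP, here is a hedged Python sketch, not the authors' implementation: rewards are linear in state features, the policy is a Boltzmann distribution over soft Q-values, and θ is updated by following a numerical gradient of the demonstration log-likelihood (the original algorithm uses an analytic gradient).

```python
# Sketch of Maximum Likelihood IRL on a tiny random MDP -- illustrative only.
import numpy as np

n_states, n_actions, gamma, beta = 4, 2, 0.95, 5.0
rng = np.random.default_rng(0)
T = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # T[s, a, s']
phi = np.eye(n_states)                                            # one indicator feature per state

def boltzmann_policy(theta, iters=80):
    R = phi @ theta                                   # R(s) = theta . phi(s)
    V = np.zeros(n_states)
    for _ in range(iters):                            # "soft" value iteration
        Q = R[:, None] + gamma * (T @ V)              # Q[s, a]
        m = (beta * Q).max(axis=1)
        V = (m + np.log(np.exp(beta * Q - m[:, None]).sum(axis=1))) / beta
    Q = R[:, None] + gamma * (T @ V)
    p = np.exp(beta * (Q - Q.max(axis=1, keepdims=True)))
    return p / p.sum(axis=1, keepdims=True)           # Boltzmann policy pi[s, a]

def log_likelihood(theta, trajectories):
    pi = boltzmann_policy(theta)
    return sum(np.log(pi[s, a]) for traj in trajectories for s, a in traj)

demos = [[(0, 1), (2, 0), (3, 1)], [(1, 1), (3, 1)]]   # toy (state, action) demonstrations

theta = np.zeros(n_states)
for _ in range(100):                                   # gradient ascent on theta
    grad = np.zeros_like(theta)
    for k in range(n_states):                          # simple numerical gradient
        e = np.zeros(n_states); e[k] = 1e-4
        grad[k] = (log_likelihood(theta + e, demos) -
                   log_likelihood(theta - e, demos)) / 2e-4
    theta += 0.05 * grad
print("learned reward weights theta:", theta)
```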

Our Approach: "Push the star into the teal room" is processed by three components: Semantic Parsing, Task Learning from Demonstrations, and Task Abstraction.

Semantic Parsing: Lexicalized Well-Founded Grammars (LWFG) [Muresan, 2006; 2008; 2012]
[figure: parsing pipeline: the input text is parsed using a grammar that combines syntax, semantics and ontological constraints (rules of the form NP(w, ·) → Adj(w1, ·), NP(w2, ·): Φi and NP(w, ·) → Det(w1, ·), NP(w2, ·): Φi, …) together with a semantic model (ontology), producing a semantic representation]

Semantic Parsing with LWFG [Muresan, 2006; 2008; 2012], starting from an empty semantic model:
[figure: parsing "teal room" with the grammar alone yields X1.is=teal, X.P1=X1, X.isa=room; P1 is an uninstantiated variable, so the phrase is grounded to room1 but the property relating "teal" to it is not yet known]

Semantic Parsing with LWFG [Muresan, 2006; 2008; 2012], with syntactic, semantic and ontological constraints:
[figure: with a semantic model (ontology), parsing "teal room" yields X1.is=green, X.color=X1, X.isa=room; the ontology grounds "teal" to the color green for room1]
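A small illustrative sketch of the grounding step on this slide, using a hypothetical dictionary representation rather than the LWFG machinery: the parser's attribute-value output for "teal room" is rewritten with an ontology that maps the surface word "teal" to the color green and instantiates the property P1 as color.

```python
# Illustrative only: grounding the parser output for "teal room" with a tiny
# ontology (a dictionary stands in for the LWFG semantic representation).
parse = {"X1.is": "teal", "X.P1": "X1", "X.isa": "room"}   # P1 is uninstantiated
ontology = {"teal": ("color", "green")}                    # "teal" denotes the color green

def ground(parse, ontology):
    grounded = dict(parse)
    word = grounded.get("X1.is")
    if word in ontology:
        prop, value = ontology[word]
        grounded["X1.is"] = value                          # teal -> green
        grounded["X." + prop] = grounded.pop("X.P1")       # P1 -> color
    return grounded

print(ground(parse, ontology))
# -> {'X1.is': 'green', 'X.isa': 'room', 'X.color': 'X1'}
```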

Learning LWFG [Muresan and Rambow, 2007; Muresan, 2010; 2011]
– Learning is currently done offline
– No assumption of an existing semantic model
[figure: a relational learning model induces the LWFG from a small set of annotated representative examples, e.g., (loud noise, …), (the noise, …), and generalizes to a larger generalization set, e.g., (loud noise), (the noise), (loud clear noise), (the loud noise), covering NP and PP rules such as NP(w, ·) → Adj(w1, ·), NP(w2, ·): Φi]

Using LWFG for Semantic Parsing
– Obtains representations with the same underlying meaning despite different syntactic forms: "go to the green room" vs. "go to the room that is green"
– Dynamic semantic model: parsing can start with an empty semantic model, which is augmented based on feedback from the TA component (e.g., push → blockToRoom, teal → green)
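A minimal sketch, assuming a simple word-to-grounding dictionary (hypothetical names; the slides do not show an implementation), of how such a dynamic semantic model could be augmented from Task Abstraction feedback:

```python
# Hypothetical sketch of a dynamic semantic model: a mapping from words to
# domain predicates/tasks that starts empty and grows from TA feedback.
class DynamicSemanticModel:
    def __init__(self):
        self.groundings = {}                  # e.g., {"push": "blockToRoom", "teal": "green"}

    def add_feedback(self, word, grounding):
        """Record a grounding suggested by the Task Abstraction component."""
        self.groundings[word] = grounding

    def resolve(self, word):
        """Return the known grounding, or None if the word is still ungrounded."""
        return self.groundings.get(word)

model = DynamicSemanticModel()
model.add_feedback("push", "blockToRoom")     # TA: "push" names the block-to-room task
model.add_feedback("teal", "green")           # TA: "teal" denotes the color green
print(model.resolve("teal"))                  # -> "green"
```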

Our Approach: "Push the star into the teal room" is processed by three components: Semantic Parsing, Task Learning from Demonstrations, and Task Abstraction.

Task Abstraction
– Creates an abstract task to represent a set of similar grounded tasks: specifies goal conditions and the reward function in terms of OO-MDP propositional functions
– Combines information from SP and IRL and provides feedback to both: gives SP domain-specific semantic information and gives IRL the relevant features

Similar Grounded Tasks
Task 1: goal condition objectIn(o2, l1); other terminal facts: isSquare(o2), isYellow(o2), isRed(l1), isGreen(l2), agentIn(o46, l2), …
Task 2: goal condition objectIn(o15, l2); other terminal facts: isStar(o15), isYellow(o15), isGreen(l2), isRed(l3), objectIn(o31, l3), …
Some grounded tasks are similar in their logical goal conditions but differ in object references and in the facts about those referenced objects.

Abstract Task Definition
– Task conditions are defined as a partial first-order logic expression
– The expression takes propositional-function parameters to complete it
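To illustrate, here is a hedged sketch (hypothetical predicates and state layout, not the authors' code) of an abstract "object to room" task: a goal-condition template completed by supplying propositional functions, so that different groundings of the same abstract task differ only in the predicates passed in.

```python
# Hypothetical sketch: an abstract "object to room" task as a partial
# first-order goal expression, completed by propositional-function parameters.
def make_abstract_goal(object_pred, room_pred):
    """Goal: some object satisfying object_pred is inside some room
    satisfying room_pred (e.g., object_pred=isStar, room_pred=isGreen)."""
    def goal_satisfied(objects, rooms, objectIn):
        return any(objectIn(o, r)
                   for o in objects if object_pred(o)
                   for r in rooms if room_pred(r))
    return goal_satisfied

# Toy propositional functions over dictionary-based objects.
isStar   = lambda o: o["shape"] == "star"
isSquare = lambda o: o["shape"] == "square"
isGreen  = lambda r: r["color"] == "green"
isRed    = lambda r: r["color"] == "red"
objectIn = lambda o, r: o["pos"] in r["cells"]

objects = [{"name": "o15", "shape": "star", "pos": (1, 1)}]
rooms   = [{"name": "l2", "color": "green", "cells": {(1, 1), (1, 2)}},
           {"name": "l3", "color": "red", "cells": {(5, 5)}}]

# Two groundings of the same abstract task differ only in the predicates.
star_to_green = make_abstract_goal(isStar, isGreen)
square_to_red = make_abstract_goal(isSquare, isRed)
print(star_to_green(objects, rooms, objectIn))   # True
print(square_to_red(objects, rooms, objectIn))   # False
```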

System Structure
[figure: the three components, Semantic Parsing, Inverse Reinforcement Learning (IRL) and Task Abstraction, exchanging information]

Pilot Study: Unigram Language Model
Data: (S_i, T_i), pairs of instructions and demonstrations.
Hidden data: z_ji = Pr(R_j | (S_i, T_i)).
Parameters: x_kj = Pr(w_k | R_j).

Pilot Study: Unigram Language Model
Data: (S_i, T_i). Hidden data: z_ji = Pr(R_j | (S_i, T_i)). Parameters: x_kj = Pr(w_k | R_j).
[table: learned word probabilities per reward function, e.g., Pr("push" | R1), Pr("room" | R1), …, Pr("push" | R2), Pr("room" | R2), …]

Pilot Study: a new instruction S_N, "Go to the green room."
[table: the learned probabilities Pr("push" | R1), Pr("room" | R1), …, Pr("push" | R2), Pr("room" | R2), … are used to score the new instruction under each reward function]

Pilot Study: another new instruction S'_N, "Go with the star to green."
[table: the same learned probabilities Pr("push" | R1), Pr("room" | R1), …, Pr("push" | R2), Pr("room" | R2), … applied to this instruction]
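The scoring behind these two slides can be sketched as follows; the probabilities below are made up for illustration, and the exact EM details of the pilot study are not spelled out on the slides. Given per-reward unigram parameters x_kj = Pr(w_k | R_j), a new instruction is scored under each candidate reward function with a bag-of-words likelihood and a uniform prior.

```python
# Hedged sketch: scoring a new instruction under per-reward unigram models.
import math

# x_kj = Pr(word w_k | reward function R_j), one unigram model per task/reward.
unigram = {
    "R1_push_to_room": {"push": 0.20, "star": 0.10, "room": 0.15, "green": 0.05,
                        "teal": 0.05, "go": 0.02, "to": 0.10, "the": 0.10},
    "R2_go_to_room":   {"go": 0.20, "to": 0.15, "the": 0.10, "room": 0.15,
                        "green": 0.10, "push": 0.01, "star": 0.01, "teal": 0.05},
}

def posterior_over_rewards(instruction, models, smoothing=1e-3):
    """Pr(R_j | S) proportional to prod_k Pr(w_k | R_j), with a uniform prior."""
    scores = {}
    for name, probs in models.items():
        log_p = sum(math.log(probs.get(w, smoothing)) for w in instruction.split())
        scores[name] = math.exp(log_p)
    total = sum(scores.values())
    return {name: s / total for name, s in scores.items()}

print(posterior_over_rewards("go to the green room", unigram))
print(posterior_over_rewards("go with the star to green", unigram))
```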

Outline
– What is the problem?
– Our approach
– Pilot study
– Conclusions and future work

Conclusions
– Proposed a new approach for training an artificial agent to learn to carry out complex multistep tasks specified in natural language, from pairs of instructions and demonstrations
– Presented a pilot study with a simplified model

Current/Future Work
– SP, TA and IRL integration
– High-level multistep tasks
– Probabilistic LWFGs: probabilistic domain (semantic) model; learning/parsing algorithms with both hard and soft constraints
– TA: add support for hierarchical task definitions that can handle temporal conditions
– IRL: find a partitioning of the execution trace where each partition has its own reward function

Thank you!