Recombinant Reinforcement Learning

Background

Reinforcement Learning (RL) agents learn to perform tasks by iteratively taking actions in the world and using the resulting experiences to decide on future actions. These experiences encapsulate, among other things, a reward or reinforcement for the actions taken. Agents learn to perform intelligently either by building a process model of their environment and planning with that model, or simply by learning the value of each action in every state and performing the action with the highest value. Model-based reinforcement learners are much more data efficient, and are preferable especially where real-world actions are expensive and computation time is relatively cheap. We define recombinant RL as the autonomous recombination of processes, especially process models, in learning and performing new tasks from reinforcement.
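The two strategies can be contrasted with a minimal sketch (ours, not the poster's; the environment, state/action sets, and hyper-parameters are assumed): a model-free learner updates action values directly from each experience, while a model-based learner fits transition counts and then plans over the fitted model.

from collections import defaultdict

# Minimal sketch contrasting model-free and model-based RL on a generic
# discrete MDP (illustrative; not code from the poster).
Q = defaultdict(float)                          # model-free action values
counts = defaultdict(lambda: defaultdict(int))  # (s, a) -> {s2: visit count}
rewards = defaultdict(float)                    # (s, a) -> last observed reward

def value_based_update(s, a, r, s2, actions, alpha=0.1, gamma=0.95):
    """Model-free (Q-learning) step: learn action values directly."""
    best_next = max(Q[(s2, b)] for b in actions)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])

def model_based_update(s, a, r, s2):
    """Model-based step: record the experience in a process model..."""
    counts[(s, a)][s2] += 1
    rewards[(s, a)] = r  # deterministic rewards assumed for brevity

def plan(states, actions, gamma=0.95, sweeps=50):
    """...then reuse the model by planning (value iteration), which costs
    computation rather than real-world actions - the data-efficiency argument."""
    V = defaultdict(float)
    for _ in range(sweeps):
        for s in states:
            vals = []
            for a in actions:
                n = sum(counts[(s, a)].values())
                if n:
                    exp_v = sum(c / n * V[s2] for s2, c in counts[(s, a)].items())
                    vals.append(rewards[(s, a)] + gamma * exp_v)
            V[s] = max(vals, default=V[s])
    return V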

MOTIVATION: We seek to endow agents, particularly robots, with the ability to combine information from process models of separate and possibly diverse tasks, in order to help them learn more quickly on a new task that shares some features with each of the previous tasks' models.
Premise

Environment has structure that can be exploited.
Planning and learning are primarily model-based.
A repository of process models exists for related tasks.
The process is open to learning opportunities (exploration).
Illustrating Recombinant RL

We are developing a system that enables intelligent agents to learn compact models of new tasks by identifying similarities and differences across a set of related process models.
Framework

INPUT:
A description of the new task, in the form of a 'task graph', and a 'pro-forma prior model' of the task environment.
A set of related 'prospective prior models' retrieved from a repository of donor process models.

KEY OPERATIONS:

OUTPUT: A learned compact process model of the new task.

BENEFITS:
SCALE UP - exploit modularity in task performance.
TRANSFER - share and reuse components of process models.
OVERCOME PARTIAL OBSERVABILITY.
SELECTIVE OPTIMISATION & COMPENSATION.
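The transcript gives no detail under KEY OPERATIONS, so the following is only a conceivable sketch of one such operation, under our own assumptions rather than the poster's method: each donor model is reduced to its set of dependency edges, and edges that most donors share seed the prior structure for the new task's model.

from collections import Counter

# Illustrative recombination step (our assumption; the poster does not
# spell out its key operations): keep dependency edges that a sufficient
# fraction of donor models agree on, as a structural prior for the new task.
def recombine_structures(donor_edge_sets, threshold=0.5):
    """Return edges appearing in at least `threshold` of the donors."""
    tally = Counter(e for edges in donor_edge_sets for e in set(edges))
    n = len(donor_edge_sets)
    return {e for e, c in tally.items() if c / n >= threshold}

# Hypothetical donors; ("X", "Y'") means X at time t influences Y at t+1.
donors = [
    {("X", "Y'"), ("Y", "Y'"), ("Z", "Z'")},
    {("X", "Y'"), ("Y", "Y'")},
    {("X", "Y'"), ("Z", "Z'")},
]
print(recombine_structures(donors))  # all three edges appear in >= 2 of 3 donors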
Current Issues

1. CHOICE OF REPRESENTATION: How do we represent process models in recombinant RL?
OPTION - Probabilistic graphical models. Probabilistic graphical models are graphical representations of probabilistic structure, together with functions that can be used to derive joint distributions; examples include Markov random fields and Bayesian networks. They combine the representational and algorithmic power of graph theory with that of probability theory. Benefits include:
Clear semantics - easy to understand and interpret.
An attractive basis for inference algorithms.
Support for partial observations - hidden variables and missing data.
Dependencies can be handled efficiently - partitioned and exploited.
Components of the network can be targeted for reuse.
We are using two-time-slice Dynamic Bayesian Networks (DBNs) and decision trees / Algebraic Decision Diagrams (ADDs).
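A two-time-slice DBN can be sketched as per-variable parent sets plus conditional probability tables, with the transition distribution factoring into a product over variables. This is a minimal illustration under assumed binary variables and made-up numbers, not the poster's implementation:

# Minimal two-time-slice DBN sketch (illustrative; variable names and CPT
# values are hypothetical). Each next-slice variable has a parent set in the
# current slice and a CPT giving P(var' = 1 | parents); variables are binary.
parents = {"X": ("X",), "Y": ("X", "Y"), "Z": ("Z",)}
cpt = {
    "X": {(0,): 0.5, (1,): 0.5},
    "Y": {(0, 0): 0.1, (0, 1): 0.6, (1, 0): 0.7, (1, 1): 0.9},
    "Z": {(0,): 0.2, (1,): 0.8},
}

def transition_prob(state, next_state):
    """P(next_state | state) factors as a product of per-variable terms."""
    p = 1.0
    for var, pa in parents.items():
        p1 = cpt[var][tuple(state[v] for v in pa)]
        p *= p1 if next_state[var] == 1 else 1.0 - p1
    return p

s = {"X": 1, "Y": 0, "Z": 1}
print(transition_prob(s, {"X": 0, "Y": 1, "Z": 1}))  # 0.5 * 0.7 * 0.8 = 0.28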
2. TRANSFER OF PRIOR INFORMATION between models.
OPTION - Methods of imaginary data?
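The imaginary-data idea can be illustrated with the standard Dirichlet pseudo-count construction (a general Bayesian technique; how the poster applies it is left open, and the equivalent sample size below is an assumption): a donor model's predictive distribution is converted into fictitious counts, which real experience then updates.

# Illustrative Dirichlet "imaginary data" sketch (general technique, not
# the poster's specific transfer method).
def prior_from_donor(donor_probs, equivalent_sample_size=10.0):
    """Turn a donor model's next-state distribution into pseudo-counts."""
    return {s2: equivalent_sample_size * p for s2, p in donor_probs.items()}

def posterior_mean(pseudo_counts, real_counts):
    """Posterior mean transition probabilities after real observations."""
    total = sum(pseudo_counts.values()) + sum(real_counts.values())
    keys = set(pseudo_counts) | set(real_counts)
    return {s2: (pseudo_counts.get(s2, 0.0) + real_counts.get(s2, 0)) / total
            for s2 in keys}

donor = {"s_a": 0.7, "s_b": 0.3}   # hypothetical donor prediction
counts = {"s_a": 2, "s_b": 8}      # real experience disagrees with the donor
print(posterior_mean(prior_from_donor(donor), counts))
# pseudo-counts 7 and 3 blend with counts 2 and 8 -> {"s_a": 0.45, "s_b": 0.55}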
3. EXPLORATION CONTROL of learning with the recombined model.
OPTION - Approximation techniques: optimistic model selection? MCMC?
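One standard approximation in this family, sketched purely as an illustration (the poster raises optimistic model selection and MCMC as open options rather than committing to either), is to sample a complete transition model from the Dirichlet posterior and act greedily in the sample; sampled models vary most where data are scarce, which drives exploration of the recombined model's untested parts.

import random

# Hedged sketch of posterior-sampling exploration control (one option in the
# family the poster mentions, not its chosen algorithm). Dirichlet samples
# are drawn via Gamma variates to keep the example dependency-free.
def sample_dirichlet(alphas):
    """Draw one distribution from Dirichlet(alphas); alphas: {s2: alpha > 0}."""
    draws = {s2: random.gammavariate(a, 1.0) for s2, a in alphas.items()}
    total = sum(draws.values())
    return {s2: g / total for s2, g in draws.items()}

def sample_transition_model(dirichlet_params):
    """dirichlet_params: {(s, a): {s2: alpha}} -> one sampled process model."""
    return {sa: sample_dirichlet(alphas) for sa, alphas in dirichlet_params.items()}

# Planning greedily in a sampled model (instead of the posterior mean) makes
# behaviour vary most exactly where the recombined model is least certain.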
4. REAL-WORLD APPLICATIONS - vision-based robotics …event understanding.
School of Computer Science, The University of Birmingham, Edgbaston, Birmingham B15 2TT, UK.

Introducing Funlade T. Sunmola (PhD Candidate). Thesis Committee: Dr. Ela Claridge, Prof. Marta Kwiatkowska and Dr. Jeremy Wyatt (supervisor).