Ratbert: Nearest Sequence Memory Based Prediction Model Applied to Robot Navigation by Sergey Alexandrov iCML 2003.



Defining the Problem
► Choosing a navigational action (simplified world: left, right, forward, back)
► Consequence of an action is unknown given only the immediate state (expected observation)
► How to learn an unknown environment well enough to accurately predict such consequences?

Approaches
► Learning the entire model (POMDP) – for example, Baum-Welch (problem: slow)
► Goal-finding tasks – learning a path to a specific state (reinforcement problem) – for example, NSM (Nearest Sequence Memory)
► Generalized observation prediction – NSMP (Nearest Sequence Memory Predictor)

NSMP in Short
► Experience Seq_n = {(o_1, a_1) … (o_n, a_n)}
► NSMP(Seq_n) = observation predicted by executing a_n
► Derived by examining the k nearest matches (NNS)

[Figure: example with k = 4 – the current pair (o_i, a_i) with unknown next observation o_{i+1}, and the observations o_2, o_3, o_2, o_1 reached by the four matched sequences]

NSMP in Short (Cont.)
► Based on kNN applied to sequences of previous experience (NSM)
► Find the k nearest (here: longest) sequence matches to the immediately prior experience
► Calculate a weight for each observation reached by the k matched sequence sections (a tradeoff between long matches and a high frequency of matches)
► Probability of each observation = normalized weight
► Predicted observation is the observation with the highest probability
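The steps above can be sketched in code. This is a minimal illustration, not the talk's implementation: the slides do not give the exact weight function, so weighting each of the k matches by its match length (plus one, so every match counts) is an assumption.

```python
from collections import defaultdict

def nsmp_predict(history, k=4):
    """Predict o_{n+1} from history = [(o_1, a_1), ..., (o_n, a_n)],
    i.e. the observation expected after executing the pending action a_n."""
    n = len(history)
    action = history[-1][1]

    def match_length(i):
        # length of the common suffix of (observation, action) pairs
        # ending at position i and at position n-1
        l = 0
        while l <= i and history[i - l] == history[n - 1 - l]:
            l += 1
        return l

    # every earlier position whose action equals the pending action is a
    # candidate; its known outcome is the observation that followed it
    candidates = [(match_length(i), history[i + 1][0])
                  for i in range(n - 1) if history[i][1] == action]
    candidates.sort(reverse=True)  # longest matches first

    weights = defaultdict(float)
    for length, obs in candidates[:k]:
        # weight grows with match length; outcomes reached by several of
        # the k matches accumulate weight (the long-vs-frequent tradeoff)
        weights[obs] += length + 1.0
    if not weights:
        return None
    # normalizing the weights gives a probability per observation;
    # the prediction is the most probable one
    return max(weights, key=weights.get)
```

For a cyclic corridor history such as `[('A','f'), ('B','f'), ('C','f'), ('A','f'), ('B','f'), ('C','f'), ('A','f'), ('B','f')]`, the longest suffix matches end just before a `C`, so the sketch predicts `'C'`.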

Testing
► Ratbert: a Lego-based robot capable of simple navigation inside a small maze; senses walls in front, left, and right, and a noisy distance
► Software simulation based on Ratbert’s sensor inputs (larger environment, greater number of runs, longer sequences)
► Actions: {left, right, forward, back}; Observations: {left, right, front, distance}
► For both trials, a training sequence was collected via random exploration; then a testing sequence was executed, comparing each predicted observation with the actual observation. For both, k was set to 4.
► Results compared to bigrams.
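For reference, a bigram baseline of the kind the results are compared against can be sketched as follows. The exact conditioning used in the talk is not stated; predicting the observation that most often followed the current (observation, action) pair in training is an assumption.

```python
from collections import Counter, defaultdict

def bigram_predict(history):
    """Predict the next observation from history = [(o_1, a_1), ...]
    by looking only at the single most recent (observation, action) pair."""
    # count, for each (observation, action) pair seen in training,
    # which observation followed it
    counts = defaultdict(Counter)
    for pair, nxt in zip(history[:-1], history[1:]):
        counts[pair][nxt[0]] += 1
    last = history[-1]
    if last not in counts:
        return None  # pair never seen during training
    return counts[last].most_common(1)[0][0]
```

Unlike NSMP, this baseline ignores everything before the last step, which is exactly what longer suffix matching is meant to improve on in a partially observable maze.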

Results
► Plot: prediction rate vs. training sequence length
► First graph is for Ratbert, second graph is for the software simulation
► NSMP consistently produced a better, although not optimal, prediction rate

Further Work
► Comparison to other probabilistic predictive models
► Determine the optimal exploration method
► Examine situations that trip up the algorithm
► Go beyond “gridworld” concepts of left/right/forward/back to more realistic navigation
► Work on mapping real sensor data to the discrete classes required by instance-based algorithms such as NSM/NSMP (for example, using single-linkage hierarchical clustering until cluster distance <= sensor error)
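The clustering idea in the last bullet can be sketched for one-dimensional readings (e.g. the noisy distance sensor). This is only an illustration of the stopping rule the slide suggests: merge clusters while their single-linkage distance stays within the sensor error, then treat each remaining cluster as one discrete class. The function name `discretize` is illustrative.

```python
def discretize(readings, sensor_error):
    """Group 1-D sensor readings into discrete classes by single-linkage
    agglomerative clustering, stopping once the closest pair of clusters
    is separated by more than the sensor error."""
    clusters = [[r] for r in readings]

    def single_link(a, b):
        # single-linkage distance: closest pair of points across clusters
        return min(abs(x - y) for x in a for y in b)

    while len(clusters) > 1:
        # find the closest pair of clusters under single linkage
        i, j = min(((i, j) for i in range(len(clusters))
                    for j in range(i + 1, len(clusters))),
                   key=lambda p: single_link(clusters[p[0]], clusters[p[1]]))
        if single_link(clusters[i], clusters[j]) > sensor_error:
            break  # remaining gaps exceed the sensor error: stop merging
        clusters[i] += clusters.pop(j)  # merge the closest pair
    return clusters
```

Readings within sensor error of each other collapse into one class, so two noisy measurements of the same wall distance map to the same discrete observation for NSM/NSMP.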

Thank You