Computer Science Readings: Reinforcement Learning Presentation by: Arif OZGELEN.


How do we perform visual search? We look in the usual places where the item is likely to be. If the item is small, we tend to move closer to the area we are searching in order to heighten our ability to detect it. We look for properties of the target object that make it distinguishable from the search space, e.g. color, shape, size, etc.

A Reinforcement Learning Model of Selective Visual Attention (ACM, 2001). Silviu Minut, Autonomous Agents Lab, Department of Computer Science, Michigan State University. Sridhar Mahadevan, Autonomous Agents Lab, Department of Computer Science, Michigan State University.

The Problem of Visual Search Goal: to find small objects in a large, usually cluttered environment, e.g. a pen on a desk. It is preferable to use wide field-of-view images, but identifying small objects requires high-resolution images, which results in a very high-dimensional input array.

Nature’s Method: Foveated Vision - I Fovea: anatomically defined as the central region of the retina, with a high density of receptor cells. The density of receptor cells decreases exponentially from the fovea toward the periphery.

Nature’s Method: Foveated Vision - II Saccades: to make up for the loss of information incurred by the decrease in resolution in the periphery, the eyes are re-oriented by rapid ballistic motions (up to 900°/s) called saccades. Fixations: periods between saccades during which the eyes remain relatively fixed, used to process visual information and to select the next fixation point.

Foveated Vision: Eye Scan Patterns

Using Foveated Vision Foveal image processing reduces the dimension of the input data but in turn generates a sequential decision problem: choosing the next fixation point requires an efficient gaze-control mechanism to direct the gaze to the most salient object.

Gaze Control - Salient Features To solve the gaze-control problem, the next fixation point must be chosen from low-resolution images of regions that do not fall on the fovea. Saliency Map Theory (Koch and Ullman): a task-independent, bottom-up model of visual attention. Itti and Koch, building on saliency map theory, fuse three types of feature maps (color map, edge map, intensity map) into a single saliency map. Low-resolution images alone, however, are usually not sufficient for this decision problem.
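
To make the map-fusion idea concrete, here is a minimal Python sketch of fusing normalized feature maps into a saliency map and picking the next fixation at its maximum. The normalization and equal weighting are assumptions for illustration, not the Itti-Koch implementation.

```python
# Minimal sketch: fuse per-channel feature maps into one saliency map.
# `color_map`, `edge_map`, `intensity_map` are assumed 2-D float arrays
# of equal shape, each highlighting contrast in one feature channel.
import numpy as np

def normalize(m):
    """Scale a feature map to [0, 1] so channels are comparable."""
    lo, hi = m.min(), m.max()
    return (m - lo) / (hi - lo) if hi > lo else np.zeros_like(m)

def saliency_map(color_map, edge_map, intensity_map):
    """Fuse normalized feature maps by simple averaging."""
    return (normalize(color_map) + normalize(edge_map)
            + normalize(intensity_map)) / 3.0

def next_fixation(sal):
    """Return the (row, col) of the most salient point."""
    return np.unravel_index(np.argmax(sal), sal.shape)
```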

Gaze Control - Control Mechanism A high-level mechanism is required to control low-level reactive attention. Tsotsos's model proposes selective tuning of visual processing via a hierarchical winner-take-all process. Information should be integrated from one fixation to the next for a global understanding of the scene. The model presented here combines top-down gaze control, learned with RL, with bottom-up reactive saliency-map processing.

Problem Definition and General Approach - I Given an object and an environment: how to build a vision agent that learns where the object is likely to be found, and how to direct its gaze to the object. A set of landmarks {L0, L1, ..., Ln} represents regions in the environment; a policy over this set directs the camera to the region most likely to contain the target object.

Problem Definition and General Approach - II The approach does not require high-level feature detectors: the policy learned through RL is based on the actual images seen by the camera. Once a direction has been selected, the precise location of the next fixation point is determined by means of visual saliency. The camera takes low-resolution, wide field-of-view images at discrete time intervals, and using these images the system tries to recognize the target object with a low-resolution template.

Problem Definition and General Approach - III Since reliable detection of a small object is difficult at low resolution, the system instead extracts candidate locations for the target object. Foveated vision is simulated by zooming in and grabbing high-resolution, narrow field-of-view images centered at the candidate locations, which are then compared with a high-resolution template of the target.
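
A minimal Python sketch of this two-stage simulated fovea: a subsampled wide view for choosing candidates, and a fixed-size high-resolution crop around each candidate. The sizes are illustrative assumptions, not values from the paper.

```python
# Minimal sketch of a simulated fovea (assumptions, not the authors' code).
import numpy as np

LOW_RES = (120, 160)   # assumed wide field-of-view size after downsampling
CROP = 64              # assumed half-width of the high-res "foveal" window

def peripheral_view(frame, low_res=LOW_RES):
    """Subsample a full frame to simulate low-resolution peripheral vision."""
    h, w = frame.shape[:2]
    ys = np.linspace(0, h - 1, low_res[0]).astype(int)
    xs = np.linspace(0, w - 1, low_res[1]).astype(int)
    return frame[np.ix_(ys, xs)]

def foveal_view(frame, cy, cx, crop=CROP):
    """Crop a high-resolution window centered on a candidate location,
    simulating a zoomed-in fixation."""
    h, w = frame.shape[:2]
    y0, y1 = max(0, cy - crop), min(h, cy + crop)
    x0, x1 = max(0, cx - crop), min(w, cx + crop)
    return frame[y0:y1, x0:x1]
```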

Target Object and the Environment Color template of the target object (left). Environment (bottom).

Reinforcement Learning The agent may or may not know a priori the transition probabilities and the reward function. When these are known, dynamic programming techniques can be used to compute an optimal policy.
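
For reference, the dynamic-programming case solves the standard Bellman optimality equation (general RL background, not an equation reproduced from the slides):

```latex
V^{*}(s) = \max_{a} \sum_{s'} P(s' \mid s, a) \left[ R(s, a, s') + \gamma V^{*}(s') \right]
```

Value iteration repeatedly applies this update until convergence; the optimal policy then picks the maximizing action in each state.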

Q-Learning In the visual search problem, the transition probabilities and the reward are not known to the agent. A model-free Q-learning algorithm is used to find the optimal policy.
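
As a concrete reference, below is a minimal tabular Q-learning sketch over the 9 saccade actions defined later in the talk. The learning parameters and epsilon-greedy exploration are assumptions, not values from the paper.

```python
# Minimal tabular Q-learning sketch (standard algorithm; parameters assumed).
import random
from collections import defaultdict

ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1   # assumed learning parameters
ACTIONS = list(range(9))                # A0..A8, per the Actions slide

Q = defaultdict(float)                  # Q[(state, action)] -> value

def choose_action(state):
    """Epsilon-greedy action selection over the 9 saccade actions."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

def update(state, action, reward, next_state):
    """One backup: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(Q[(next_state, a)] for a in ACTIONS)
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next
                                   - Q[(state, action)])
```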

States – Objects in the Environment Recorded scan patterns show that people fixate from object to object, so it is natural to define the states as the objects in the environment. Paradox: objects must be recognized as worth attending to before they are fixated on, yet an object cannot be recognized prior to fixation, since it is perceived only at low resolution.

States – Clusters of Images States are defined as clusters of images representing the same region. Each image is represented by color histograms over a reduced number of bins (48 colors for the lab environment). Using histograms introduces perceptual aliasing, since two different images can have identical histograms. To reduce aliasing, a separate histogram is computed for each quadrant of the image. This is expected to reduce aliasing because natural environments are sufficiently rich in color.
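
A minimal sketch of this state descriptor, assuming images are already quantized to palette indices. The 48-color palette matches the slide; the 2x2 quadrant split is the obvious reading of "computed across quadrants".

```python
# Minimal sketch of the per-quadrant histogram descriptor (layout assumed).
import numpy as np

N_BINS = 48  # reduced color palette, as stated for the lab environment

def quadrant_histograms(image, n_bins=N_BINS):
    """`image` is assumed to be an H x W array of palette indices in
    [0, n_bins). Returns the four per-quadrant histograms, normalized
    and concatenated into one descriptor."""
    h, w = image.shape
    quads = [image[:h//2, :w//2], image[:h//2, w//2:],
             image[h//2:, :w//2], image[h//2:, w//2:]]
    hists = []
    for q in quads:
        counts = np.bincount(q.ravel(), minlength=n_bins).astype(float)
        hists.append(counts / counts.sum())
    return np.concatenate(hists)
```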

Kullback Distance - I

Kullback Distance - II
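
The transcript does not reproduce the equations on these two slides. For the histogram-based states above, a standard choice, given here as a hedged reconstruction rather than the paper's exact formula, is the Kullback-Leibler divergence between normalized histograms and its symmetrized form:

```latex
D_{KL}(h_1 \parallel h_2) = \sum_{j} h_1(j) \log \frac{h_1(j)}{h_2(j)}, \qquad
D(h_1, h_2) = \frac{1}{2} \left[ D_{KL}(h_1 \parallel h_2) + D_{KL}(h_2 \parallel h_1) \right]
```

Under this reading, an incoming image would be assigned to the cluster whose center histogram is nearest in this distance.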

Actions Actions are defined as saccades to the most salient point. {A1, ..., A8} represent the 8 directions; in addition, A0 represents a saccade to the most salient point in the whole image.

Reward The agent receives a positive reward for a saccade that brings the object into the field of view, and a negative reward if the object is not in the field of view after a saccade.

Within-Fixation Processing The stage during which the eyes fixate on a point while the agent processes visual information and decides where to fixate next. It comprises two components: a set of two feature maps implementing low-level visual attention, used to select the next fixation point; and a recognizer, used at low resolution to detect candidate target objects and at high resolution to recognize the target.

Histogram Intersection A method for matching two images, I (search image) and M (model). It is difficult to find a threshold between similar and dissimilar images with this method unless the model is pre-specified.
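
The slide omits the matching formula. The standard histogram-intersection score of Swain and Ballard, which this slide appears to describe, compares the image and model histograms h_I and h_M over n bins as:

```latex
H(I, M) = \frac{\sum_{j=1}^{n} \min\bigl( h_I(j), h_M(j) \bigr)}{\sum_{j=1}^{n} h_M(j)}
```

Identical images score 1; the difficulty noted above is that the threshold separating "similar" from "dissimilar" depends on the model.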

Histogram Back-Projection Given two images I and M, histogram back-projection locates M in I. Color histograms h_I and h_M are computed on the same number of color bins. The operation requires only one pass through I: for every pixel (x, y), B(x, y) = R(j) iff I(x, y) falls in bin j. It always produces candidate locations.
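
A minimal Python sketch of the back-projection pass, following the standard Swain-Ballard formulation. The ratio histogram R(j) = min(h_M(j)/h_I(j), 1) is the usual definition of the R referenced on the slide; since the slide does not define it, treat it as an assumption.

```python
# Minimal sketch of histogram back-projection: every pixel is replaced by
# the ratio-histogram value of its color bin, so regions colored like the
# model M light up in the search image I.
import numpy as np

def backproject(image_bins, h_I, h_M):
    """`image_bins` is an H x W array of color-bin indices for I;
    h_I and h_M are numpy arrays holding the color histograms of I and M."""
    # Ratio histogram: R(j) = min(h_M(j) / h_I(j), 1), with R(j) = 0
    # where I has no pixels in bin j.
    with np.errstate(divide="ignore", invalid="ignore"):
        R = np.minimum(np.where(h_I > 0, h_M / h_I, 0.0), 1.0)
    # One pass through I: B(x, y) = R(j) where pixel (x, y) falls in bin j.
    return R[image_bins]

def best_candidate(B):
    """Peak of the back-projected image; smoothing with a model-sized
    window would normally precede this step."""
    return np.unravel_index(np.argmax(B), B.shape)
```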

Histogram Back-Projection Example

Symmetry Operator To fixate on objects, a symmetry operator is used, since most man-made objects have vertical, horizontal, or radial symmetry. It first computes an edge map and then has each pair (pi, pj) of edge pixels vote for its midpoint, weighted as in (9).
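
A minimal Python sketch of the midpoint-voting idea. Since Eq. (9) is not reproduced in the transcript, a simple Gaussian distance weight is substituted as a stand-in for the paper's weighting function.

```python
# Minimal sketch of midpoint voting for a symmetry map (weight assumed).
import numpy as np

SIGMA = 10.0  # assumed scale of the symmetry neighborhood, in pixels

def symmetry_map(edge_points, shape, sigma=SIGMA):
    """`edge_points` is a list of (y, x) edge-pixel coordinates; each pair
    votes for its midpoint, with nearby pairs weighted more heavily."""
    sym = np.zeros(shape)
    pts = np.asarray(edge_points, dtype=int)
    for i in range(len(pts)):
        for j in range(i + 1, len(pts)):
            d2 = float(np.sum((pts[i] - pts[j]) ** 2))
            my, mx = (pts[i] + pts[j]) // 2        # midpoint of the pair
            sym[my, mx] += np.exp(-d2 / (2 * sigma ** 2))
    return sym
```

The quadratic pair loop is fine for a sketch; a practical implementation would restrict voting to edge pairs within a fixed radius.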

Symmetry Map

Model Description - I Each low-resolution image is processed by two main modules. The top module (RL) learns a set of clusters of images with similar color histograms; the clusters represent physical regions and are used as states in the Q-learning method. The second module consists of low-level visual routines; its purpose is to compute color and symmetry maps for saliency and to recognize the target object at both low and high resolution.

Model Description - II

Visual Search Agent Model

Algorithm - Initialization

Algorithm – If object found

Algorithm – If object not found
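The three algorithm slides above are image-only in this transcript. Below is a hedged reconstruction of the search loop assembled from the preceding slides; every helper is passed in as a function, and both the names and the exact control flow are assumptions rather than the authors' pseudocode.

```python
# Hedged reconstruction of the search loop; helpers such as cluster_of,
# salient_point, find_candidates, and matches are hypothetical stand-ins
# for the components sketched on earlier slides.

def visual_search(grab_wide, grab_zoomed, saccade_to,
                  cluster_of, salient_point, find_candidates, matches,
                  choose_action, update, max_fixations=100):
    """Run one epoch: at most `max_fixations` fixations (per the Results slide)."""
    image = grab_wide()                        # low-res, wide field of view
    state = cluster_of(image)                  # histogram-based state
    for _ in range(max_fixations):
        action = choose_action(state)          # saccade direction A0..A8
        saccade_to(salient_point(image, action))   # color + symmetry saliency
        image = grab_wide()
        next_state = cluster_of(image)
        for c in find_candidates(image):       # low-res template matches
            if matches(grab_zoomed(c)):        # simulated fovea, high-res check
                update(state, action, +1.0, next_state)  # positive reward
                return c
        update(state, action, -1.0, next_state)          # negative reward
        state = next_state
    return None                                # object not found this epoch
```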

Results The agent is trained to learn in which direction to direct its gaze in order to reach the region where the target object is most likely to be found; each trial consists of 400 epochs. An epoch is a sequence of at most 100 fixations. Every 5th epoch was used for testing, during which the agent simply executed the learned policy. The performance metric was the number of fixations. Within a single trial, the starting point was the same in all test epochs.

Experimental Results - I

Experimental Results - II

Experimental Results - III

Sequence of Fixations

Experimental Results - IV

Experimental Results - V

Experimental Results - VI

Conclusion Developed a model of selective attention for a visual search task that combines visual processing with control of attention. Control is achieved by means of RL over a low-level visual mechanism for selecting the next fixation. Color and symmetry are used to select the next fixation, and it is not necessary to combine them into a single saliency map. Information is integrated from saccade to saccade.

Future Work The goal is to extend this approach to a mobile robot. The problem becomes more challenging because the position, and consequently the appearance, of the object changes with the robot's position, so a single template is not sufficient. In this paper it is assumed that the environment is rich in color, so that perceptual aliasing is not an issue; extension to a mobile robot will inevitably lead to learning in inherently perceptually aliased environments.