Gist: A Mobile Robotics Application of Context-Based Vision in Outdoor Environment
Christian Siagian, Laurent Itti
Univ. Southern California, CA, USA
Outline
- Mobile robot localization
- Biological approach to vision
- Gist model
- Testing and results
- Discussion and conclusion
Mobile Robot Localization
- Where are we?
- Localization = identifying landmarks
Mobile Robot Localization
- Indoors: strong assumptions of flat walls, narrow hallways, and solid angles
  - Ranging sensors (laser and sonar) for mapping
- Outdoors: less conforming set of surfaces
  - Ranging sensors are less effective; vision is better
Robot Vision Localization
Object-based Vision Localization: objects as landmarks
- Accuracy:
  - Based on the object observation model
  - Selection of reliable objects
  - Can accommodate metric and topological mapping
- Efficiency: trade-off between efficiency and robustness within the localization framework
- Scalability:
  - Generally, the number of objects in the database grows with the size of the environment
  - The task of object selection becomes harder
Robot Vision Localization
Region-based Vision Localization: regions as landmarks
- Accuracy:
  - Needs configuration of regions
  - Prone to over- and under-segmentation
  - Observation model is less sophisticated
- Efficiency: can use lower resolutions, although flexible matching is necessary
- Scalability:
  - Needs more expressive region signatures and geometry
  - More complex may mean less stable, however
Robot Vision Localization
Scene-based Vision Localization: scenes as a whole as landmarks
  - Color histograms [Ulrich and Nourbakhsh 2000]
  - Fourier transform [Oliva & Torralba 2001]
  - Wavelet pyramids [Torralba 2003]
  - Histogram of dominant features [Renninger & Malik 2004]
- Accuracy:
  - Lends itself more to topological mapping
  - Resolution: localization within a place is still needed
  - Natural view invariance
- Efficiency: can be done at lower resolution
- Scalability: stability and uniqueness
  - Learn a smaller set of scene features
  - Adding new environments presents a uniqueness problem: places can look more and more the same
Gist
- Definition and background
  - Essence, holistic characteristics of an image
  - Context information obtained within an eye saccade (approx. 150 ms)
  - Evidence of place-recognizing cells in the Parahippocampal Place Area (PPA)
  - Biologically plausible models of gist are yet to be proposed
- Nature of tasks done with gist
  - Scene categorization/context recognition
  - Region priming/layout recognition
  - Resolution/scale selection
Human Vision Architecture
- Visual Cortex: low-level filters, center-surround, and normalization
- Saliency Model: attend to pertinent regions
- Gist Model: compute general image characteristics
- High-Level Vision:
  - Object recognition
  - Layout recognition
  - Scene understanding
Gist Model
- Uses the same raw Visual Cortex features as the saliency model [Itti 2001]
- Gist is theoretically non-redundant with saliency
- Gist vs. saliency:
  - Instead of looking at the most conspicuous locations in the image, gist looks at the scene as a whole
  - Detection of regularities, not irregularities
  - Cooperation (accumulation) vs. competition (WTA) among locations
  - More spatial emphasis in saliency
  - Local vs. global/regional interaction
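The accumulation vs. winner-take-all (WTA) contrast can be illustrated on a toy map. This is only a sketch: the function names and the 3x3 map are hypothetical and are not taken from the authors' implementation. Saliency-style competition picks the single strongest location, while gist-style cooperation lets every location contribute to one summary value.

```python
def winner_take_all(conspicuity_map):
    """Competition: return (row, col) of the single strongest location."""
    best = max(
        ((value, (r, c))
         for r, row in enumerate(conspicuity_map)
         for c, value in enumerate(row)),
        key=lambda item: item[0],
    )
    return best[1]


def accumulate(feature_map):
    """Cooperation: every location contributes equally to one summary value."""
    values = [v for row in feature_map for v in row]
    return sum(values) / len(values)


# Hypothetical toy map: uniform background with one irregular location.
toy = [
    [0.1, 0.1, 0.1],
    [0.1, 0.9, 0.1],
    [0.1, 0.1, 0.1],
]
print(winner_take_all(toy))  # (1, 1): the one irregular location
print(accumulate(toy))       # mean over the whole map
```

Both functions read the same map, which mirrors the slide's point that gist and saliency share raw features but summarize them differently.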
Gist Model Implementation
V1 raw image feature maps
- Orientation channel: Gabor filters at 4 angles (0°, 45°, 90°, 135°) on 4 scales = 16 sub-channels
- Color: red-green and blue-yellow center-surround, each with 6 scale combinations = 12 sub-channels
- Intensity: dark-bright center-surround with 6 scale combinations = 6 sub-channels
- Total: 16 + 12 + 6 = 34 sub-channels
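The sub-channel count can be checked by enumerating the maps. The sketch below is illustrative only: the slide says "6 scale combinations" without listing them, so the specific (center, surround) level pairs are an assumption borrowed from the usual saliency-model setup, and all names are hypothetical.

```python
from itertools import product

# Orientation: 4 Gabor angles (degrees) at 4 scales, as listed on the slide.
ANGLES = (0, 45, 90, 135)
ORIENTATION_SCALES = range(4)

# Assumption: (center, surround) pyramid levels; the slide only gives the count (6).
CENTER_SURROUND = [(c, c + d) for c in (2, 3, 4) for d in (3, 4)]


def list_sub_channels():
    """Enumerate every raw feature map as a (channel, detail, scale) tuple."""
    orientation = [("orientation", angle, scale)
                   for angle, scale in product(ANGLES, ORIENTATION_SCALES)]
    color = [("color", pair, cs)
             for pair, cs in product(("red-green", "blue-yellow"), CENTER_SURROUND)]
    intensity = [("intensity", "dark-bright", cs) for cs in CENTER_SURROUND]
    return orientation + color + intensity


sub_channels = list_sub_channels()
print(len(sub_channels))  # 16 + 12 + 6 = 34
```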
Gist Model Implementation
Gist feature extraction
- Average values over each cell of a predetermined grid
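A minimal sketch of grid averaging for one sub-channel follows. The 4x4 grid is an assumption, chosen because it yields the 16 features per sub-channel quoted on the dimension-reduction slide; the function name and the toy map are hypothetical.

```python
def grid_means(feature_map, rows=4, cols=4):
    """Average a 2-D feature map (list of equal-length rows) over a rows x cols grid.

    Returns rows * cols values in row-major order. Cell boundaries use integer
    division, so sizes that are not a multiple of the grid still work, provided
    the map is at least rows x cols.
    """
    height, width = len(feature_map), len(feature_map[0])
    means = []
    for r in range(rows):
        top, bottom = r * height // rows, (r + 1) * height // rows
        for c in range(cols):
            left, right = c * width // cols, (c + 1) * width // cols
            cell = [feature_map[y][x]
                    for y in range(top, bottom)
                    for x in range(left, right)]
            means.append(sum(cell) / len(cell))
    return means


# Toy 8x8 map whose value equals its row index.
toy_map = [[float(y)] * 8 for y in range(8)]
gist_values = grid_means(toy_map)
print(len(gist_values))  # 16 values for this one sub-channel
print(gist_values[:4])   # top row of cells: rows 0 and 1 average to 0.5
```

Repeating this over all 34 sub-channels and concatenating the results would give the 544-value raw gist vector.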
Gist Model Implementation
- Dimension reduction
  - Original: 34 sub-channels x 16 features = 544 features
  - PCA/ICA reduction: 80 features
  - Kept >95% of variance
- PCA/ICA reduction
  - Too much redundancy
  - Reduction matrix is too random to decipher
- Place classification
  - Three-layer neural networks
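The variance-threshold idea behind the reduction can be sketched with plain PCA. This is an illustration under stated assumptions, not the authors' pipeline: it covers PCA only (not ICA), uses NumPy's SVD, and runs on synthetic data, so it does not reproduce the 80-feature figure from the slide. The names `fit_pca` and `project` are hypothetical.

```python
import numpy as np


def fit_pca(X, min_variance=0.95):
    """Return (mean, components), keeping the fewest principal components
    whose cumulative explained variance reaches min_variance."""
    mean = X.mean(axis=0)
    centered = X - mean
    # Rows of vt are principal directions; squared singular values are
    # proportional to the variance explained by each direction.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    variance = singular_values ** 2
    cumulative = np.cumsum(variance) / variance.sum()
    k = int(np.searchsorted(cumulative, min_variance)) + 1
    return mean, vt[:k]


def project(X, mean, components):
    """Map raw feature vectors onto the retained principal components."""
    return (X - mean) @ components.T


# Synthetic stand-in for gist vectors: 200 samples x 544 features that are
# driven by only 10 hidden factors, so most dimensions are redundant.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 10))
mixing = rng.normal(size=(10, 544))
X = latent @ mixing + 0.01 * rng.normal(size=(200, 544))

mean, components = fit_pca(X)
reduced = project(X, mean, components)
print(X.shape, "->", reduced.shape)
```

The reduced vectors would then be the input to the place classifier; the rows of `components` mix all 544 inputs, which is one way to read the slide's remark that the reduction matrix is hard to decipher.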
System Example Run
Testing & Results
- Site selection:
  - Various lighting conditions
  - Different challenges appearance-wise
  - Variability in area covered/path lengths
- Single-view filming
- Clean break between segments
- Scalability: combine all sites
Map of Experiment Sites
Site 1: Building Complex
Site 1 Experiment
[Figure panels: Input Image; Gist Feature-vectors; System Output; PCA/ICA reduced features]
Site 1 Results
[Figure labels: Output Label; Assigned Label]
Site 2: Vegetation-filled Park
Site 2 Result
[Figure labels: Output Label; Assigned Label]
Site 2 Experiment
[Figure panels: Input Image; Gist Feature-vectors; System Output; PCA/ICA reduced features]
Site 3: Open Field Park
Site 3 Experiment
[Figure panels: Input Image; Gist Feature-vectors; System Output; PCA/ICA reduced features]
Site 3 Result
[Figure labels: Output Label; Assigned Label]
Combined Sites Result
Discussion & Conclusion
- Result of current model:
  - Success rate between 82.48% and 87.93%
  - Combined rate of 85.96%
  - 4.73% error in inter-site classification
- Integrating saliency for robot navigation
  - Localization within segment
  - Identifying discriminating cues in the environment
  - Issues in object-based systems still apply
- Bad view detection
  - Foreground objects sometimes occlude the whole view
- Obstacle avoidance, exploration, etc.
Discussion
- Integration of gist and saliency in general
  - Single representation of both models
  - Influence of saliency on gist and vice versa
    - Involvement of saliency in improving gist estimation
    - Gist helpful in identifying/filtering salient locations
- Testing the limits of gist: psychophysics experiments
  - Change blindness test for large-scale layout changes
  - Varying exposure time
  - Isolation of bottom-up and top-down influences