
Slide 1: Visual Attention and Recognition Through Neuromorphic Modeling of "Where" and "What" Pathways
Zhengping Ji, Embodied Intelligence Laboratory, Computer Science and Engineering, Michigan State University, East Lansing, USA

Slide 2: Outline
- Attention and recognition: a chicken-and-egg problem
- Motivation: brain-inspired, neuromorphic modeling of the brain's visual pathways
- Saliency-based attention
- Where-What Network (WWN):
  - how to integrate saliency-based attention with top-down attention control
  - how attention and recognition help each other
- Conclusions and future work

Slide 3: What is attention?

Slides 4-5: Bottom-up Attention (Saliency)

Slides 6-9: Attention Shifting

Slides 10-11: Spatial Top-down Attention Control (e.g., pay attention to the center)

Slides 12-13: Object-based Top-down Attention Control (e.g., pay attention to the square)

Slide 14: The Chicken-and-Egg Problem
- Without attention, recognition cannot do well: recognition requires attended areas for further processing.
- Without recognition, attention is limited: it needs not only bottom-up saliency cues, but also top-down object-dependent signals and top-down spatial control.

Slide 15: Problem

Slide 16: Challenges
- High-dimensional input space
- Background noise
- Large variance in:
  - scale
  - shape
  - illumination
  - viewpoint
  - ...

Slide 17: Saliency-based Attention (I) - the naive way: choose the attention window by guessing
[Diagram: an IHDR tree maps inputs to actions along a desired path, with attention windows Win1-Win6 and edges e1-e6.]
- Boundary-detection part: the mapping from two visual images to the correct road-boundary type for each sub-window (reinforcement learning).
- Action-generation part: the mapping from road-boundary type to the correct heading direction (supervised learning).

Slide 18: Saliency-based Attention (II) - low-level image processing (Itti & Koch et al., 1998)
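To make the low-level processing concrete, here is a minimal sketch of an Itti & Koch-style saliency computation restricted to an intensity channel (the full 1998 model also combines color and orientation channels across pyramid scales); the function name and parameters are illustrative, not from the talk.

```python
# Intensity-channel center-surround saliency, approximated with Gaussian
# blurs at a few scales (a simplification of the Itti & Koch 1998 model).
import numpy as np
from scipy.ndimage import gaussian_filter

def intensity_saliency(img, center_sigmas=(1, 2), surround_scale=4.0):
    img = img.astype(float)
    sal = np.zeros_like(img)
    for s in center_sigmas:
        center = gaussian_filter(img, sigma=s)
        surround = gaussian_filter(img, sigma=s * surround_scale)
        sal += np.abs(center - surround)   # on- and off-contrast combined
    return sal / sal.max() if sal.max() > 0 else sal

# The most salient location is a candidate attention-window center.
img = np.random.rand(40, 40)               # stand-in for a 40x40 input
r, c = np.unravel_index(np.argmax(intensity_saliency(img)), img.shape)
```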

Slide 19: Review (the outline from Slide 2, revisited)

Slide 20: Biological Motivations

Slide 21: Challenge - Foreground Teaching
- How does a neuron separate a foreground from a complex background?
- No teacher is needed to hand-segment the foreground.
- Fixed foreground, changing background (e.g., while a baby tracks an object).
- The background weights are averaged out, so the background has little effect during neuronal competition (see the sketch below).
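A small numeric sketch (my illustration; the slide states only the principle) of why the background is averaged out: with a fixed foreground occluding a background that changes on every presentation, an incrementally averaged weight vector converges to the foreground pattern over a flat background mean.

```python
import numpy as np

rng = np.random.default_rng(0)
w = np.zeros(100)                          # one neuron's synaptic weights
for t in range(1, 2001):
    x = rng.random(100) * 0.5              # a new random background each time
    x[40:60] = 1.0                         # the fixed foreground occludes it
    w += (x - w) / t                       # incremental mean of the inputs
# Foreground weights stand far above the near-uniform background level,
# so the foreground dominates the neuron's match during competition.
print(w[40:60].mean(), w[:40].mean())      # ~1.0 vs. ~0.25
```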

Slide 22: Novelty
- Bottom-up attention: Koch & Ullman 1985; Itti & Koch et al. 1998; Baker et al. 2001; etc.
- Position-based top-down control: Olshausen et al. 1993; Tsotsos et al. 1995; Mozer et al. 1996; Schill et al. 2001; Rao et al. 2004; etc.
- Object-based top-down control: Deco & Rolls 2004 (no performance evaluation); etc.
- Our work:
  - saliency arises from developed features
  - both bottom-up and top-down control
  - the top-down signal can be object-based, position-based, or absent
  - attention and recognition form a single process

Slide 23: ICDL Architecture
[Diagram: Image (40x40, pixel-based) -> V1 (11x11 receptive fields) -> V2 (21x21 receptive fields) -> "what" motor (global) and "where" motor ((r, c), 40x40); foreground size fixed at 20x20. A configuration sketch follows.]
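A rough configuration sketch of the dimensions read off this diagram; the dictionary layout and key names are mine, for bookkeeping only.

```python
# ICDL-version network dimensions as read from the slide's diagram.
icdl_wwn = {
    "image":       {"size": (40, 40), "representation": "pixel-based"},
    "V1":          {"receptive_field": (11, 11)},   # local RFs on the image
    "V2":          {"receptive_field": (21, 21)},   # larger RFs on V1
    "what_motor":  {"connectivity": "global"},      # object identity
    "where_motor": {"output": "(r, c)", "map": (40, 40)},
    "foreground":  {"size": (20, 20), "fixed": True},
}
```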

Slide 24: Multi-level Receptive Fields

Slide 25: Layer Computation
- Compute the pre-response of cell (i, j) at time t.
- Sort: z_1 ≥ z_2 ≥ ... ≥ z_k ≥ ... ≥ z_m.
- Only the top-k neurons respond, which preserves selectivity and long-term memory.
- The response range is normalized.
- Update the local winners (a sketch follows this list).
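A minimal sketch of this top-k competition, assuming the pre-response is the usual normalized inner product between the input and each neuron's bottom-up weight vector; the exact winner normalization is my reading of "response range is normalized".

```python
import numpy as np

def topk_response(x, W, k=3, eps=1e-12):
    """x: input vector; W: (m, n) weights, one row per neuron."""
    # Pre-response z_i: cosine similarity of x with each weight vector.
    z = (W @ x) / (np.linalg.norm(W, axis=1) * np.linalg.norm(x) + eps)
    order = np.argsort(z)[::-1]            # z_1 >= z_2 >= ... >= z_m
    y = np.zeros_like(z)
    top = order[:k]
    cut = z[order[k]] if k < z.size else z.min()   # (k+1)-th pre-response
    # Winners are normalized into (0, 1]; all losers stay at 0, which
    # keeps neurons selective and protects their long-term memory.
    y[top] = (z[top] - cut) / (z[order[0]] - cut + eps)
    return y
```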

Slide 26: In-place Learning Rule
- No back-propagation:
  - not biologically plausible
  - does not give long-term memory
- No distribution model (e.g., Gaussian mixtures):
  - avoids the high complexity of covariance matrices
- A new Hebbian-like rule (sketched below):
  - automatic plasticity scheduling: only the winners update
  - minimum error toward the target at every incremental estimation stage (the local first principal component)
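A minimal sketch of a Hebbian-like, in-place winner update with automatic plasticity scheduling via an amnesic mean; the scheduling constants are assumptions for illustration, not values from the talk.

```python
import numpy as np

def inplace_update(w, age, x, y, t1=20.0, t2=200.0, c=2.0, r=2000.0):
    """Update one winner's weight vector w for input x and response y."""
    # Amnesic function mu(age): zero early on, then growing, so old
    # memory is largely retained while recent inputs keep some weight.
    if age < t1:
        mu = 0.0
    elif age < t2:
        mu = c * (age - t1) / (t2 - t1)
    else:
        mu = c + (age - t2) / r
    lr = (1.0 + mu) / (age + 1.0)          # plasticity of this neuron
    w *= 1.0 - lr                          # retention term
    w += lr * y * x                        # Hebbian term: response-gated input
    return age + 1                         # only winners advance in age
```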

Slide 27: Top-down Attention
- Recruit and identify class-invariant features.
- Recruit and identify position-invariant features.

Slide 28: Experiment
- Foreground objects defined by the "what" motor (20x20).
- Attended areas defined by the "where" motor.
- Randomly selected background patches (40x40).
(A compositing sketch follows.)
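A minimal sketch (my own; the slide only names the ingredients) of how one training sample could be composed: paste a 20x20 foreground at a random position inside a 40x40 background patch and take the object center as the "where" label.

```python
import numpy as np

rng = np.random.default_rng(1)

def make_sample(fg, bg):
    """fg: (20, 20) foreground object; bg: (40, 40) background patch."""
    r, c = rng.integers(0, 21, size=2)     # top-left corner in 0..20
    img = bg.copy()
    img[r:r+20, c:c+20] = fg               # foreground occludes background
    return img, (r + 10, c + 10)           # image and "where" label (center)

img, (row, col) = make_sample(np.ones((20, 20)), rng.random((40, 40)))
```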

Slide 29: Developed Layer 1
Bottom-up synaptic weights of neurons in Layer 1, developed from randomly selected patches of natural images.

Slide 30: Developed Layer 2
Bottom-up synaptic weights of neurons in Layer 2; not intuitive to interpret directly.

Slide 31: Response-Weighted Stimuli for Layer 2

Slide 32: Experimental Result I
Recognition rate with incremental learning.

Slide 33: Experimental Result II
(a) Examples of input images; (b) responses of the attention ("where") motor when supervised by the "what" motor; (c) responses of the attention ("where") motor when "what" supervision is not available.

Slide 34: Summary
- The "what" motor helps direct the network's attention to the features of a particular object.
- The "where" motor helps direct attention to positional information (accuracy rises from 45% to 100% when "where" information is present).
- Saliency-based bottom-up attention, location-based top-down attention, and object-based top-down attention are all integrated through the top-k spatial competition rule.

Slide 35: Problems
- The accuracy of the "where" motor is not good: 45.53%.
- Layer 1 was developed offline.
- More layers are needed to handle more positions.
- The "where" motor should be given externally, instead of as a retina-based representation.
- No internal iterations, which matters especially when there is more than one hidden layer.
- No cross-level projections.

Slide 36: Fully Implemented WWN (Original Design)
[Architecture diagram: Image (40x40); V1 (40x40, 11x11 receptive fields); V2 (40x40, 21x21 receptive fields); V3 and V4 (40x40); LIP (31x31); MT, PP, and IT (40x40); "what" motor (4 objects, fixed-size, global); "where" motor ((r, c), 25 centers).]

Slide 37: Problems
- The accuracies of the "where" and "what" motors are not good: 25.53% for the "what" motor and 4.15% for the "where" motor.
- Too many parameters to tune.
- Training is extremely slow.
- How should the internal iterations be run?
  - "Sweeping" scheme: always use the most recently updated weights and responses.
  - Alternatively, always use the weights and responses from iteration p-1, where p is the current iteration count.
- The response should not be normalized within each lateral-inhibition neighborhood.

Slide 38: Modified Simple Architecture
[Diagram: Image (40x40) -> V1 (11x11 receptive fields) -> V2 (21x21 receptive fields) -> "what" motor (5 objects, global) and "where" motor ((r, c), 5 centers, retina-based supervision); foreground size fixed at 20x20.]

Slide 39: Advantages
- Internal iterations are not necessary.
- The network runs much faster.
- It is easier to track neural representations and evaluate performance.
- Performance evaluation:
  - the "what" motor reaches 100% accuracy on the disjoint test
  - the "where" motor reaches 41.09% accuracy on the disjoint test

Slide 40: Problem - Dominance by the Top-down Projection
[Figure: top-down projection from the motor + bottom-up responses; the top-down responses dominate the total responses.]

Slide 41: Solution
- Sparsify the bottom-up responses by keeping only the local top-k winners of the bottom-up responses (sketched below).
- The performance of the "where" motor increases from around 40% to 91%.
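A minimal sketch of the local top-k sparsification: within each neighborhood, keep only the k strongest bottom-up responses and zero the rest, so the top-down projection can no longer drown out the bottom-up evidence. The neighborhood size and k are assumptions.

```python
import numpy as np

def local_topk(resp, k=1, nbhd=3):
    """resp: 2-D map of bottom-up responses; returns a sparsified copy."""
    h, w = resp.shape
    out = np.zeros_like(resp)
    half = nbhd // 2
    for i in range(h):
        for j in range(w):
            window = resp[max(0, i-half):i+half+1, max(0, j-half):j+half+1]
            kth = np.sort(window, axis=None)[-min(k, window.size)]
            if resp[i, j] >= kth:          # keep only local top-k winners
                out[i, j] = resp[i, j]
    return out
```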

Slide 42: Fully Implemented WWN (Latest)
[Architecture diagram: Image (40x40); V1 (40x35, 11x11 receptive fields); V2 (40x40, 21x21 receptive fields); V4 (40x40); MT (40x40); "what" motor (5 objects, smoothed by a Gaussian); "where" motor ((r, c), 3x3 centers, smoothed by a Gaussian); foreground size fixed at 20x20; each cortex uses the modified ADAST.]

Slide 43: Modified ADAST
[Diagram: signals from the previous cortex's L2/3 enter L4, then L2/3 of the current cortex, and continue to the next cortex; L6 and L5 perform ranking.]

Slide 44: Other Improvements
- Smooth the external motor signals with a Gaussian function (sketched below).
- "Where" motors are evaluated by regression error.
- Local top-k is adaptive to neuron position.
- The network does not converge by internal iterations:
  - the learning rate for top-down excitation is made adaptive across internal iterations.
- Use context information.
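A minimal sketch of the first item, Gaussian smoothing of an external motor signal: instead of a one-hot "where" target, neighboring motor neurons receive graded supervision. The sigma value is an assumption for illustration.

```python
import numpy as np

def gaussian_motor_target(center, shape=(3, 3), sigma=0.75):
    """center: (r, c) of the supervised motor neuron on a 3x3 grid."""
    rows, cols = np.indices(shape)
    d2 = (rows - center[0])**2 + (cols - center[1])**2
    target = np.exp(-d2 / (2.0 * sigma**2))
    return target / target.max()           # peak of 1 at the true center

print(gaussian_motor_target((1, 1)))       # graded 3x3 "where" target
```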

Slide 45: Layer 1 - Bottom-up Weights

Slide 46: Layer 2 - Response-Weighted Stimuli

Slide 47: Layer 3 ("Where") - Top-down Weights

Slide 48: Layer 3 ("What") - Top-down Weights

Slide 49: Test Samples
[Figure columns: input; "where" motor (ground truth); "what" motor (ground truth); "where" output (saliency-based); "where" output ("what"-supervised); "what" output (saliency-based); "what" output ("where"-supervised).]

Slide 50: Performance Evaluation
Average error of the "where" and "what" motors over 250 test samples:

Motor (error metric)                   | No supervision | "Where" supervised | "What" supervised
"Where" motor (regression error, MSE)  | 4.137 pixels   | N/A                | 4.137 pixels
"What" motor (classification error)    | 12.7%          | 12.1%              | N/A

Slide 51: Discussion

