Visual Saliency: the signal from V1 to capture attention. Li Zhaoping, Head, Laboratory of Natural Intelligence, Department of Psychology, University College London. December 2002.
Visual tasks: object recognition and localization. Pre-condition: object segmentation. Question: how can one segment an object before recognizing it? (How can there be an egg before the chicken?) I recently proposed (Li 1998, 1999, 2000; in particular Li, TICS, Jan. 2002) pre-attentive segmentation by highlighting conspicuous image areas where homogeneity breaks down. These areas are candidate locations for object boundaries and thus serve segmentation. V1 produces a saliency map containing such highlights through intracortical interactions.
V1 produces a saliency map. [Figure: contrast input to V1, and saliency output from the V1 model; intra-cortical interactions highlight important image locations.] The V1 model is based on V1 physiology and anatomy (e.g., horizontal connections linking cells tuned to similar orientations), and has been tested to be consistent with physiological data on contextual influences (e.g., iso-orientation suppression, Knierim and van Essen 1992; collinear facilitation, Kapadia et al. 1995).
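The key contextual influence above, iso-orientation suppression, can be illustrated with a toy sketch. This is not the actual V1 model (which uses recurrent dynamics among excitatory and inhibitory cells); all numbers and the linear suppression rule here are illustrative assumptions.

```python
# Toy sketch of iso-orientation suppression (illustrative, not the
# actual V1 model): each bar's response starts at a baseline and is
# reduced by every neighbour with a similar orientation. A bar with a
# unique orientation escapes suppression and keeps the highest response.

def toy_v1_response(orientations, baseline=1.0, suppression=0.1):
    responses = []
    for i, ori in enumerate(orientations):
        iso_neighbours = sum(
            1 for j, other in enumerate(orientations)
            if j != i and abs(other - ori) < 15  # within 15 degrees
        )
        responses.append(max(0.0, baseline - suppression * iso_neighbours))
    return responses

# One horizontal bar (0 deg) among vertical distractors (90 deg):
resp = toy_v1_response([90] * 8 + [0])
print(resp)  # the unique bar evokes the highest response
```

Each vertical distractor is suppressed by its seven iso-oriented neighbours, while the horizontal bar is not suppressed at all, so it stands out in the response map.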
[Figure: original input and V1 responses; example item responses S = 0.2, 0.4, 0.12, 0.22 with corresponding z scores 1.0, 7, -1.3, 1.7; histogram of all responses S regardless of features.] Each item's salience is measured by its z score, Z = (S - S̄)/σ, where S̄ and σ are the mean and standard deviation of all responses S in the image, regardless of features. Pop-out: the saliency of an item is assumed to increase with its evoked V1 response. We assume that the efficiency of a visual search task increases with the salience of the target (or of its most salient part, e.g., the horizontal bar in the target cross above). The high z score, z = 7, of the horizontal bar, a measure of the cross's salience, lets the cross pop out, since its evoked V1 response (to the horizontal bar) is much higher than the average population response to the whole image. The cross has a unique feature, the horizontal bar, which evokes the highest response because it experiences no iso-orientation suppression while all distractors do. Hence, intracortical interaction is a neural basis for why feature searches are often efficient.
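The z-score measure can be sketched directly from its definition. The response values below are hypothetical, chosen only to mimic one unsuppressed target among uniformly suppressed distractors; they are not the model's outputs.

```python
import statistics

def z_scores(responses):
    """Z = (S - mean(S)) / sigma: the salience of each item relative to
    the population of all V1 responses in the image."""
    mean = statistics.mean(responses)
    sigma = statistics.pstdev(responses)  # population standard deviation
    return [(s - mean) / sigma for s in responses]

# Hypothetical responses: one high response (a unique bar with no
# iso-orientation suppression) among many suppressed distractors.
responses = [0.2] * 20 + [0.9]
zs = z_scores(responses)
print(zs[-1])  # the target's z score is far above the rest
```

Note that the mean and σ are taken over all responses including the target, matching the histogram of all responses described on the slide.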
V1's output is viewed as a saliency map under the idealization that top-down feedback to V1 is disabled, e.g., shortly after visual exposure or under anesthesia. Saliency is signaled regardless of object features: contrary to common belief, cells signaling saliency can also be tuned to features, so V1 can produce a saliency map even though its cells are feature-tuned. The firing rate of V1 neurons is the currency for saliency, just as a US dollar is a functional currency regardless of whether its holder is Chinese, American, or of any other nationality.
The V1 saliency map agrees with visual search behavior. [Figure: inputs and V1 model outputs for two search tasks.] Feature search (target = a cross containing a unique horizontal bar): z = 7, pop-out. Conjunction search (target shares each of its features with some distractors): no pop-out, serial search.
The saliency map also explains spatial and distractor effects: the same target, different backgrounds. [Figure: inputs and V1 outputs for three backgrounds.] Homogeneous background, identical distractors regularly placed: z = 3.4, the easiest search. Distractors irregularly placed: z = 0.22. Distractors dissimilar to each other: z = 0.25.
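The background effect above follows directly from the z-score definition: an irregular or heterogeneous background broadens the distribution of distractor responses, inflating σ and shrinking the target's z score even when the target's own response is unchanged. A sketch with hypothetical response values (not model outputs):

```python
import statistics

def z_of_target(target_response, background_responses):
    """z score of the target against the full response population."""
    pop = background_responses + [target_response]
    mean = statistics.mean(pop)
    sigma = statistics.pstdev(pop)
    return (target_response - mean) / sigma

# Hypothetical numbers: identical distractors give a narrow response
# distribution; dissimilar distractors broaden it, lowering the
# target's z score although the target response (0.8) is the same.
homogeneous = [0.2] * 30
heterogeneous = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6] * 5
z_easy = z_of_target(0.8, homogeneous)
z_hard = z_of_target(0.8, heterogeneous)
print(z_easy > z_hard)  # True
```

This is why the homogeneous, regularly placed background yields the easiest search in the figure.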
The V1 saliency map also explains: visual search asymmetry, e.g., searching for a longer line among shorter ones is easier than the reverse, and searching for a circle with a gap among closed circles is easier than the reverse; why some conjunction searches are easier than others, e.g., a motion-orientation conjunction is easier to find than a color-orientation conjunction; etc. The V1 saliency map has also made testable predictions that were confirmed by subsequent tests. E.g., Snowden (1998) found that texture segmentation by the orientation of texture bars is impaired by random color variations of the bars; the predicted reduction of this impairment with thinner bars was tested and confirmed (Zhaoping & Snowden 2002).
Potential interactions with other team members of the collaboration (self-centered): pre-attentive segmentation and the V1 saliency map, Li Zhaoping; feature binding, Chen Lin; visual attention, He Sheng; perceptual learning, Lu Zhonglin; mathematical modeling, Zhang Jun; artificial vision, Zhang Jiajie; visual physiology, Cheng Kang. Themes: learning vs. attention, engineering, feature coding dynamics and representation, neural mechanisms, top-down and bottom-up interactions.