
1 Using associative memory principles to enhance perceptual ability of vision systems (Giving a meaning to what you see) CVPR Workshop on Face Processing in Video June 28, 2004, Washington, DC, USA Dr. Dmitry Gorodnichy Computational Video Group Institute for Information Technology National Research Council Canada www.cv.iit.nrc.ca/~dmitry/pinn

2 Designing visual memory using attractor-based neural networks with application to perceptual vision systems CVPR Workshop on Face Processing in Video June 28, 2004, Washington, DC, USA Dr. Dmitry Gorodnichy Computational Video Group Institute for Information Technology National Research Council Canada www.cv.iit.nrc.ca/~dmitry/pinn www.perceptual-vision.com/memory

3. Associative memory for video (Dr. Dmitry Gorodnichy) The unique place of this research: "You are here" - at the intersection of computer vision, pattern recognition, and your eyes & your brain (neurobiology).

4. Associative memory for video (Dr. Dmitry Gorodnichy) Talk overview
1. On the neurobiology side - how it works in the brain: from the eye's retina, to the primary visual cortex, to neurons, to synapses.
2. Memories as attractors of an associative neural network - finding the best learning rule to tune the synapses.
3. On the computer vision path - evolution of perceptual vision user interface systems: from face detection, to face tracking, to face localization, to face recognition.
4. Putting it all together - visual memory for analyzing faces: what makes processing in video special; canonical face representation; memories of faces as attractors of the network.

5. Associative memory for video (Dr. Dmitry Gorodnichy) How we (humans) do it: seeing & memorizing what we see
Seeing:
- Dorsal ("where") stream: V1, V2, V3, ... deals with object localization.
- Ventral ("what") stream: V1, V2, V4, inferior temporal cortex (TE/IT) deals with object recognition.
Refs: Perus, Ungerleider, Haxby, Riesenhuber, Poggio, ...

6. Associative memory for video (Dr. Dmitry Gorodnichy) How we (humans) do it: seeing & memorizing what we see (cont'd)
Recognizing / memorizing:
- The brain contains 10^10 to 10^13 interconnected neurons.
- Neurons are either at rest or activated (modelled as units taking values Yi = {+1, -1}), depending on the values of the other neurons Yj and the strengths of the synaptic connections Cij.
- The brain is thus modelled as a network of binary neurons evolving in time from an initial state (e.g. a stimulus coming from the retina) until it reaches a stable state - an attractor.
- The attractors of the network are what we actually remember: associative memory.
Refs: Hebb'49, Little'74,'78, Willshaw'71, ...
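A minimal sketch of how such a binary attractor network can be simulated (an illustration only, not the PINN code from the website): each neuron takes the value +1 or -1 and is repeatedly set to the sign of its weighted input until the state stops changing.

```cpp
#include <vector>

// Minimal sketch of a binary attractor network: each neuron Yi is +1 or -1
// and is repeatedly set to the sign of the weighted sum of the other neurons,
// until the state no longer changes (an attractor is reached).
// C is the NxN synaptic matrix; y is the initial state (e.g. a stimulus).
std::vector<int> converge(const std::vector<std::vector<double>>& C,
                          std::vector<int> y, int maxSweeps = 100)
{
    const int N = static_cast<int>(y.size());
    for (int sweep = 0; sweep < maxSweeps; ++sweep) {
        bool changed = false;
        for (int i = 0; i < N; ++i) {
            double s = 0.0;
            for (int j = 0; j < N; ++j) s += C[i][j] * y[j];
            int newYi = (s >= 0.0) ? +1 : -1;
            if (newYi != y[i]) { y[i] = newYi; changed = true; }
        }
        if (!changed) break;   // stable state = attractor = retrieved memory
    }
    return y;
}
```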

7. Associative memory for video (Dr. Dmitry Gorodnichy) Attractor-based neural networks. Recognition / memorization, formally: V ~ CV.
Main question: how to compute Cij so that
a) the desired patterns V^m become attractors, i.e. V^m ~ CV^m, and
b) the network exhibits the best associative (error-correction) properties, i.e.
- the largest attraction radius (tolerated noise),
- the largest number of stored prototypes M?
Refs: Hebb'49, McCulloch-Pitts'43, Amari'71,'77, Hopfield'82, Sejnowski'89, Willshaw'71
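Written out, requirement (a) is the fixed-point condition below (standard attractor-network notation; the slide's own notation may differ slightly):

```latex
% Each stored prototype V^m must be a fixed point of the network update:
V^m_i \;=\; \operatorname{sgn}\!\Big(\sum_{j=1}^{N} C_{ij}\, V^m_j\Big),
\qquad i = 1,\dots,N, \quad m = 1,\dots,M .
```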

8. Associative memory for video (Dr. Dmitry Gorodnichy) Learning rules - how to update the weights: from biologically plausible to mathematically justifiable
Neurophysiological postulate: "If two neurons on either side of a synapse are activated, then the strength of the synapse is strengthened."
"When a child is born, she knows nothing. As she repeatedly observes, she learns" - postulate from the Montessori approach to infant development.
Models - Hebb: C = (1/N) V V^T; generalized Hebb; and better alternatives (the remaining formulas on the slide are not reproduced in the transcript).
Refs: Hebb'49, Hopfield'82, Sejnowski'77, Willshaw'71
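As an illustration of the Hebbian postulate in code (a sketch built from the slide's C = (1/N) V V^T formula, not the original implementation), storing a bipolar prototype V amounts to strengthening every synapse between co-activated neurons:

```cpp
#include <vector>

// Hebbian storage of one bipolar prototype V (+1/-1 values):
// every synapse between two co-activated neurons is strengthened,
// Cij += (1/N) * Vi * Vj, which accumulates to C = (1/N) V V^T.
void hebbLearn(std::vector<std::vector<double>>& C, const std::vector<int>& V)
{
    const int N = static_cast<int>(V.size());
    for (int i = 0; i < N; ++i)
        for (int j = 0; j < N; ++j)
            C[i][j] += (1.0 / N) * V[i] * V[j];
}
```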

9. Associative memory for video (Dr. Dmitry Gorodnichy) Pseudo-inverse as the best learning rule: C = V V^+
Obtained mathematically from the stability condition V^m = C V^m.
With reduced self-connection (Cii := 0.15 Cii), it is guaranteed [Gorodnichy'97] to retrieve
- M = 0.5N patterns from 8% noise,
- M = 0.7N patterns from 2% noise
(for comparison: the Hebb rule stops retrieving when M = 0.14N).
The Widrow-Hoff (delta) rule is an iterative approximation of it; the Hebb rule is its special case for orthogonal prototypes.
Refs: Amari'71,'77, Kohonen'72, Personnaz'85, Kanter-Sompolinsky'86, Gorodnichy'95-'99
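One well-known way to build C = V V^+ incrementally, without an explicit matrix inversion, is the projection update sketched below (an illustration of the learning rule, not the PINN code). The slide's "reduced self-connection" would then correspond to scaling the diagonal Cii, e.g. by 0.15, before retrieval.

```cpp
#include <vector>

// Incremental projection (pseudo-inverse) learning sketch:
// C is kept equal to the orthogonal projector onto the space spanned by the
// prototypes stored so far. For a new prototype V, compute the residual
// e = V - C*V; if it is non-zero, update C += e*e^T / (e^T e).
// This builds C = V V^+ without matrix inversion, so prototypes can be
// added on-line (in real time).
bool addPrototype(std::vector<std::vector<double>>& C, const std::vector<int>& V)
{
    const int N = static_cast<int>(V.size());
    std::vector<double> e(N);
    double eDotE = 0.0;
    for (int i = 0; i < N; ++i) {
        double cv = 0.0;
        for (int j = 0; j < N; ++j) cv += C[i][j] * V[j];
        e[i] = V[i] - cv;
        eDotE += e[i] * e[i];
    }
    if (eDotE < 1e-9) return false;        // V already lies in the stored subspace
    for (int i = 0; i < N; ++i)
        for (int j = 0; j < N; ++j)
            C[i][j] += e[i] * e[j] / eDotE;
    return true;
}
```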

10. Associative memory for video (Dr. Dmitry Gorodnichy) What else is good about the PI rule ... besides that it yields the best retrieval for this type of network:
- It is non-iterative - good for fast (real-time) learning.
- It is also fast in retrieval.
- The performance of the network can be examined and improved analytically.
- It is guaranteed to converge.
- It can deal with a continuous stream of data without ever becoming saturated, if dynamic desaturation is used:
-> maintaining a capacity of 0.2N (with complete retrieval),
-> providing means for forgetting obsolete data,
-> setting the basis for the design of adaptive filters.
All this makes the network very suitable for real-time memorization and recognition, as needed for video processing tasks.
Finally, there is free C++ code which you can compile and try yourself, at the PINN website: www.cv.iit.nrc.ca/~dmitry/pinn

11. Associative memory for video (Dr. Dmitry Gorodnichy) These neural networks are known as...
- pseudo-inverse networks - for using the Moore-Penrose pseudoinverse V^+ in computing the synapses;
- projection networks - for the synaptic (weight) matrix C = V V^+ being the projection matrix onto the space of prototypes;
- Hopfield-like networks - for being binary and fully connected at the learning stage;
- recurrent networks - for evolving in time, based on external input and internal memory;
- attractor-based networks - for storing patterns as attractors (i.e. stable states of the network);
- dynamical systems - for allowing dynamical systems theory to be applied;
- associative memory - for being able to memorize, recall and forget patterns, much as humans do.

12. Associative memory for video (Dr. Dmitry Gorodnichy) Analytical examination: by looking at the synaptic weights Cij, one can say a lot about the properties of the memory:
- how many main attractors (stored memories) it has,
- how good the retrieval is.

13. Associative memory for video (Dr. Dmitry Gorodnichy) Attraction radius as a function of the weights. Theoretical result (for the direct attraction radius): formula given on the slide, not reproduced in the transcript.

14. Associative memory for video (Dr. Dmitry Gorodnichy) Dynamics of the network
The behaviour of the network is governed by the energy function.
The network always converges, as long as Cij = Cji.
Cycles are possible when D < 1; however, they are few when D > 0.1 [Gorodnichy&Reznik'97], and they are detected automatically.
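The energy function itself is not legible in the transcript; for a symmetric binary network of this kind the standard form is:

```latex
E(Y) \;=\; -\tfrac{1}{2} \sum_{i,j} C_{ij}\, Y_i\, Y_j .
```

Each asynchronous update can only lower this energy, which is why convergence is guaranteed whenever Cij = Cji.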

15. Associative memory for video (Dr. Dmitry Gorodnichy) Update-flow neuro-processing [Gorodnichy&Reznik'94]: "Process only those neurons which change during the evolution", i.e. instead of N multiplications per neuron, do only a few of them. This update scheme
-> is very fast (as only a few neurons actually change in one iteration),
-> detects cycles automatically,
-> is suitable for parallel implementation.
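A sketch of this idea (data layout assumed for illustration): keep the post-synaptic potentials cached, and when a single neuron flips, correct each potential with one multiplication instead of recomputing the full sums.

```cpp
#include <vector>

// "Update only the neurons that change": keep the post-synaptic potentials
// S[i] = sum_j C[i][j]*Y[j] up to date, and when a single neuron k flips,
// correct every S[i] with one multiplication instead of recomputing the sum.
struct Network {
    std::vector<std::vector<double>> C;  // synaptic matrix
    std::vector<int> Y;                  // neuron states, +1 / -1
    std::vector<double> S;               // cached potentials
};

void flipNeuron(Network& net, int k)
{
    const int N = static_cast<int>(net.Y.size());
    const int oldYk = net.Y[k];
    net.Y[k] = -oldYk;
    const double delta = net.Y[k] - oldYk;   // +2 or -2
    for (int i = 0; i < N; ++i)
        net.S[i] += net.C[i][k] * delta;     // one multiplication per neuron
}
```

Retrieval then repeatedly flips any neuron whose cached potential disagrees in sign with its state, until no such neuron is left.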

16. Associative memory for video (Dr. Dmitry Gorodnichy) How is video information processed? Now that we know how to memorize, the question is what should be memorized. What type of video information needs to be processed? Let's see what mother nature (neurobiology) tells us.

17. Associative memory for video (Dr. Dmitry Gorodnichy) Visual processing mechanisms
- Images are of very low resolution except at the fixation point; the eyes look at points which attract visual attention.
- Saliency lies in a) motion, b) colour, c) disparity, d) intensity; these channels are processed independently in the brain. Intensity means frequencies, orientation, gradient.
- The brain processes sequences of images rather than a single image: poor image quality is compensated for by the abundance of images.
- Colour & motion are used for segmentation; intensity is used for recognition.
- Bottom-up (image-driven) visual attention is very fast and precedes top-down (goal-driven) attention: 25 ms vs 1 sec.
Refs: Itti, ...

18. Associative memory for video (Dr. Dmitry Gorodnichy) Visual recognition mechanism
- What to learn: generality vs specifics, invariance vs selectivity.
- Affine transformations in 2D (rotation in the image plane, scale) are easily dealt with.
- No 3D model is stored; instead, several view-based 2D models are stored - one neural network per view.
In the context of face recognition:
- Faces are stored in a canonical representation.
- 2D transformations are easy in image/video processing!
- Video allows us to wait (until a face is in a position in which it was stored).
Refs: Poggio, ...

19. Associative memory for video (Dr. Dmitry Gorodnichy) Orientation selectivity, top-down vs bottom-up detection. From [Riesenhuber-Poggio, Nature Neuroscience, 2000].

20. Associative memory for video (Dr. Dmitry Gorodnichy) On the computer vision side

21. Associative memory for video (Dr. Dmitry Gorodnichy) Perceptual Vision System
Goal: to detect, track and recognize the face and facial movements of the user (diagram on the slide: (x, y, z) face position driving a PUI monitor, binary ON/OFF events, recognition / memorization, "Unknown user!").
Setup:
+ face close to the camera (within hand distance)
+ approximately frontal orientation
+ limited number of users and motions
- off-the-shelf camera (low quality, low resolution)
- desktop computer (with limited processing power)

22. Associative memory for video (Dr. Dmitry Gorodnichy) What can be "perceived": face processing tasks ("I look and see...")
- Face segmentation: "Something yellow moves"
- Face detection: "It's a face"
- Face localization (precise): "It's at (x, y, z, ...)"
- Face tracking (crude): "Let's follow it!"
- Face classification: "It's the face of a child"
- Facial event recognition: "S/he smiles, blinks"
- Face memorization: "Face unknown. Store it!"
- Face identification: "It's Mila!"

23. Associative memory for video (Dr. Dmitry Gorodnichy) Computer vision results achieved
- 1998: Proof-of-concept PUI: colour-based tracking [Bradski] - unlikely to be used for precise tracking...
- 1999-2002: Several good skin-colour models developed (HSV, UCS, YCrCb*) - unlikely to get better than that...
- 2002: Subpixel-accuracy convex-shape nose tracking [Nouse]
- 1999-2001: Motion-based segmentation & localization
  - 2001: Non-linear change detection
  - 2003: Second-order change detection [Double-blink]
- 2001: Viola-Jones face detection using Haar-like wavelets
- 2004: Stereo tracking using the nose and projective vision [Gorodnichy, Roth - IVC]

24. Associative memory for video (Dr. Dmitry Gorodnichy) Face Detection and Tracking

25. Associative memory for video (Dr. Dmitry Gorodnichy) Face Detection and Tracking (lights off)

26. Associative memory for video (Dr. Dmitry Gorodnichy) Face Detection and Tracking (lights on)

27. Associative memory for video (Dr. Dmitry Gorodnichy) Demand and applications. From LA NACION (Internet, trends & technology): "The nose used as a mouse. At the Institute for Information Technology in Canada, a system called Nouse was developed that allows software to be controlled with movements of the face. The creator of this program, Dmitry Gorodnichy, explained by e-mail to LA NACION LINE how it works and what its uses are. For more information, related content, audiovisual material and readers' opinions, visit: http://www.lanacion.com.ar/03/05/21/dg_497588.asp Copyright S. A. LA NACION 2003. All rights reserved."

28. Associative memory for video (Dr. Dmitry Gorodnichy) On the importance of the nose. Test: the user rotates the head only (the shoulders do not move). Precision and convenience are such that the nose can be used as a mouse (or a joystick handle) - hence "Nouse".

29. Associative memory for video (Dr. Dmitry Gorodnichy) Nouse(TM): range and speed of tracking

30. Associative memory for video (Dr. Dmitry Gorodnichy) Stereo tracking with the nose feature

31. Associative memory for video (Dr. Dmitry Gorodnichy) Second-order change detection: detecting a change in a change [Gorodnichy'03]. Non-linear change detection deals with changes due to illumination [Durucan'02].
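A naive illustration of "a change in a change" on three consecutive grey-level frames (a simple reading of the idea, not the exact method of [Gorodnichy'03]): compute first-order change maps for the two frame pairs and mark the pixels where these maps disagree.

```cpp
#include <vector>
#include <cstdlib>

// Second-order change sketch: first-order change maps are computed for the
// frame pairs (f1,f2) and (f2,f3); a pixel is a second-order change where
// these two maps differ, i.e. where the change itself has changed.
std::vector<int> secondOrderChange(const std::vector<int>& f1,
                                   const std::vector<int>& f2,
                                   const std::vector<int>& f3,
                                   int threshold)
{
    std::vector<int> out(f1.size(), 0);
    for (std::size_t p = 0; p < f1.size(); ++p) {
        const int d12 = std::abs(f2[p] - f1[p]) > threshold;  // change, frames 1-2
        const int d23 = std::abs(f3[p] - f2[p]) > threshold;  // change, frames 2-3
        out[p] = (d12 != d23);                                // change of the change
    }
    return out;
}
```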

32. Associative memory for video (Dr. Dmitry Gorodnichy) Eye Blink Detection. Previously very difficult for moving heads; with second-order change detection it became possible. It is currently used to enable face-to-face communication for people with brain injury [AAATE'03].

33. Associative memory for video (Dr. Dmitry Gorodnichy) Something (special) about video
Importance:
- Video is becoming ubiquitous; cameras are everywhere.
- For security, computer-human interaction, video-conferencing, entertainment ...
Constraints:
- Real-time processing is required.
- Low resolution: 160x120 images or MPEG-decoded.
- Low quality: weak exposure, blurriness, cheap lenses.
Essence:
- It is inherently dynamic! -> temporal information makes up for the bad quality.
- It has parallels with biological vision! -> it can be processed efficiently.

34. Associative memory for video (Dr. Dmitry Gorodnichy) Applicability of 160x120 video, according to face anthropometrics (studied on the BioID database) and tested with perceptual user interfaces:

Face size          | 1/2 image | 1/4 image | 1/8 image | 1/16 image
In pixels          | 80x80     | 40x40     | 20x20     | 10x10
Between eyes (IOD) | 40        | 20        | 10        | 5
Eye size           | 20        | 10        | 5         | 2
Nose size          | 10        | 5         | -         | -
FS                 | ✓         | ✓         | ✓         | b
FD                 | ✓         | ✓         | b         | -
FT                 | ✓         | ✓         | b         | -
FL                 | ✓         | b         | -         | -
FER                | ✓         | ✓         | b         | -
FC                 | ✓         | ✓         | b         | -
FM / FI            | ✓         | ✓         | -         | -

Legend: ✓ - good, b - barely applicable, - - not good.

35. Associative memory for video (Dr. Dmitry Gorodnichy) Choosing the face model
On the importance of eyes:
- Eyes are the most salient features on a face. Besides, there are two of them, which makes an excellent reference frame.
- They are also the best (and the only) stable landmarks on a face that can be used as a reference.
-> The intra-ocular distance (IOD) makes a very convenient unit of measurement!
-> Eye-centered face model.
On resolution: the lowest resolution possible, so as not to cause overfitting to the noise present (and there is a lot of noise in video!).

36. Associative memory for video (Dr. Dmitry Gorodnichy) Eye-centered face representations (the slide shows two croppings with dimensions given in units of the IOD): one suitable for face analysis from video, one suitable for face recognition in travel documents [ICAO'02]. Size 24 x 24 is sufficient for face memorization & recognition and is optimal for low-quality video and for fast processing.
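A hypothetical sketch of rectifying a detected face into such an eye-centered 24x24 patch. The crop geometry (where the eyes land inside the patch and how many patch pixels the IOD maps to) is an assumption for illustration; the slide only fixes the eye-centered convention and the 24x24 size.

```cpp
#include <vector>
#include <cmath>
#include <algorithm>

// Simple grey-level image container used for the sketch.
struct Gray { int w, h; std::vector<unsigned char> px; };

// Rectify a face into a 24x24 eye-centred patch, given left/right eye
// coordinates in the source image. Eye placement inside the patch
// (row 8, columns 6 and 17) is an assumed layout, not from the slide.
Gray cropEyeCentred(const Gray& img, double lx, double ly, double rx, double ry)
{
    const int S = 24;
    const double eyeRow = 8.0, eyeColL = 6.0, eyeColR = 17.0;   // assumed layout
    const double iodPx  = eyeColR - eyeColL;
    const double dx = rx - lx, dy = ry - ly;
    const double iod = std::sqrt(dx * dx + dy * dy);
    const double scale = iod / iodPx;            // image pixels per patch pixel
    const double cosA = dx / iod, sinA = dy / iod;

    Gray out{S, S, std::vector<unsigned char>(S * S, 0)};
    for (int v = 0; v < S; ++v)
        for (int u = 0; u < S; ++u) {
            // map patch coordinates back into the source image (rotation + scale)
            const double pu = (u - eyeColL) * scale, pv = (v - eyeRow) * scale;
            int x = static_cast<int>(lx + pu * cosA - pv * sinA);
            int y = static_cast<int>(ly + pu * sinA + pv * cosA);
            x = std::clamp(x, 0, img.w - 1);
            y = std::clamp(y, 0, img.h - 1);
            out.px[v * S + u] = img.px[y * img.w + x];
        }
    return out;
}
```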

37. Associative memory for video (Dr. Dmitry Gorodnichy) From image pixels to feature vectors
Once the eyes are detected and a face is converted to the canonical representation, it is easy to memorize and to recognize.
Use (orientational, frequency) features such as Gabor filters? Since faces are already rectified (to the same scale and orientation), there is no need for complex transformations - just deal with illumination changes.
Converting a 24x24 face to a binary feature vector:
A) V_i = I_xy - I_ave -> N = 24x24 = 576
B) V_i,j = sign(I_i - I_j) -> N = 24^4
C) V_i,j = Haar-like(i,j,k,l) -> much more
Some pixels may be ignored (corners, eye locations).
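A minimal sketch of option (A), with the thresholding against the mean made explicit (the slide writes V_i = I_xy - I_ave; here the sign is taken so that the vector is bipolar, as the network expects):

```cpp
#include <vector>

// Turn a 24x24 canonical face patch (576 grey values) into a 576-element
// bipolar vector by thresholding each pixel against the mean intensity of
// the patch, which gives some robustness to global illumination changes.
std::vector<int> faceToBinaryVector(const std::vector<int>& face24x24)
{
    long sum = 0;
    for (int v : face24x24) sum += v;
    const double mean = static_cast<double>(sum) / face24x24.size();

    std::vector<int> V(face24x24.size());
    for (std::size_t i = 0; i < face24x24.size(); ++i)
        V[i] = (face24x24[i] >= mean) ? +1 : -1;   // Vi = sign(Ixy - Iave)
    return V;
}
```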

38. Associative memory for video (Dr. Dmitry Gorodnichy) Closer to experiments
A network of size N = 576 stores:
- M = N/2 states with ...
- M = N/4 states with 25%N error correction.
Faces are extracted using OpenCV's Viola-Jones function. Another way: from blinking, as in [AVBPA'03].
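For reference, face extraction with OpenCV's Viola-Jones detector looks roughly like the sketch below (modern OpenCV C++ API shown, whereas the 2004 work used the original C interface; the cascade file name is an assumption):

```cpp
#include <opencv2/objdetect.hpp>
#include <opencv2/imgproc.hpp>
#include <vector>

// Detect faces in a BGR frame with a Haar cascade (Viola-Jones).
// The cascade file path is assumed; use any frontal-face cascade shipped
// with OpenCV.
std::vector<cv::Rect> detectFaces(const cv::Mat& frameBgr)
{
    static cv::CascadeClassifier cascade("haarcascade_frontalface_default.xml");
    cv::Mat gray;
    cv::cvtColor(frameBgr, gray, cv::COLOR_BGR2GRAY);
    cv::equalizeHist(gray, gray);
    std::vector<cv::Rect> faces;
    cascade.detectMultiScale(gray, faces, 1.1, 3, 0, cv::Size(20, 20));
    return faces;
}
```

Each returned rectangle can then be cropped to the eye-centered 24x24 representation and fed to the associative memory.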

39. Associative memory for video (Dr. Dmitry Gorodnichy) Visual memory for user perception. What can be retrieved: user identity, face orientation, facial expression.

40. Associative memory for video (Dr. Dmitry Gorodnichy) Retrieving orientation

41. Associative memory for video (Dr. Dmitry Gorodnichy) A few more demos, taped and live... as time allows. Watch how the memory fills up as new prototypes are learned.

42. Associative memory for video (Dr. Dmitry Gorodnichy) Conclusions? (Computer vision - pattern recognition - neurobiology)

43. Associative memory for video (Dr. Dmitry Gorodnichy) Conclusions
A lot has been done in PR, CV and NB:
- How to know all of it?...
- How to use all of it?... Or which way would you prefer?
The attractor-based network is a great tool:
- very easy to understand what it is doing,
- very suitable for live real-time video processing,
- very much in line with biological vision,
- you are invited to try it yourself - from our website!
Other contributions:
- canonical face representation for FPIV.
Is it possible to work while on parental leave with two kids?

44. Associative memory for video (Dr. Dmitry Gorodnichy) Dealing with a stream of data: dynamic desaturation
-> maintains a capacity of 0.2N (with complete retrieval),
-> allows data to be stored in real time (no need for iterative learning methods!),
-> provides means for forgetting obsolete data,
-> is the basis for the design of adaptive filters.

