Slide 1: Computational Video Group
From recognition in the brain to recognition in perceptual vision systems. Case study: faces in video. Example: identifying computer users with low-resolution webcams.
Dmitry Gorodnichy and Gilles Bessens
http://iit-iti.nrc-cnrc.gc.ca, http://synapse.vit.iit.nrc.ca (www.perceptual-vision.com)
CVR Conference on Computational Vision in Neural and Machine Systems, York University, Toronto, Ontario, Canada, June 15-18, 2005.

Slide 2: What do we want? Why bother?
Face recognition system performance (from the NATO Biometrics workshop, Ottawa, Oct. 2004):
- Lots of money has already been spent on applying face recognition to video data...
- Still, computers fail...
- And the Face Recognition Grand Challenge (www.frvt.org) is still seen as "making the video data of better quality"... instead of developing approaches that can deal with low-quality data.

Slide 3: Wrong approach, wrong results: image-based biometric modalities
Photographic facial data and video-based facial data are two different modalities:
- different nature of data
- different biometrics
- different approaches
- different testing benchmarks
In video, faces are inherently of low quality and resolution. Humans routinely recognize faces on TV with only 12 pixels between the eyes.
[Illustrations: an ICAO-conformant passport photograph (presently used for forensic identification) vs. images from surveillance cameras (of the 9/11 hijackers) and TV. NB: VCD resolution is 320x240 pixels.]

Slide 4: Computers which can see
Users with accessibility needs (e.g. residents of the SCO Health Center in Ottawa) will benefit the most from these, but other users would benefit too.
Seeing tasks:
1. Where: to see where the user is: {x, y, z, ...}
2. What: to see what the user is doing: {actions}
3. Who: to see who the user is: {names}
Our goal: to build Perceptual Vision Systems (PVS) which can do all three tasks.
Another application: seeing computers. [Diagram: a PVS monitor tracking the user's {x, y, z} position, signalling a binary ON/OFF event, switching between recognition and memorization, and raising an "Unknown User!" alert.]

Slide 5: Where are we now?
The precision and convenience of tracking the convex-shape nose feature allow one to use the nose as a mouse (or joystick handle):
- Motion, colour, edges, Haar wavelets -> nose search box: x, y, width, height
- Convex-shape template matching -> nose tip detection: I, J (pixel precision)
- Integration over continuous intensity -> X, Y (sub-pixel precision)
[Press clipping: LA NACION, 2003; rating by Planeta Digital, Aug. 2003.]
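A minimal Python sketch of the coarse-to-fine nose tracking steps listed above (search box, then template match, then sub-pixel integration), assuming OpenCV; the function and parameter names are illustrative, not taken from the Nouse source code.

import cv2
import numpy as np

def find_nose_tip(gray_frame, nose_template, search_box):
    """Coarse-to-fine nose tip localization.
    search_box: (x, y, w, h) region proposed by motion/colour/edge/Haar cues.
    nose_template: small grayscale patch of a convex nose shape.
    Returns sub-pixel (X, Y) nose tip coordinates in full-frame coordinates."""
    x, y, w, h = search_box
    roi = gray_frame[y:y + h, x:x + w].astype(np.float32)

    # 1. Convex-shape template matching -> pixel-precision tip (I, J)
    scores = cv2.matchTemplate(roi, nose_template.astype(np.float32), cv2.TM_CCOEFF_NORMED)
    _, _, _, (j, i) = cv2.minMaxLoc(scores)          # best match: (column, row)

    # 2. Integrate match scores around the peak -> sub-pixel (X, Y)
    r = 2                                            # neighbourhood radius
    i0, j0 = max(i - r, 0), max(j - r, 0)
    patch = np.clip(scores[i0:i + r + 1, j0:j + r + 1], 0, None)
    ys, xs = np.mgrid[0:patch.shape[0], 0:patch.shape[1]]
    total = patch.sum() + 1e-9
    sub_i = (ys * patch).sum() / total + i0
    sub_j = (xs * patch).sum() / total + j0

    # Map back to frame coordinates (add the template-centre offset)
    th, tw = nose_template.shape
    return x + sub_j + tw / 2.0, y + sub_i + th / 2.0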

Slide 6: Keys to resolving the recognition problem
To understand how the human brain does it: 12 pixels between the eyes have to be sufficient!
The three main features of the human visual recognition system:
1) Efficient visual attention mechanisms
2) Accumulation of data over time
3) Efficient neuro-associative mechanisms
The three main neuro-associative principles:
1. Non-linear processing
2. Massively distributed collective decision making
3. Synaptic plasticity: a) to accumulate learning data over time by adjusting synapses, b) to associate a visual stimulus with a semantic meaning based on the computed synaptic values

Slide 7: Lessons from biological vision
- Saliency-based localization and rectification: implemented
- The recognition decision at time t+1 depends on the recognition decision at time t: implemented
- Local brightness adjustment: implemented
- Accumulation over time and space: implemented

Slide 8: Lessons from biological memory
The brain stores information in the synapses connecting its neurons. In the brain: 10^10 to 10^13 interconnected neurons.
Neurons are either at rest or activated, Yi in {+1, -1}, depending on the values of the other neurons Yj and the strength of the synaptic connections Cij: Yi = sign(Sum_j Cij*Yj).
The brain is a network of "binary" neurons evolving in time from an initial state (e.g. a stimulus coming from the retina) until it reaches a stable state, an attractor. Attractors are our memories!
Refs: Hebb '49, Little '74, '78, Willshaw '71
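A minimal sketch of the binary attractor dynamics described above (Hopfield-style recall under Yi = sign(Sum_j Cij*Yj)); this is the textbook update rule, not the actual PVS implementation.

import numpy as np

def recall(C, y0, max_iters=100):
    """Evolve a bipolar state y in {+1,-1}^N under Yi = sign(sum_j Cij*Yj)
    until it stops changing (reaches an attractor) or max_iters is hit."""
    y = y0.copy()
    for _ in range(max_iters):
        y_next = np.where(C @ y >= 0, 1.0, -1.0)   # synchronous update of all neurons
        if np.array_equal(y_next, y):              # fixed point: Y(t*) = Y(t*+1)
            break
        y = y_next
    return y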

Slide 9: From visual image to saying a name
From a neuro-biological perspective, memorization and recognition are two stages of the associative process: from receptor stimulus R to effector stimulus E ("Dmitry"), both in the brain and in the computer.
Main associative principle: stimulus neurons Xi in {+1, -1}, response neurons Yj in {+1, -1}, synaptic strength -1 < Cij < +1.
Main question of learning: how to update the synaptic weights Cij as f(X, Y)? (A minimal illustration follows below.)
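As an illustration of the question posed above (hypothetical, not from the slides): the simplest answer to "update Cij as f(X, Y)" is Hebbian correlation learning, in which co-active stimulus and response neurons strengthen their connection.

import numpy as np

def hebb_update(C, x, y, rate=0.01):
    """x: bipolar stimulus vector, y: bipolar response vector (entries +1/-1).
    dCij is proportional to Yi*Xj (outer product), kept within (-1, +1)."""
    return np.clip(C + rate * np.outer(y, x), -1.0, 1.0)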

Slide 10: Learning process
Learning rules: from biologically plausible to mathematically justifiable.
- Hebb (correlation learning): dCij proportional to Vi*Vj
- Widrow-Hoff (delta) rule: dCij proportional to (Vi - Sum_k Cik*Vk)*Vj
- We use the Projection Learning rule, derived from the stability condition V = CV; it is also called the pseudo-inverse rule: C = V*V+
It is the most preferable, as it:
- is incremental and takes the relevance of training stimuli and attributes into account;
- is guaranteed to converge (obtained from the stability condition V^m = C*V^m);
- is fast in both memorization and recognition.
Refs: Amari '71, '77, Kohonen '72, Personnaz '85, Kanter-Sompolinsky '86, Gorodnichy '95-'99
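A sketch of incremental projection (pseudo-inverse) learning as cited above: each prototype V is made a fixed point of C, per the stability condition CV = V. This follows the standard rule from the cited literature rather than the authors' code.

import numpy as np

def project_learn(C, v, eps=1e-8):
    """Store bipolar prototype v (shape (N,)) in weight matrix C (shape (N, N))
    so that C @ v == v afterwards, leaving previously stored prototypes intact."""
    r = v - C @ v                        # part of v not yet captured by C
    nrm = r @ r
    if nrm < eps:                        # v already lies in the stored subspace
        return C
    return C + np.outer(r, r) / nrm      # rank-1 update; now C @ v == v

# Memorize a few prototypes, then verify the stability condition V^m = C V^m.
N = 64
rng = np.random.default_rng(0)
V = np.where(rng.random((5, N)) > 0.5, 1.0, -1.0)
C = np.zeros((N, N))
for v in V:
    C = project_learn(C, v)
assert np.allclose(C @ V.T, V.T)         # every prototype is a fixed point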

Slide 11: Steps of video-based recognition
1. Face-looking regions are detected using rapid classifiers.
2. They are verified to have skin colour and not to be static.
3. Face rotation is detected and compensated; the face is eye-aligned and resampled, and a face at 12-pixels-between-the-eyes resolution is extracted.
4. The extracted face is converted to a binary feature vector (Receptor): Yr.
5. This vector is then appended with the nametag vector (Effector): V = Y(0) = (Yr, Ye).
6. In memorization, the synapses of the network are updated: dCij <- f(V). In recognition, memory recall as an attractor is achieved: Y(t*) <- Y(0).
[Figure: face resampled so that the inter-ocular distance (IOD) is 12 pixels.]
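A hypothetical sketch of steps 4-5 above: converting the eye-aligned, 12-pixel-IOD face into a bipolar receptor vector and appending the nametag (effector) vector. Thresholding against the patch mean is an assumption; the slides do not specify the exact binarization scheme.

import numpy as np

def face_to_receptor(face_12iod):
    """face_12iod: 2D grayscale array of the face resampled to 12-pixel IOD.
    Returns a bipolar (+1/-1) receptor vector Yr."""
    v = face_12iod.astype(np.float32).ravel()
    return np.where(v >= v.mean(), 1.0, -1.0)

def make_state(face_12iod, person_id, num_people):
    """Build the initial network state V = Y(0) = (Yr, Ye).
    For memorization, person_id selects which nametag neuron is set to +1;
    for recognition, pass person_id=None so all nametag neurons start at -1."""
    yr = face_to_receptor(face_12iod)
    ye = -np.ones(num_people)
    if person_id is not None:
        ye[person_id] = 1.0
    return np.concatenate([yr, ye])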

Slide 12: Recognition process
Each frame initializes the system to a state Y(0) = (01000011..., 0000), from which associative recall is achieved as a result of convergence to an attractor Y(t*) = Y(t*+1) = (01000001..., 0010), as in the brain.
The effector component of the attractor (0010) is analyzed. Possible outcomes: S00 (none of the nametag neurons fires), S10 (exactly one fires) and S11 (several fire).
The final decision is made over several frames: e.g. the decision is ID = 5 in all of the example sequences shown, even though individual frames produce outputs such as 0000100000, 0000010000, 0000000000 and 0010100000.
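An illustrative sketch of the multi-frame decision described above; the S00/S10/S11 outcome labels come from the slide, but the accumulation scheme (counting unambiguous votes across frames) is an assumption.

import numpy as np

def classify_frame(effector):
    """effector: bipolar (+1/-1) nametag part of the attractor for one frame.
    Returns ('S00', None), ('S10', id) or ('S11', [ids])."""
    firing = np.flatnonzero(effector > 0)
    if firing.size == 0:
        return "S00", None
    if firing.size == 1:
        return "S10", int(firing[0])
    return "S11", firing.tolist()

def decide(effectors, min_votes=3):
    """Accumulate S10 outcomes over a sequence of frames; return the ID with the
    most unambiguous votes, or None if no ID reaches min_votes."""
    votes = {}
    for e in effectors:
        outcome, who = classify_frame(e)
        if outcome == "S10":
            votes[who] = votes.get(who, 0) + 1
    if not votes:
        return None
    best = max(votes, key=votes.get)
    return best if votes[best] >= min_votes else None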

Slide 13: Tested!
- Using TV program annotation
- Using the IIT-NRC 160x120 facial video database (one video to memorize, another to recognize)

Slide 14: Perceptual Vision Interface Nouse™
- Evolved from a single demo program into a hands-free perceptual vision system which can recognize users
- Uses a 160x120 low-fidelity webcam to constantly monitor the user's identity
- Runs in the background (a user may not even know he is being watched)
- Integrated with facial tracking
- Provides the means for complete hands-free interaction

Slide 15: From our website
Try friv.exe yourself:
- Works with your webcam or an .avi file
- Shows the "brain model" synapses as you watch (in memorization mode)
- Shows the nametag neuron states as you watch a facial video (in recognition mode)

Slide 16: References
- Gorodnichy, D. Video-based framework for face recognition in video. Second Workshop on Face Processing in Video (FPiV'05), in Proceedings of the Second Canadian Conference on Computer and Robot Vision (CRV'05), pp. 330-338, Victoria, BC, Canada, 9-11 May 2005. NRC 48216.
- Gorodnichy, D. Associative neural networks as means for low-resolution video-based recognition. International Joint Conference on Neural Networks (IJCNN'05), Montreal, Quebec, Canada, July 31 - August 4, 2005. NRC 48217.
- Gorodnichy, D. Projection Learning vs. Correlation Learning: From Pavlov Dogs to Face Recognition. AI'05 Workshop on Correlation Learning, Victoria, BC, May 8, 2005. NRC 48209.
- Bessens, G. and Gorodnichy, D. Towards Building User Seeing Computers. Second Canadian Conference on Computer and Robot Vision, Workshop on Face Processing in Video (FPiV'05), Victoria, BC, May 9-11, 2005. NRC 48210.
- Gorodnichy, D. Recognizing Faces in Video Requires Approaches Different from Those Developed for Face Recognition in Photographs. NATO IST-044 Workshop on "Enhancing Information Systems Security through Biometrics", Ottawa, Ontario, Canada, October 18-20, 2004. NRC 47149.
- Gorodnichy, D.O. and Roth, G. Nouse 'Use your nose as a mouse' perceptual vision technology for hands-free games and interfaces. Image and Vision Computing, 22(12), pp. 931-942, October 2004. NRC 47140.
- Gorodnichy, D.O. and Reznik, A.M. Increasing Attraction of Pseudo-Inverse Autoassociative Networks. Neural Processing Letters, 5(2), pp. 123-127, 1997.

